Applications are open until June 17, 2026. Interested candidates can apply through the university’s official website, where the full list of courses is available.
The seven courses are Visualization; Inference and Modeling; Causal Diagrams: Draw Your Assumptions Before Your Conclusions; Capstone; Digital Humanities – From Research to Results; Probability; and Linear Regression.
1. Visualization: The course introduces the fundamentals of data visualization and exploratory data analysis using the ggplot2 package in the statistical programming language R. It progresses from simple datasets to real-world case studies in global health, economics and U.S. infectious disease trends, while highlighting how errors, bias and data quality issues can affect analysis.
Emphasizing the growing importance of visualization in communicating insights and identifying flaws, the course aims to equip learners with practical skills to analyze data effectively and draw meaningful conclusions.
2. Inference and Modeling: The course teaches core principles of statistical inference and modeling through a practical case study on election forecasting. Using the R programming language, learners explore how polls are analyzed to produce estimates, margins of error and predictions, along with measures of forecast precision. The course covers key data science concepts such as confidence intervals, p-values and Bayesian modeling, culminating in the construction of a simplified election forecast model applied to the 2016 U.S. election.
3. Causal Diagrams: Draw Your Assumptions Before Your Conclusions: The course explains how causal diagrams are used to assess whether one factor has a causal effect on another, and why they have become an essential tool in modern research. It introduces the theory behind causal diagrams and their role in clarifying assumptions, identifying biases and selecting appropriate adjustment variables.
Through a series of lessons and real-world case studies from the health and social sciences, the course demonstrates how causal diagrams can be applied in practice, including to complex situations involving time-varying confounders.
4. Capstone: This capstone project gives learners the opportunity to apply the R-based data analysis skills acquired throughout the program, including data visualization, probability, statistical inference and modeling, data wrangling and organization, regression analysis and machine learning.
Unlike earlier courses in the Professional Certificate in Data Science, this project offers minimal instructor guidance, encouraging independent problem-solving. Upon completion, participants will have a polished data product to present to prospective employers or academic programs, demonstrating their proficiency and readiness in the field of data science.
5. Digital Humanities – From Research to Results: In this course, learners build key components of a search engine designed specifically for academic research, while gaining a foundation in text analysis techniques central to the digital humanities. The curriculum focuses on methods for analyzing and manipulating written language and demonstrates how these tools support scholarly inquiry.
Through the analysis of 18th-century literature, the course illustrates how text analysis can be used across philosophical, religious, political and historical materials, encouraging students to combine traditional research approaches with data science to uncover new and unexpected insights.
6. Probability: The course introduces key statistical concepts including random variables, independence, Monte Carlo simulation, expected values, standard errors and the Central Limit Theorem. These ideas underpin statistical testing and help determine whether observed results are driven by experimental design or random variation. As the mathematical basis of statistical inference, probability theory is critical for analyzing data influenced by chance and is therefore a core area of knowledge for data scientists.
7. Linear Regression: This course, part of the Professional Certificate in Data Science, focuses on applying linear regression in practice and addressing confounding using R.
Using a case study inspired by the data-driven team-building approach popularized in Moneyball, the course explores how linear regression can be used to identify which performance metrics best predict runs scored in baseball. It also examines how confounding variables can create misleading associations and explains how regression can help account for them. Emphasis is placed on understanding the limits of the method and recognizing when linear regression is, and is not, appropriate to use.