Contemporary Data Analysis: Survey and Best Practices

About the Course

This course is designed to fill this gap. It is a survey of state-of-the-art interdisciplinary methods of data analysis, applicable to business and academia alike. Unlike other statistics courses, which focus on specific methods, this course focuses on the broader areas within statistics and data analytics. It covers five major topics. It starts with the root of it all, the data, and some of the problems with data. It then moves through contemporary approaches to descriptive, inferential, predictive, and prescriptive analytics.

Course Objectives

Understand the theoretical foundation behind the methods without focusing too much on the mathematics

Learn the applied, problem-based approach to using specific tools

Get a good understanding of the state-of-the-art tools that the field of data analysis currently has to offer

Learning Outcomes

1. Know the basic types of data and data classification

2. Understand the issues that arise when working with real-life data

3. Know the basic numeric measures and approaches to selecting the best measures

4. Understand how descriptive analytics is used to generate business analytics cases

Course Syllabus

Week 1. Introduction and the data

The first lecture provides a broad overview of the data analysis field and its two major components: the data and the analysis. Topics within the first lecture explain how these two concepts fit together. We start with definitions to clear up some of the confusion around terminology in the field. Then, we discuss the contents of this course and map the field of data analysis. We also discuss the role that data play in our lives, what data are, their types and classifications, and sources of data. Finally, we address modeling: why we model and how analytics aids decision-making in business and real life.

Week 2. Data issues that go bump in the night

This lecture covers a topic that is rarely treated in detail in most data analytics programs: the problems we face when working with data. Each segment of the lecture covers a different aspect of the issues that can arise with real-life data. These include data management, including cleaning and recoding; sources of data errors and how to fix them; and working with different data file structures. We also discuss detecting fake data and state-of-the-art missing data analysis.
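As a small illustration of the cleaning and recoding step, the Python sketch below maps inconsistent yes/no survey labels onto a 0/1 code and measures the share of missing values. The raw responses, the `RECODE` and `MISSING` tables, and the `recode` helper are all hypothetical; which tokens count as "missing" is an assumption that has to be decided for each dataset.

```python
# Hypothetical raw survey responses with inconsistent labels and missing values.
raw = [" Yes", "no", "Y", None, "N/A", "YES", "n", ""]

# Assumed recoding map and missing-value tokens (choose these per dataset).
RECODE = {"yes": 1, "y": 1, "no": 0, "n": 0}
MISSING = {"", "n/a", "na"}


def recode(value):
    """Map a raw response to 1/0, or None if it is missing."""
    if value is None:
        return None
    token = value.strip().lower()  # normalize whitespace and case
    if token in MISSING:
        return None
    if token not in RECODE:
        # Surface unexpected labels instead of silently treating them as missing.
        raise ValueError(f"unrecognized response: {value!r}")
    return RECODE[token]


clean = [recode(v) for v in raw]
missing_share = clean.count(None) / len(clean)
print(clean, missing_share)
```

Raising an error on an unrecognized label, rather than quietly coding it as missing, is a deliberate choice here: it exposes data errors for inspection instead of hiding them inside the missing-data count.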

Week 3. Descriptive Analytics

This lecture covers the first steps of analysis, which should be taken once the data have been collected, cleaned, checked for issues and missing data, and otherwise prepared for analysis. These first steps, aimed at understanding “what happened” or gathering information, are collectively called “descriptive analytics.” We start with definitions of population and sample, and move on to basic graphical descriptions. We then discuss various numerical measures and how to select the best measure for a given dataset. Next, we cover advanced graphs and charts and how to make descriptions meaningful. Finally, we apply everything we’ve learned to real cases from the Coca-Cola Company and McDonald’s Corporation.
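To illustrate why the choice of measure depends on the data, the sketch below compares the mean and the median of a small, right-skewed set of hypothetical transaction amounts using Python's standard `statistics` module. The helper `describe_center` and its 10% cutoff are illustrative assumptions, not an established rule.

```python
from statistics import mean, median

# Hypothetical transaction amounts: mostly small values plus one extreme value.
amounts = [12, 15, 14, 13, 16, 15, 14, 400]


def describe_center(values):
    """Return (mean, median, suggested measure of center)."""
    m = mean(values)
    med = median(values)
    # Illustrative heuristic (an assumption): if the mean is pulled more than
    # 10% away from the median, prefer the outlier-resistant median.
    suggested = "median" if abs(m - med) > 0.1 * abs(med) else "mean"
    return m, med, suggested


m, med, suggested = describe_center(amounts)
print(f"mean={m:.2f}, median={med}, suggested={suggested}")
```

With a single extreme value in the data, the mean (about 62.4) lies far above every typical observation, while the median (14.5) stays with the bulk of the data, which is why the median is usually the better summary for skewed distributions.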

Week 4. Inferential Analytics

Linear regression, one of the most widely used analytical methods, belongs for the most part to the domain of inferential analytics. Inferential analytics is concerned with explaining “why something happened”; in other words, with making inferences from the data. At the heart of this approach is hypothesis testing, which we discuss first. Then, we move to the variables used to make inferences and their relationships, and discuss the data requirements for inferential analytics. Finally, we cover the basics of regression analysis and look at different examples of inferential analytics models.
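The sketch below connects hypothesis testing to regression: it fits a simple linear regression y = a + b*x by ordinary least squares and computes the t statistic for the null hypothesis that the slope b is zero. The function name `fit_simple_ols` and the example data are hypothetical; a real analysis would normally use a statistics library that also reports p-values and diagnostics.

```python
import math


def fit_simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns (a, b, t_b), where t_b is the t statistic for H0: b = 0,
    with n - 2 degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                 # least-squares slope
    a = y_bar - b * x_bar         # intercept: line passes through the means
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return a, b, b / se_b


# Hypothetical data, e.g. advertising spend (x) and sales (y).
a, b, t = fit_simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"a={a:.2f}, b={b:.2f}, t={t:.2f}")
```

For this toy data the fit gives a ≈ 2.2, b = 0.6, and t ≈ 2.12. With n - 2 = 3 degrees of freedom, the two-sided 5% critical value of the t distribution is about 3.18, so the slope would not be judged significantly different from zero.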

Week 5. Predictive Analytics

Predictive analytics is on firmest ground when it builds on established causal relationships. Therefore, we first discuss causality, approaches to studying it, and causality in observational studies. Then, we move a step further: from causality, where we establish the influence of one variable on another, to prediction, or the future relationships between these variables. We cover predictive modeling of continuous and discrete outcomes and discuss some of the modeling issues that can arise with predictions.

Week 6. Prescriptive Analytics

Prescriptive analytics is concerned with optimization, that is, with making the most desirable outcome happen. In this lecture, we look at the theoretical considerations of prescriptive analytics, then discuss optimization as an approach, and compare stochastic and mathematical optimization. Then, we examine one very common optimization method, linear programming, including problem setup and the simplex method for solving linear programming problems.
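For readers who want to see the mechanics, the following is a compact, textbook-style tableau implementation of the simplex method in Python. It is a sketch under restrictive assumptions: it only maximizes c·x subject to A x ≤ b and x ≥ 0 with every b_i ≥ 0 (so the all-slack starting basis is feasible), it uses Bland's smallest-index rule to avoid cycling, and it uses exact `Fraction` arithmetic for clarity rather than speed. The function name `simplex_max` is hypothetical; production work would use a dedicated LP solver.

```python
from fractions import Fraction


def simplex_max(c, A, b):
    """Maximize c.x subject to A x <= b, x >= 0, assuming all b_i >= 0.

    Returns (optimal_value, x). Raises ValueError if unbounded.
    """
    m, n = len(A), len(c)
    if any(bi < 0 for bi in b):
        raise ValueError("this sketch requires b >= 0 (slack basis feasible)")
    # Constraint rows of the tableau: [A | I | b].
    T = [[Fraction(v) for v in A[i]]
         + [Fraction(int(i == j)) for j in range(m)]
         + [Fraction(b[i])] for i in range(m)]
    # Objective row: reduced costs start at -c; last entry tracks the objective.
    z = [Fraction(-v) for v in c] + [Fraction(0)] * m + [Fraction(0)]
    basis = [n + i for i in range(m)]  # slack variables start in the basis

    while True:
        # Entering variable: smallest index with negative reduced cost (Bland).
        enter = next((j for j in range(n + m) if z[j] < 0), None)
        if enter is None:
            break  # no improving direction: current basis is optimal
        # Minimum-ratio test; ties go to the smallest basic index (Bland).
        best = None
        for i in range(m):
            if T[i][enter] > 0:
                ratio = T[i][-1] / T[i][enter]
                if (best is None or ratio < best[0]
                        or (ratio == best[0] and basis[i] < basis[best[1]])):
                    best = (ratio, i)
        if best is None:
            raise ValueError("problem is unbounded")
        r = best[1]
        # Pivot: normalize the pivot row, then eliminate the column elsewhere.
        pivot = T[r][enter]
        T[r] = [v / pivot for v in T[r]]
        for i in range(m):
            if i != r and T[i][enter] != 0:
                f = T[i][enter]
                T[i] = [vi - f * vr for vi, vr in zip(T[i], T[r])]
        f = z[enter]
        z = [vz - f * vr for vz, vr in zip(z, T[r])]
        basis[r] = enter

    x = [Fraction(0)] * n
    for i, var in enumerate(basis):
        if var < n:  # only report the original (non-slack) variables
            x[var] = T[i][-1]
    return z[-1], x


# Classic textbook example: maximize 3x + 5y
# subject to x <= 4, 2y <= 12, 3x + 2y <= 18, x, y >= 0.
value, x_opt = simplex_max([3, 5], [[1, 0], [0, 2], [3, 2]], [4, 12, 18])
print(value, x_opt)
```

On this two-variable problem the sketch returns an optimal value of 36 at x = 2, y = 6, after three pivots.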

Week 7. Final assignment

Final assignment.