Data Driven Decision Making (D3M)

“Every two days we now create as much information as we did from the dawn of civilization up until 2003” Eric Schmidt, 2009

“Data are widely available; what is scarce is the ability to extract wisdom from them” Hal Varian (UC Berkeley and Chief Economist, Google)

The two quotes above summarize the main theme of this course. In every aspect of our daily lives, from the way we work, shop, communicate, and socialize, we are both consuming and creating vast amounts of information. More often than not, these daily activities leave a trail of digitized data that is stored, mined, and analyzed by entities in the private sector (e.g., Google, Wal-Mart) as well as in the public and non-profit sectors (e.g., academia, government).

The general goal of these data-driven initiatives is to generate valuable intelligence that is pertinent to business decisions or public policies. For example, customer transaction databases provide vast amounts of high-quality data that can allow firms to understand customer behavior and customize business tactics to increasingly fine segments, or even segments of one. However, much of the promise of such data-driven policies has failed to materialize, primarily because of the difficulty of translating data into actionable strategies.

This course aims to fill this gap by training you in the tools and techniques needed to analyze large databases, exposing you to a wide variety of issues in an empirical context, and instilling an intuition for D3M, i.e., how to generate insights from large volumes of data.

1 Introduction to D3M

We will begin the course with a general introduction to what we mean by data-driven decision making and why it is imperative to develop an intuition for the language of data in the modern world. These introductory sessions provide several examples (including many from my own research) of the role of analytics in decision making. They also include a discussion of various software and of the “types” of data (stated versus revealed preference; cross-sectional, time series, and panel data; structured versus unstructured; level of aggregation; and so forth). Together, these sessions lay the foundation for the material to come and provide a broad intuition for “data linguistics”.

2 Experiments & Causal Effects

Experimental designs are often regarded as the “gold standard” for making causal (cause-and-effect) inferences. We will discuss the design of experiments as well as internal and external validity. Several case studies in marketing, economics, and medicine, ranging from controlled lab and field experiments and A/B testing to circumstances that provide us with “natural” experiments, will be discussed through hands-on implementation.
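Because random assignment makes the treatment and control groups comparable on average, the difference in their outcomes can be read as the causal effect of the treatment. As a minimal sketch of how an A/B test with a binary outcome (such as conversion) might be analyzed, the snippet below applies a standard two-proportion z-test, assuming independent visitors and samples large enough for the normal approximation. The function name `two_proportion_ztest` and the visitor counts are hypothetical illustrations, not course data or real results.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between arms A and B.

    Uses the pooled conversion rate under the null hypothesis that both arms
    share the same true rate (normal approximation; assumes large samples).
    Returns (lift, z, p_value), where lift = rate_B - rate_A.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Hypothetical experiment: 10,000 randomly assigned visitors per arm.
lift, z, p = two_proportion_ztest(conv_a=500, n_a=10_000, conv_b=580, n_b=10_000)
print(f"lift={lift:.4f}  z={z:.2f}  p={p:.4f}")
```

The pooled standard error reflects the null hypothesis of no difference; a confidence interval for the lift would instead use the unpooled rates of each arm.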

3 Regression Analysis

In this topic, we will turn our attention to the relationships among variables. Regression is one of the most widely used tools for analyzing the relationship between a phenomenon of interest (the dependent variable) and one or more predictor (independent) variables. We will spend a fair amount of time on regression and its applications, with an emphasis on using regression output for forecasting, elasticity analysis, and applications such as promotional planning and optimal pricing.
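One common way to connect regression output to elasticity analysis and pricing is a log-log demand model, ln(Q) = a + b ln(P), in which the slope b is the price elasticity. The sketch below assumes a constant-elasticity demand curve, a constant unit cost, and hypothetical weekly price and sales data; the helper names `ols_slope_intercept` and `optimal_price` are illustrative. Under those assumptions, the profit-maximizing price follows the markup rule P* = c * e / (1 + e), which applies only when demand is elastic (e < -1).

```python
import math


def ols_slope_intercept(x, y):
    """Least-squares fit of y = a + b*x with a single predictor; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def optimal_price(unit_cost, elasticity):
    """Profit-maximizing price under constant elasticity: P* = c * e / (1 + e).

    Valid only for elastic demand (e < -1); with inelastic demand, profit
    keeps rising as price rises, so there is no interior optimum.
    """
    if elasticity >= -1:
        raise ValueError("markup rule requires elastic demand (elasticity < -1)")
    return unit_cost * elasticity / (1 + elasticity)


# Hypothetical weekly observations: demand roughly 1000 * P^-2 with small noise.
prices = [2.0, 2.5, 3.0, 3.5, 4.0]
noise = [1.03, 0.97, 1.01, 0.99, 1.00]
units = [1000 * p ** -2 * m for p, m in zip(prices, noise)]

# Regress ln(units) on ln(price); the slope is the estimated price elasticity.
a, elasticity = ols_slope_intercept([math.log(p) for p in prices],
                                    [math.log(q) for q in units])
print(f"elasticity={elasticity:.2f}  P*={optimal_price(1.5, elasticity):.2f}")
```

With a single predictor the OLS slope reduces to cov(x, y) / var(x); with several predictors (for example, price plus promotion indicators), a matrix-based least-squares routine from a statistics package would be the usual choice.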

4 Multivariate Analysis

Digital communication, along with firms' emphasis on assembling large volumes of customer data, has resulted in increasingly complex information sets available to managers. In such cases, data-reduction techniques can provide a clearer picture and facilitate decision making. This topic looks at two powerful techniques for data reduction:

PCA/Factor analysis is a “method used to describe variability among correlated variables in terms of a potentially lower number of latent factors. Factor analysis originated in psychometrics, and is used in behavioral sciences, social sciences, marketing, product management, operations research, and other applied sciences that deal with large quantities of data” (from Wikipedia)

Cluster Analysis: A host of algorithms for “grouping a set of objects in such a way that objects in the same group are more similar (in some metric) to each other than to those in other groups. It is used in many fields, including data mining, machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics” (from Wikipedia)
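To make the two data-reduction ideas concrete, the sketch below uses only the Python standard library on a small hypothetical data set of six customers scored on three correlated attributes. It illustrates PCA by extracting the first principal component with power iteration; factor analysis, which fits an explicit latent-factor model, is related but different and is not shown. For clustering it uses k-means (Lloyd's algorithm), which is only one of the many clustering algorithms mentioned above. All function names and data are illustrative assumptions.

```python
import math
import random


def first_principal_component(rows, iters=200):
    """Leading principal component of `rows` (equal-length numeric tuples).

    Centers the data, forms the sample covariance matrix, and finds its leading
    eigenvector by power iteration. Returns (loadings, scores, variance_share).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores, eigenvalue / total_variance


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    def nearest(p):
        # Index of the centroid with the smallest squared Euclidean distance to p.
        return min(range(k),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))

    labels = [nearest(p) for p in points]
    for _ in range(iters):
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous centroid if a cluster ends up empty.
            new_centroids.append([sum(col) / len(members) for col in zip(*members)]
                                 if members else centroids[c])
        centroids = new_centroids
        new_labels = [nearest(p) for p in points]
        if new_labels == labels:  # assignments are stable, so the algorithm has converged
            break
        labels = new_labels
    return labels, centroids


# Hypothetical survey: each customer is scored on three correlated attributes.
customers = [
    (1.0, 1.2, 0.9), (1.1, 0.9, 1.0), (0.8, 1.0, 1.1),
    (4.0, 4.2, 3.9), (4.1, 3.8, 4.0), (3.9, 4.1, 4.2),
]
loadings, scores, share = first_principal_component(customers)
labels, centers = kmeans(customers, k=2)
print(f"PC1 loadings={[round(x, 2) for x in loadings]}  variance share={share:.2f}")
print("segments:", labels)
```

In this toy data the first component loads almost equally on all three attributes, so a single score summarizes most of the variation, and k-means recovers the two customer groups. With real data, the number of clusters and the scaling of variables are judgment calls.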

7 Visualization

In this final topic, we will learn how to present the results of our analysis in a visually appealing way. In particular, we will learn how to develop effective dashboards in Tableau to communicate findings.

8 Appendix

Additional files

Technographic Survey, BAV, Cereal, Spotify, State Demographics

Vishal Singh