**Director:**

Kosuke Imai

**Executive Committee:**

David Blei, Jianqing Fan, Kosuke Imai, Robert Schapire, John Storey

**Overview**

The Program in Statistics and Machine Learning is offered by the Center for Statistics and Machine Learning. The program is designed for students, majoring in any department, who have a strong interest in data analysis and its application across disciplines. Statistics and machine learning, the academic disciplines centered around developing and understanding data analysis tools, play an essential role in various scientific fields including biology, engineering, and the social sciences. This new field of “data science” is interdisciplinary, merging contributions from computer science and statistics, and addressing numerous applied problems. Examples of data analysis problems include analyzing massive quantities of text and images, modeling cell-biological processes, pricing financial assets, evaluating the efficacy of public policy programs, and forecasting election outcomes. In addition to its importance in scientific research and policy making, the study of data analysis comes with its own theoretical challenges, such as the development of methods and algorithms for making reliable inferences from high-dimensional and heterogeneous data. This program provides students with a set of tools required for addressing these emerging challenges. Through the program, students will learn basic theoretical frameworks and apply statistics and machine learning methods to many problems of interest.

**Enrollment in the Program**

Students are admitted to the program after they have chosen a concentration, generally by the beginning of their junior year. At that time, students must have prepared a tentative plan and timeline for completing all of the requirements of the program, including required courses and independent work (as outlined below), as well as any prerequisites for the selected courses. For enrollment or questions, contact Tara Zigler, program manager, at tzigler@princeton.edu.

**Program of Study**

Students are required to take a total of five courses and earn at least a B- in each course: one of the “Foundations of Statistics” courses, one of the “Foundations of Machine Learning” courses, and three elective courses. With all necessary permissions, advanced students may also take approved graduate-level courses. Students may count at most two courses from another degree program (departmental major or another certificate program) towards this certificate program.

Students are also required to complete a thesis or at least one semester of independent work in their junior or senior year on a topic that makes substantial application or study of machine learning or statistics. This work may be used to satisfy the requirements of both the program and the student's department of concentration. Submission is due on the same date as the student's departmental deadline for the thesis or junior independent work. All work will be reviewed by the Statistics and Machine Learning Certificate committee. At the end of each year, there will be a public poster session for students to present their work to each other, to other students, and to the faculty.

Finally, students are encouraged to attend one of the Statistics and Machine Learning colloquia on campus. These include the Wilks Statistics Seminar, the Machine Learning Seminar, the Political Methodology Seminar, and the Quantitative and Computational Biology Seminar.

**Certificate of Proficiency**

Students who fulfill the program requirements receive a certificate upon graduation.

**Courses**

Foundations of Statistics

- ECO 202 Statistics & Data Analysis for Economics
- EEB 355 Introduction to Statistics for Biology (also MOL 355)
- ORF 245 Fundamentals of Engineering Statistics
- POL 345 Quantitative Analysis and Politics
- PSY 251 Quantitative Methods
- WWS 200 Statistics for Social Science

Foundations of Machine Learning

- COS 424 Interacting with Data
- ORF 350 Analysis of Big Data

Machine Learning

- COS 402 Artificial Intelligence
- COS 401 Introduction to Machine Translation
- ELE 218 Learning Theory and Epistemology
- ORF 418 Optimal Learning

Theory

- MAT 385 Probability Theory
- ORF 309 Probability and Stochastic Systems
- ORF 473 Special Topics in Operations Research and Financial Engineering

Applied Statistics

- ECO 302 Econometrics
- ECO 312 Econometrics: A Mathematical Approach
- ECO 313 Econometric Applications
- ELE 480 FMRI Decoding: Reading Minds Using Brain Scans
- ELE 486 Compression and Transmission of Information
- GEO 422 Data, Models, and Uncertainty in the Natural Sciences
- MOL 436 Statistical Methods for Genomic Data
- ORF 405 Regression and Time Series
- POL 346 Applied Quantitative Analysis

**Example Paths for SML Certificate**

CS Student

- ORF 245 Fundamentals of Engineering Statistics
- COS 424 Interacting with Data

- ORF 309 Probability and Stochastic Systems
- ORF 350 Analysis of Big Data
- COS 402 Artificial Intelligence

ORFE Student

- ORF 245 Fundamentals of Engineering Statistics
- COS 424 Interacting with Data

- ECO 312 Econometrics: A Mathematical Approach
- ORF 350 Analysis of Big Data
- ELE 486 Compression and Transmission of Information

Life Scientist

- MOL 355 Introduction to Statistics for Biology
- COS 424 Interacting with Data

- ORF 309 Probability and Stochastic Systems
- GEO 422 Data, Models, and Uncertainty in the Natural Sciences
- MOL 436 Statistical Methods for Genomic Data

Social Scientist

- POL 345 Quantitative Analysis and Politics
- COS 424 Interacting with Data

- ECO 312 Econometrics: A Mathematical Approach
- ECO 313 Econometric Applications
- POL 346 Applied Quantitative Analysis

**Important Dates**

The poster session for the Class of 2014 will take place on Tuesday, May 13, 2014. The time and place are to be determined.

**Frequently Asked Questions**

Q: I'm a rising junior interested in participating in the statistics and machine learning program. However, I'm a history major, and as such I am not sure how feasible it is to incorporate data analysis into my independent work. Is it possible to work around this requirement or supplement it with something else given my circumstances?

A: No. Unfortunately, you have to use statistics and/or machine learning in your independent work. However, the reach of these methods has expanded beyond the sciences. For example, the digital humanities make heavy use of data analysis methods.

Q: To what extent would I need to use statistics and/or machine learning? Would I need to develop my own study/methodology?

A: Statistics and/or machine learning must play a major role in your independent work. You do not necessarily have to develop new methods, but you must apply statistical and/or machine learning methods to your question of interest. The independent work is a culmination of your education at Princeton: you need to demonstrate that you can apply the methods you've learned to answer research questions. We will also have a poster session where you will present your work to SML faculty.

Q: Would taking a more advanced course than those listed in the "foundations of statistics" or "foundations of machine learning" category fulfill the requirement?

A: Yes, that is possible, but students must obtain the permission of the certificate director. In addition, students are required to take a total of 5 courses to qualify for the SML certificate.

Q: If students are only allowed to take 2 courses from one concentration, then is the example path for CS students incorrect?

A: So long as students are not using more than two courses towards another degree program (i.e., departmental major or another certificate program), they are allowed to count these courses towards the SML certificate.

Q: Is the pass/D/fail (PDF) option acceptable for courses?

A: No. Students must take 5 courses and earn a grade of at least B- in each course.