Welcome to the future web site of the **Center for Statistics and Machine Learning** at Princeton University. You can read a recent article about the establishment of the center here and about the statistics and machine learning task force that has been formed.

For further information, please contact the Director John Storey (contact information) or the Acting Assistant Director Kara Dolinski (dolinski@princeton.edu). Check back regularly for updates as the web site and center develop.

Information about our established undergraduate certificate program appears below.

### Undergraduate Certificate Program in Statistics and Machine Learning

**Director of Undergraduate Certificate Program:**

Kosuke Imai

**Executive Committee:**

Jianqing Fan, Kosuke Imai, Robert Schapire, John D. Storey

**Associated Faculty:**

Yacine Ait-Sahalia, Sanjeev Arora, Sebastien Bubeck, Rene Carmona, Mung Chiang, Jonathan D. Cohen, Paul W. Cuff, David P. Dobkin, Barbara Engelhardt, Elad Hazan, Bo E. Honore, Michal Kolesar, Sanjeev R. Kulkarni, Han Liu, Ulrich K. Mueller, Jonathan Pillow, H. Vincent Poor, Peter J. Ramadge, Marc Ratkovic, Matthew J. Salganik, H. Sebastian Seung, Christopher A. Sims, Amit Singer, Mona Singh, Michael Strauss, Olga G. Troyanskaya, Ramon van Handel, Robert Vanderbei, Sergio Verdu, Mark W. Watson

**Sits with Committee:**

Andrew Conway, German Rodriguez

**Overview**

The Program in Statistics and Machine Learning is offered by the Center for Statistics and Machine Learning. The program is designed for students, majoring in any department, who have a strong interest in data analysis and its application across disciplines. Statistics and machine learning, the academic disciplines centered around developing and understanding data analysis tools, play an essential role in various scientific fields including biology, engineering, and the social sciences. This new field of “data science” is interdisciplinary, merging contributions from computer science and statistics, and addressing numerous applied problems. Examples of data analysis problems include analyzing massive quantities of text and images, modeling cell-biological processes, pricing financial assets, evaluating the efficacy of public policy programs, and forecasting election outcomes. In addition to its importance in scientific research and policy making, the study of data analysis comes with its own theoretical challenges, such as the development of methods and algorithms for making reliable inferences from high-dimensional and heterogeneous data. This program provides students with a set of tools required for addressing these emerging challenges. Through the program, students will learn basic theoretical frameworks and apply statistics and machine learning methods to many problems of interest.

**Enrollment in the Program**

Students are admitted to the program after they have chosen a concentration, generally by the beginning of their junior year. At that time, students must have prepared a tentative plan and timeline for completing all of the requirements of the program, including required courses and independent work (as outlined below), as well as any prerequisites for the selected courses. To enroll or to ask questions, contact Esther Kim, program manager, estherk@princeton.edu.

**Program of Study**

Students are required to take a total of five courses and earn at least a B- in each course: one of the “Foundations of Statistics” courses, one of the “Foundations of Machine Learning” courses, and three elective courses. With all necessary permissions, advanced students may also take approved graduate-level courses. Students may count at most two courses from another degree program (departmental major or another certificate program) towards this certificate program.
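As an informal illustration, the course rules above can be sketched as a small check. This is only a sketch under stated assumptions, not an official audit tool: the function name `check_plan`, the data layout, and the grade ordering are hypothetical, and the grades in the example are invented. The foundation course lists are taken from the Courses section of this page. The sketch does not verify that electives come from the approved list or that any required permissions were obtained.

```python
# Hypothetical sketch: names and data layout are illustrative assumptions,
# not an official tool of the certificate program.

# Assumed ordering of letter grades, lowest to highest.
GRADE_ORDER = ["F", "D", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]
MIN_GRADE = "B-"

# Foundation courses as listed in the Courses section of this page.
FOUNDATIONS_STATS = {"ECO 202", "EEB 355", "MOL 355", "ORF 245",
                     "POL 345", "PSY 251", "WWS 200"}
FOUNDATIONS_ML = {"COS 424", "ORF 350"}


def check_plan(grades, double_counted):
    """Return a list of problems found in a course plan.

    grades: dict mapping course code to letter grade.
    double_counted: course codes that also count towards another
        degree program (major or another certificate).
    """
    problems = []
    if len(grades) < 5:
        problems.append("fewer than five courses")

    # Any grade outside the letter scale (for example a pass/D/fail "P")
    # is treated as not meeting the B- minimum.
    min_rank = GRADE_ORDER.index(MIN_GRADE)
    for code, grade in sorted(grades.items()):
        if grade not in GRADE_ORDER or GRADE_ORDER.index(grade) < min_rank:
            problems.append(f"{code}: grade {grade} does not meet {MIN_GRADE}")

    if not grades.keys() & FOUNDATIONS_STATS:
        problems.append("no Foundations of Statistics course")
    if not grades.keys() & FOUNDATIONS_ML:
        problems.append("no Foundations of Machine Learning course")
    if len(grades.keys() & set(double_counted)) > 2:
        problems.append("more than two courses also count towards another program")
    return problems


# Example: the CS student path from this page, with invented grades.
cs_plan = {"ORF 245": "A-", "COS 424": "B+", "ORF 309": "B",
           "ORF 350": "A", "COS 402": "B-"}
print(check_plan(cs_plan, double_counted={"COS 424", "COS 402"}))
```

An empty list means the plan passes these particular checks; the certificate committee's decision is what actually counts.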

Students are also required to complete a thesis or at least one semester of independent work in their junior or senior year on a topic that makes substantial application or study of machine learning or statistics. This work may be used to satisfy the requirements of both the program and the student's department of concentration. Submissions are due on the same date as the student's departmental deadline for the thesis or junior independent work. All work will be reviewed by the Statistics and Machine Learning Certificate committee. At the end of each year, there will be a public poster session at which students are required to present their work to each other, to other students, and to the faculty.

Finally, students are encouraged to attend one of the Statistics and Machine Learning colloquia on campus. These include the Wilks Statistics Seminar, the Machine Learning Seminar, the Political Methodology Seminar, and the Quantitative and Computational Biology Seminar.

**Certificate of Proficiency**

Students who fulfill the program requirements receive a certificate upon graduation.

**Contact**

Please contact the program manager, Esther Kim (estherk@princeton.edu), with questions or for application information.

**Courses**

One of the following courses (“Foundations of Statistics”)

- ECO 202 Statistics & Data Analysis for Economics
- EEB 355 Introduction to Statistics for Biology (also MOL 355)
- ORF 245 Fundamentals of Engineering Statistics
- POL 345 Quantitative Analysis and Politics
- PSY 251 Quantitative Methods
- WWS 200 Statistics for Social Science

One of the following courses (“Foundations of Machine Learning”)

- COS 424 Interacting with Data
- ORF 350 Analysis of Big Data

Three of the following courses (including those above, with permission)

Machine Learning

- COS 402 Artificial Intelligence
- COS 401 Introduction to Machine Translation
- ELE 218 Learning Theory and Epistemology
- ORF 418 Optimal Learning

Theory

- MAT 385 Probability Theory
- ORF 309 Probability and Stochastic Systems
- ORF 473 Special Topics in Operations Research and Financial Engineering
- ORF 363/COS 323 Computing and Optimization

Applied Statistics

- AST 303 Observing and Modeling the Universe
- ECO 302 Econometrics
- ECO 312 Econometrics: A Mathematical Approach
- ECO 313 Econometric Applications
- ELE 381 Networks, Money, and Bytes
- ELE 480 FMRI Decoding: Reading Minds Using Brain Scans
- ELE 486 Compression and Transmission of Information
- GEO 422 Data, Models, and Uncertainty in the Natural Sciences
- MOL 436 Statistical Methods for Genomic Data
- ORF 405 Regression and Time Series
- POL 346 Applied Quantitative Analysis
- CEE 460 Risk Analysis

**Example Paths for SML Certificate**

CS Student

- ORF 245 Fundamentals of Engineering Statistics
- COS 424 Interacting with Data
- ORF 309 Probability and Stochastic Systems
- ORF 350 Analysis of Big Data
- COS 402 Artificial Intelligence

ORFE Student

- ORF 245 Fundamentals of Engineering Statistics
- COS 424 Interacting with Data
- ECO 312 Econometrics: A Mathematical Approach
- ORF 350 Analysis of Big Data
- ELE 486 Compression and Transmission of Information

Life Sciences Student

- MOL 355 Introduction to Statistics for Biology
- COS 424 Interacting with Data
- ORF 309 Probability and Stochastic Systems
- GEO 422 Data, Models, and Uncertainty in the Natural Sciences
- MOL 436 Statistical Methods for Genomic Data

Social Sciences Student

- POL 345 Quantitative Analysis and Politics
- COS 424 Interacting with Data
- ECO 312 Econometrics: A Mathematical Approach
- ECO 313 Econometric Applications
- POL 346 Applied Quantitative Analysis

**Important Dates**

This year's poster session will take place on **Tuesday, May 12, 2015**, in the **Carl Fields Center Multipurpose Room** from **12-2pm**.

**Frequently Asked Questions**

Q: I'm a rising junior interested in participating in the statistics and machine learning program. However, I'm a history major, and as such I am not sure how feasible it is to include data analysis into my independent work. Is it possible to work around this requirement or supplement it with something else given my circumstances?

A: No. Unfortunately, you have to use statistics and/or machine learning in your independent work. However, the reach of these methods has expanded beyond the sciences. For example, the digital humanities make heavy use of data analysis methods.

Q: To what extent would I need to use statistics and/or machine learning? Would I need to develop my own study/methodology?

A: Statistics and/or machine learning must play a major role in your independent work. You do not necessarily have to develop new methods but you must apply statistical and/or machine learning methods to your question of interest. The independent work is a culmination of your education at Princeton: you need to demonstrate that you can apply methods you've learned in order to answer research questions. We will also have a poster session where you will be presenting your work to SML faculty.

Q: Would taking a more advanced course than those listed in the "foundations of statistics" or "foundations of machine learning" category fulfill the requirement?

A: Yes, that is possible, but students must obtain the permission of the certificate director. In addition, students must still take a total of 5 courses to qualify for the SML certificate.

Q: If students are only allowed to count 2 courses from one concentration, then is the example path for CS students incorrect?

A: As long as no more than two of the courses counted towards the SML certificate are also counted towards another degree program (i.e., a departmental major or another certificate program), students are allowed to count these courses towards the SML certificate.

Q: If I take a course as a prerequisite for my major, but it does not count as a departmental course or does not count towards my departmental GPA, can I take this as part of the certificate requirements?

A: Yes, a prerequisite course is okay as long as it does not count towards your degree program.

Q: Is PDF acceptable for courses?

A: No. Students must take 5 courses and earn at least a grade of B- for each course.

Q: Does TRA 301 count as a TRA department class or as its secondary cross-listing, COS?

A: You can count it either way.

Q: Can I present at the poster session during my junior year?

A: Yes, as long as you have fulfilled the independent work requirement and submitted the work to the faculty committee by the appropriate deadline.

Q: Can I submit independent work during my senior year that is *not* my senior thesis?

A: Yes.

Q: How can I propose a different course for the certificate (for instance, if I am studying abroad and would like a course from the institution abroad to count towards the certificate requirements)?

A: You must submit a PDF copy of the official course syllabus to the Program Manager and the Director of the Certificate Program. The syllabus must include information about the course materials and assignments.

Q: If I get a 5 in an AP class, can that count towards one of the course requirements?

A: No, we do not accept AP courses as exceptions to any of the course requirements.