MSCF Machine Learning Capstone Project
Connect directly with the next generation of quant finance talent!
Sponsoring a project for the MSCF Machine Learning Capstone Project Course is a special opportunity to partner with our students. The Capstone Project course itself builds on concepts and skills taught in MSCF’s five-course data science curriculum. Second-year MSCF students work in teams, applying data mining, modeling, and visualization techniques along with statistical and machine learning methods to address real-world challenges that corporate sponsors select in coordination with faculty. Projects conclude with a final report and presentation to company sponsors.
Get Involved
Features of a Capstone Project:
- Companies can propose projects in coordination with MSCF Faculty
- Teams of 4 - 6 talented master's students, supervised by an MSCF faculty member
- Submit proposals in the Spring; projects last 14 weeks over the Fall Semester
- No Fee—Gift instead. We invite our capstone sponsors to make a philanthropic gift in support of the MSCF Fellowship Fund
Commitments from the sponsor:
- Propose a project
- Identify an internal company project manager
- Ensure availability of data (internal and/or public)
- Sign CMU Educational Partner Agreement
- See Project Guidelines Below
Project Guidelines
Sponsors benefit from having a group of strong, motivated students working on a project of interest. While the primary objective of the project is educational, and MSCF cannot make any guarantee of specific outcomes, the experienced students who make up these groups are capable of a thorough exploration of an appropriately scoped project during the 600 to 900 person-hours that are devoted to the work.
Project Topics
Perhaps the most common question from potential sponsors concerns the choice of a project, i.e., what is an appropriate project for this course? The short answer is that there is a great deal of flexibility, since machine learning is playing an increasing role in all aspects of finance. An appropriate project will, of course, involve data, along with a substantive question that one hopes could be addressed using those data.
Some projects have a prediction/forecasting (supervised learning) goal, and in those cases there should be a suitable training set available, i.e., data for which the true value of the target variable is known. Examples could include predicting reactions to the announcements of earnings reports and using 10K filings to forecast events such as dividend cuts.
Other projects are unsupervised tasks, i.e., the objective is exploration of a data set to find interesting features, anomalies, or errors. Examples include working to develop stock clustering algorithms motivated by a particular objective, or searching through transactions in an effort to identify unusual, previously unseen, behavior.
It is best if the data set to be utilized is neither too small nor too large, in order to enable sufficient progress during the semester. The standard for what is too small or too large will be quite context-specific, however. It is worthwhile to think carefully about how much data is really required in order to achieve useful results. For example, if the group is able to reach a good outcome on one type of transaction data, the sponsor might be able to then extend that work to other types. Maximizing the time the students can spend exploring different methods, and minimizing the time spent on processing massive data sets, is beneficial to students and sponsor alike.
Projects that involve the use of non-standard data, e.g., images or documents, are welcome and can be particularly interesting for the students.
Sponsors may find this to be an opportunity to consider projects which may be “high risk,” in the sense that they may be more speculative in nature. This could be viewed as an opportunity to see if more sophisticated methods of data analysis hold promise for a particular problem faced by the sponsor.
The course instructors are happy to work with sponsors to refine a project idea so that it fits well into the course.
Student Preparation
Students participate in this capstone experience in the final semester of the three-semester MSCF program. As a result, students enter the project with a strong foundation in the skills necessary for successful quantitative finance work. The curriculum of the first two semesters of the MSCF program can be broken into six topic areas, each of which receives roughly the equivalent of 1.5 semesters of coursework:
Finance - Courses cover the valuation of a wide range of traded equities, bonds, and options, including essential models, theories, and formulas while incorporating important institutional and empirical facts about the markets.
Statistical Methods - Topics include model fitting and assessment, covering models ranging from univariate distributions to linear regression to advanced time series approaches. Coursework also covers the proper implementation and usage of simulation tools for cases where analytical models are not tractable.
Machine Learning and Modern Data Science - Courses cover not only modern methods utilized in machine learning, but also focus on building a strong understanding of the underlying principles so that students develop the ability to adapt and extend these approaches to challenging problems encountered in the future. Supervised and unsupervised learning are stressed. There is also focus on data cleaning and visualization tools.
Stochastic Calculus - Three courses cover the mathematical theory, and applications, of risk-neutral pricing, including topics such as fundamental theorems of asset pricing, Brownian motion, Ito integrals and Ito’s formula, in both the univariate and multivariate case. A detailed exploration of the Black-Scholes model is included.
Computing - Courses cover the use of Python and C++, with focus on both computer science fundamentals and the implementation of algorithms important in quantitative finance.
Communications - Students receive instruction in making formal presentations and handling interviews, along with coaching in informal interactions/networking. Cultural influences that affect business communications are stressed.
Students are also required to complete an internship during the summer following this first year of coursework, providing them with an opportunity to further expand and apply the knowledge gained during this period.
A more complete overview of the curriculum, including course descriptions, can be found here.
Educational Partner Agreements
Sponsors and students both sign Educational Partner Agreements (EPAs), legal documents that specify the terms under which the work of the project will be conducted. CMU’s legal team assists in the execution of the EPA.
From CMU’s perspective, the purpose of the agreement is to protect the rights of the students, and to ensure that the educational goal of the project is clearly specified. Certain items can be negotiated with the sponsor, while others cannot.
Template agreements are available on request.
Sponsor Privacy
The MSCF program takes sponsor privacy seriously, and will adhere to the requests of the sponsor, provided these do not violate any of the legal requirements set out by CMU.
The following are questions that the sponsor should consider:
- Can group members discuss their work with other MSCF students not participating in the project? Other students within the program would benefit from hearing about the ongoing work, and the program would like to enable this to the extent possible.
- Can the MSCF program use the sponsor’s name in external advertising of the program and/or capstone course? If possible, the program would like to advertise to prospective students, and to prospective capstone sponsors, the identities of the sponsors who are involved in the project course. The capstone course is a major selling point for attracting students into the program.
- Can the MSCF program use the sponsor’s name internally? For example, there are often internal discussions amongst faculty and staff regarding the status of the capstone course, and questions regarding the participating sponsors often arise.
- What are students allowed to write on their resume and speak about during interviews regarding their involvement? Students who participate in these projects naturally want to point to the experience when pursuing job opportunities. We ask that sponsors develop a clear statement as to what description they are comfortable with students including on their resume and/or speaking about during interviews.
Logistical Details
Important Dates
(subject to change)
Signed EPAs Due from Sponsor | 8/2/21 |
One-Page Project Description Due from Sponsor | 8/16/21 |
Data Due from Sponsor | 8/23/21 |
First Day of Semester | 8/30/21 |
Labor Day | 9/6/21 |
Project Teams Assigned | 9/7/21 |
Students Sign EPAs | 9/8/21 |
First Sponsor/Student Meetings | 9/9-10/2021 |
Midterm Exam Break | 10/17-26/2021 |
Preliminary Report Due from Students | 11/1/21 |
Thanksgiving Break | 11/24-26/2021 |
Final Presentations and Reports | 12/13-17/2021 |
Sponsor Feedback Due | 12/20/21 |
As seen below, the groups will have the equivalent of eleven weeks to devote to work on the project. Each student is expected to devote approximately 150 hours to the project.
Week of... | Activity |
Week 1 | Students learn about projects and express preferences |
Week 2 | Groups assigned and initial meetings |
Week 3 | Project Work |
Week 4 | Project Work |
Week 5 | Project Work |
Week 6 | Project Work |
Week 7 | Project Work |
Break | Break Week for Exams |
Week 8 | Project Work |
Week 9 | Project Work |
Week 10 | Project Work |
Week 11 | Project Work |
Week 12 | Project Work |
Week 13 | Project Work |
Week 14 | Final presentations and reports |
Sponsor/Group Meetings
Sponsors should plan to meet with their student group at least once every two weeks. Some sponsors have found that weekly meetings, especially at the start of the project, are useful. These meetings will be jointly scheduled by the students and sponsor, at a time that works for everyone. It is important that all students in the group attend these meetings.
The purpose of these meetings is for the students to provide an update on recent work and to get answers to questions. The sponsors can use the meetings to provide direction to future work. Sponsors are asked to avoid any activities that may be perceived as “evaluations” of the students. Sponsors should not “assign” work to the students.
Sponsors should be aware that students often interpret a request to make a presentation of their work as a request to make a formal presentation. The risk is that students end up allocating too much of their limited work time to polishing a presentation. For this reason, sponsors are asked to limit their expectations in this regard, and to communicate expectations clearly.
Assigning Students to Projects
The course instructors will assign the students to projects.
In the first week of the course, students are provided with anonymized one-page descriptions of each project, and are asked to express their preferences among the projects, and also to make statements as to why they are, or are not, well-suited to a particular project. The instructors will take these comments into consideration, but will also weigh factors such as achieving a balance of skills and experiences within the project groups. Students only learn of the identity of the sponsor after they have been assigned to a group.
Data Availability
In general, sponsors are expected to provide all data necessary to perform the work on the project. Sponsors should not expect that CMU will incur the expense of obtaining data.
CMU does have access to a few standard databases, including some components of the Wharton Research Data Service (WRDS), along with Bloomberg Terminals. This access is limited in some areas; for example, we have very limited access to options data.
Sponsors should make all of the data available to the course instructors prior to the start of the project.
Course staff will work with the sponsor to determine the best way to securely transfer data from the sponsor to the students. Previous projects have used an SSH File Transfer Protocol (SFTP) server and Box.
Student Assessment
The course instructors will handle course grading, and sponsors are not responsible for the assessment of student performance on the projects. Sponsors are, however, encouraged to share any concerns or compliments during the semester, and will be provided with the opportunity to provide feedback at the conclusion of the project.
Sponsors can also use this as an opportunity to network with students who could become future employees.