A question of central importance to MOOCs that we are trying to answer with the help of big data and machine learning is: who is likely to drop out of a MOOC, and why? Put another way, what predicts whether a student will make it to the final week of an edX course? (We call dropout “stopout” in all our work because dropout is conventionally perceived negatively. In contrast, stopout represents a pause in engagement rather than abandonment.)

We develop machine learning models that use longitudinal variables to express possibly predictive behavior for each student, week by week (i.e., per module). In the study outlined here, we prototyped our modeling methodology (documented here) by retrospectively applying and testing it on data from edX’s very first course, 6.002x, offered in Spring 2012. This landmark offering had remarkable numbers: 154,763 registered students, who generated more than 200 million events and 60 GB of raw clickstream data. A high absolute number of students, 7,157, earned a certificate, but relatively few registrants did so (less than 5%).

Previous work with another MOOC on another platform analyzed forum posts to predict dropout. To maximize the number of students included in the study, we challenged ourselves to predict dropout for every student who attempted at least one problem, regardless of whether he or she used the forum and/or wiki. We formed four cohorts of students: learners who participated in forums (discussion generators), learners who edited wikis (content generators), learners who did both (fully collaborative), and learners who did neither (passive collaborators). Students in the “No attempts” cohort never submitted an assignment, so by our definition they are considered to have dropped out in week one.
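
A minimal sketch of the cohort split and a per-student stopout label, assuming a hypothetical per-student record (`StudentActivity`, holding a forum post count, a wiki edit count, and the weeks in which the student submitted at least one problem). Treating the last week with a submission as the stopout week is our assumption, chosen only to agree with the “No attempts” convention above; the study’s precise definition may differ.

```python
from dataclasses import dataclass, field


# Hypothetical per-student summary; field names are illustrative, not the study's schema.
@dataclass
class StudentActivity:
    forum_posts: int = 0
    wiki_edits: int = 0
    # Course weeks (1-based) in which the student submitted at least one problem.
    submission_weeks: set[int] = field(default_factory=set)


def assign_cohort(s: StudentActivity) -> str:
    """Map forum/wiki participation onto the four cohorts described in the text."""
    used_forum = s.forum_posts > 0
    used_wiki = s.wiki_edits > 0
    if used_forum and used_wiki:
        return "fully collaborative"
    if used_forum:
        return "discussion generator"
    if used_wiki:
        return "content generator"
    return "passive collaborator"


def stopout_week(s: StudentActivity) -> int:
    """Assumed label: the last week with a problem submission.

    Students with no submissions at all are assigned week 1, matching the
    "No attempts" convention in the text.
    """
    return max(s.submission_weeks, default=1)
```

Keeping the cohort rule separate from the label makes it easy to train and evaluate each cohort’s models independently, as the study does.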

We relied on processing 130 million clickstream events, which recorded interactions with assessment components (homeworks, quizzes, lecture exercises) and resources (such as videos, tutorials, and labs) for 52,000+ students. For any given week j in a 14-week course, there are 14−j prediction problems (see figure below), one for each later week whose dropout we could try to predict; summed over weeks 1 through 13, that gives 91 problems. Each prediction problem requires an independent discriminative model. Others have formulated just a fraction of these problems and have used a modest number of predictors, whereas in this study we tackled learning each of the 91 models for each cohort, using ~5 times more predictors (27 predictors with weekly values). See Table 2 and Table 3 here for a complete list.
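
The count of 91 follows directly from the 14−j rule. The Python sketch below is our own illustration, not the study’s code (`prediction_problems` is a hypothetical helper); it enumerates every (current week, target week) pair for a 14-week course.

```python
NUM_WEEKS = 14  # 6.002x Spring 2012 ran for 14 weeks


def prediction_problems(num_weeks: int = NUM_WEEKS) -> list[tuple[int, int]]:
    """Return all (current_week, target_week) pairs.

    Standing at week j, we can ask whether the student is still active in any
    later week j+1 .. num_weeks, which yields num_weeks - j problems for week j.
    """
    return [
        (current, target)
        for current in range(1, num_weeks)
        for target in range(current + 1, num_weeks + 1)
    ]


problems = prediction_problems()
print(len(problems))      # 91 = 13 + 12 + ... + 1
print(len(problems) * 4)  # one model per problem per cohort: 364
```

Each pair corresponds to one independently trained model, which is why the number of models grows quadratically with course length.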

### Key Findings:

Our study offers answers to many faceted versions of the original question.

Some examples are:

**Q. After only the first week, how accurately can we predict who will drop out by the final week?**

**Q. Across different cohorts of students, what is the single most important predictor of dropout?**
- Answer: One predictor appears among the 5 most influential in all 4 cohorts: the “pre-deadline submission time”, the duration between when a student starts working on the problems and their deadline. Perhaps it indicates how busy the learner is otherwise.
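
As a rough illustration of the pre-deadline submission time predictor, the sketch below measures, in hours, how long before the deadline a student first attempted that week’s problems. Using the earliest attempt of the week and reporting hours are our assumptions; the study’s exact aggregation may differ.

```python
from datetime import datetime
from typing import List, Optional


def pre_deadline_hours(first_attempt: datetime, deadline: datetime) -> float:
    """Hours between a student's first problem attempt and the deadline.

    Larger values mean the student started earlier; a negative value would
    mean the first attempt came after the deadline.
    """
    return (deadline - first_attempt).total_seconds() / 3600.0


def weekly_pre_deadline_hours(attempts: List[datetime], deadline: datetime) -> Optional[float]:
    """Use the earliest attempt of the week; None if the student made no attempt."""
    if not attempts:
        return None
    return pre_deadline_hours(min(attempts), deadline)
```

Returning `None` for weeks without attempts keeps "did not start" distinct from "started right at the deadline".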

**Q. What predicts a student successfully staying in the course through the final week?**
- Answer: A student’s average number of weekly submissions (attempts on problems) *relative* to other students’, e.g., a percentile variable, is highly predictive. It appears that relative and trending predictors drive accurate predictions. A student’s lab grade each week is more predictive than a count of his or her problem submissions.
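
To make “relative” and “trending” concrete, here is a minimal sketch of two such derived predictors: a student’s percentile rank among all students for one week’s submission count, and a simple week-over-week change. The midpoint tie rule and the plain difference used for the trend are our assumptions, not necessarily the study’s definitions.

```python
from bisect import bisect_left, bisect_right


def percentile_ranks(counts: dict[str, int]) -> dict[str, float]:
    """Percentile rank (0-100) of each student's weekly submission count
    relative to all students that week; tied students share the midpoint rank."""
    values = sorted(counts.values())
    n = len(values)
    ranks = {}
    for student, c in counts.items():
        below = bisect_left(values, c)            # students with strictly fewer submissions
        equal = bisect_right(values, c) - below   # students with exactly this many
        ranks[student] = 100.0 * (below + 0.5 * equal) / n
    return ranks


def weekly_trend(values: list[float]) -> list[float]:
    """Week-over-week change; week 1 has no prior week, so its trend is 0."""
    return [0.0] + [curr - prev for prev, curr in zip(values, values[1:])]
```

For example, with weekly counts `{"a": 1, "b": 5, "c": 5, "d": 10}`, student `b` lands at the 50th percentile.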

**Q. How far back in the course do we need to look to predict a student’s future state?**
- Answer: In general, only the most recent four weeks of data are needed for almost every prediction week.
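
A sketch of how a fixed look-back window can be applied when building a model’s input: concatenate the weekly predictor vectors for only the most recent `lag` weeks. The helper name and the list-of-lists layout are hypothetical; early in the course the window simply holds fewer weeks.

```python
def lagged_features(weekly: list[list[float]], current_week: int, lag: int = 4) -> list[float]:
    """Concatenate per-week predictor vectors for the last `lag` weeks,
    up to and including `current_week` (1-based).

    `weekly[i]` holds the predictor values for week i + 1.
    """
    start = max(1, current_week - lag + 1)
    window = weekly[start - 1 : current_week]
    return [x for week in window for x in week]
```

With `lag=4`, a week 10 model would see weeks 7 through 10 and ignore everything earlier.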

**Q. For the students who participated in the forums, did any of the predictors extracted from their forum behavior matter?**
- Answer: Except for the length of a student’s forum posts, no forum-related variable appeared among the top 5 predictors. The average length of discussion posts is predictive, but the number of discussion posts and responses is not. Perhaps this indicates that it is the content of students’ posts, rather than their number, that predicts persistence. The number of wiki edits has almost no predictive power.

**Q. When can we predict well?**
- Answer: It is easy to predict accurately one week in advance. In general, when predicting one week ahead, our models averaged an AUC of ~0.88. Our week 7 model predicting week 8 dropout reached an AUC of 0.95 for the fully collaborative cohort. Its high accuracy is likely due to the data on midterm participation during week 8. For the forum contributor cohort, our week 8 models predicting week 9 and week 10 dropout were also highly accurate, with only a slightly lower AUC (0.87).
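
AUC is a ranking measure rather than an accuracy: it is the probability that a randomly chosen student who drops out receives a higher risk score than a randomly chosen student who stays, so 0.5 is chance and 1.0 is perfect. The pairwise sketch below computes it directly from that definition; treating stopout as label 1 is our convention here, and libraries use faster rank-based formulas that give the same value.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 = stopped out, 0 = persisted. Ties between a positive and a
    negative score count as half a win. O(P * N), written for clarity.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For instance, scores `[0.9, 0.4, 0.6, 0.1]` with labels `[1, 1, 0, 0]` rank 3 of the 4 positive/negative pairs correctly, giving an AUC of 0.75.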

**Q. Do more predictors help?**
- Answer: Yes. Our 27 predictors helped us achieve AUC values between 0.88 and 0.95 for one-week-ahead prediction. Previous similar work with 4 predictors achieved an AUC no better than 0.7. Our better accuracy is likely due both to the better predictors and to the variety of machine learning methods we drew upon.

More findings appear in the paper here and in Colin Taylor’s thesis.

### A Few Details on Our Methodology:

It was essential to be sensitive to the high likelihood that predictors and predictive accuracy would depend on a student’s level of engagement with the course. After dividing students into cohorts according to whether they used the forum, the wiki, both, or neither, we followed three steps:

- Prepare a large set of potential predictors: We used a powerful new approach in which we asked instructors and students, as a “crowd”, to help us define predictors (see paper here). We then extracted 27 in total. They covered sophisticated, interpretive aspects of student usage patterns and were drawn from different data sources.
- Generate machine learning models using cloud computing and a multi-parameter, multi-algorithm machine learning framework (Delphi): We employed a wide range of machine learning models, from logistic regression to more complex discriminative models (support vector machines, random forests, decision trees) and time series latent variable probabilistic models that pay attention to temporal dynamics (see here and Colin Taylor’s thesis, plus Figures 3-A and 3-B). We derived over 10,000 predictive models using this set of state-of-the-art techniques.
- Use a statistical resampling approach to identify variable importance: We teased apart the role of each predictor in predicting dropout a few weeks ahead (see here).
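
The post does not spell out which resampling procedure was used, so the sketch below swaps in permutation importance, a common resampling-based technique, purely as an illustration: shuffle one predictor column, re-score the trained model, and record how much the metric drops. The `predict` and `metric` callables are hypothetical stand-ins for a trained model and a score such as AUC.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = Sequence[float]


def permutation_importance(
    predict: Callable[[List[Row]], List[float]],
    metric: Callable[[List[float], List[int]], float],
    X: List[Row],
    y: List[int],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Average drop in `metric` when each predictor column is shuffled.

    Shuffling breaks the link between one predictor and the label while
    keeping that predictor's marginal distribution; a large drop means the
    model relies on that predictor.
    """
    rng = random.Random(seed)
    baseline = metric(predict(X), y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = []
            for row, value in zip(X, column):
                new_row = list(row)  # copy so the original data is untouched
                new_row[j] = value
                X_perm.append(new_row)
            drops.append(baseline - metric(predict(X_perm), y))
        importances.append(mean(drops))
    return importances
```

A model that ignores a predictor gets an importance of exactly 0 for it, since shuffling that column never changes the predictions.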

Figure 3-A. Logistic regression results for the passive collaborator cohort. This was by far the biggest of the four cohorts.

Figure 3-B. Logistic regression results for the forum contributor cohort.

### What next?

In the realm of dropout prediction, this methodology is very thorough (in fact, counting cross-validation, we computed >70,000 models for 91 versions of the dropout problem and 4 cohorts), but we are still hungry to answer more questions. These future questions are of even larger scale:

- Soliciting variables from instructors, platform providers, designers, researchers, and students has turned out to be critical. These stakeholders are well placed to propose hypotheses worthy of validation or to offer interpretations that enhance our understanding of why students dropped out. How can this particular crowd be adroitly accessed and marshaled to contribute their valuable insights?
- While we identified variables that mattered in one course, do they transfer? Would the same variables be highly predictive when we move to a subsequent offering of 6.002x or other engineering courses? What sort of differences could we expect to see in terms of their predictive value if we were to analyze courses from other disciplines?
- Finally, how can we bring our predictions into practice? How can modeling and predictive information be exploited to intervene and help an at-risk student?

Guest Post: Kalyan Veeramachaneni and Una-May O’Reilly, MIT Computer Science and Artificial Intelligence Laboratory

### About us:

Doctors Veeramachaneni and O’Reilly are members of the ALFA (AnyScale Learning for All) group at CSAIL, MIT. A significant portion of this work was part of the thesis by Colin Taylor. The group’s other activities include developing enabling technologies for MOOC data science and supporting learning science and learning analytics via massive-scale knowledge mining, machine learning, prediction, and behavioral modeling. The group is designing platforms that enable peer networks to share MOOC-related software, visualizations, and crowdsourcing of features.
