This month, we were delighted to release preliminary findings from our yearlong investigation of the first edX class, 6.002x. The findings appeared in the online journal “Research & Practice in Assessment” (RPA) in its special summer 2013 issue on MOOCs. You can download our paper here. From the birth of edX, it has been clear that the institutional mission included research into learning and the use of research findings to inform improvements in digital learning.
In the study, we outline the investigation we have undertaken as part of a multidisciplinary, cross-institution team in order to make sense of the massive quantities of rich data from 6.002x. Our work is funded by a one-year NSF grant (information here). We address four basic research questions: who are the 6.002x students, what behaviors and background factors predict achievement and persistence, how do students form groups and interact with each other, and what is the 6.002x experience for residential MIT students? We hope this foundational work on the first edX class will not only serve as a basis for more complex investigations but also help us to track the ongoing development of how MOOCs are used.
The paper is the result not only of the complementary expertise within our research team but also of our collaborative partners at edX; we would not have been able to work with the large, rich 6.002x dataset without the support and openness of the edX team. The data we used in this study come from multiple sources, most importantly the clickstream logs from 6.002x students. Using these logs, we were able to match student IP addresses to their approximate locations using a geolocation database. (And, we were happy to see that, while we were data crunching over at edX, this feature was added to the edX analytics platform!) In our study, we also used the almost 100,000 individual posts from the discussion forum stored in the SQL database.
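The IP-to-location step can be sketched in a few lines. This is a minimal illustration only: the study used a full geolocation database, whereas the prefix table below is a tiny hypothetical stand-in.

```python
# Minimal sketch: approximate a student's country from an IP address.
# The real study used a geolocation database; the prefix table below
# is hypothetical and exists only for illustration.
import ipaddress

GEO_TABLE = {  # hypothetical network-prefix -> country mapping
    ipaddress.ip_network("18.0.0.0/8"): "United States",
    ipaddress.ip_network("81.2.69.0/24"): "United Kingdom",
}

def lookup_country(ip: str) -> str:
    """Return the country for the first matching prefix, else 'Unknown'."""
    addr = ipaddress.ip_address(ip)
    for network, country in GEO_TABLE.items():
        if addr in network:
            return country
    return "Unknown"

print(lookup_country("18.9.22.169"))  # → United States
```

A production pipeline would replace the toy table with longest-prefix lookups against a commercial or open geolocation database, which is where the ~5% proxy-server error discussed below comes in.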
To augment the data generated by students’ interactions with the website, we worked with the edX team to create and disseminate an end-of-course survey. This generated valuable information on the responding students, many of whom were certificate earners, but some of whom did not earn any points in the course. After 6.002x, we continued to work with the edX team to revise and improve this survey to augment our ability as researchers and instructors to learn about students who take MOOCs and understand their experiences. Thanks to our productive collaboration, this survey has now grown to an entrance survey offered on several of the edX platform courses. In the RPA paper, we detail some of the results from the first exit survey.
Screenshot of entrance survey for Spring 2013 7.00x class
The RPA paper begins, though, by describing 6.002x and its structure for readers who might not be familiar with it. We then highlight findings from initial work done by the RELATE (Research in Learning, Assessing and Tutoring Effectively) group, looking at how certificate earners allotted their time on the class website and which resources they used when completing homework or exams. Among other findings, RELATE’s work shows that students’ use of the site exhibits a marked periodicity, with activity peaking when homework assignments were due and when exams were given.
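The aggregation behind a periodicity finding like this is straightforward: bucket clickstream events by calendar day and count them, so deadline-driven spikes become visible. The event timestamps below are invented for illustration.

```python
# Sketch of the aggregation behind the periodicity finding: bucket
# clickstream events by calendar day and count them. The timestamps
# here are hypothetical, not drawn from the 6.002x logs.
from collections import Counter
from datetime import datetime

events = [
    "2012-03-11T21:05:00", "2012-03-11T23:40:00",  # surge before a deadline
    "2012-03-12T01:15:00",
    "2012-03-14T10:00:00",
]

daily_counts = Counter(
    datetime.fromisoformat(ts).date().isoformat() for ts in events
)
print(daily_counts.most_common())
```

Plotting such daily counts against the course calendar is what reveals the spikes around due dates and exam windows.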
“Who are these students?” is one of the first questions instructors, course designers, and researchers ask when confronted with a MOOC. The next section of our paper gives a peek at the identities of the almost 155,000 students who enrolled in 6.002x.
We approximated student locations from their IP addresses. (We believe the error due to proxy servers is less than 5%.) Students logged in from 194 countries, virtually every country in the world. The top five countries are the United States (26,333), India (13,044), the United Kingdom (8,430), Colombia (5,900), and Spain (3,684).
Global distribution of students using the site
Of the students who responded to the survey question asking about age, most were in their 20s and 30s, but ages ranged from the early teens to the seventies—an unprecedented level of age diversity for a classroom! The overwhelming majority of survey completers were male. However, in our predictive modeling, gender has no significant relationship to achievement.
That predictive modeling makes up the bulk of the remainder of the RPA paper. We examine how time spent on different course resources (e.g., homework, e-text), a student’s first homework score (a proxy for “prior ability”), and student background characteristics (e.g., self-reported language and country) jointly relate to success in the class.
Time spent on certain resources was strongly related to higher achievement—time on homework assignments, for example, was strongly positively correlated with higher scores in the class. We also found that students who reported working together with another student offline were predicted to do better in the course.
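The simplest version of this kind of analysis is a correlation between time on task and final score. The sketch below computes Pearson's r from scratch on invented data; the actual study used richer regression models, not this toy.

```python
# Toy illustration of a time-on-task vs. achievement correlation:
# Pearson's r between hours spent on homework and final course score.
# All data values here are invented for illustration.
import math

hours = [2.0, 5.0, 1.0, 8.0, 4.0]   # hypothetical hours on homework
score = [55.0, 70.0, 40.0, 90.0, 65.0]  # hypothetical course scores

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(round(pearson_r(hours, score), 3))
```

A strong positive r on data like this mirrors, in miniature, the finding that homework time was positively associated with higher class scores; the paper's models additionally control for background factors.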
One of the more troubling aspects of MOOCs to date is their low completion rate, which averages no more than 10%. As the graph below shows, about half of the original enrollees in 6.002x stopped out after one week in the class; students then stopped taking the course at a fairly consistent rate throughout the next thirteen weeks. We conducted survival analyses to understand what the overall stop-out profile looked like and, further, to understand what factors might contribute significantly to students’ likelihood of stopping out.
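A basic stop-out profile can be computed directly from each student's last week of activity: the fraction still enrolled after each week. The sketch below uses invented week numbers; the paper's survival analyses go further, modeling which factors predict the hazard of stopping out.

```python
# Hedged sketch of a simple stop-out (survival) curve: given the last
# week each student was active, compute the fraction still enrolled
# after each week. The data below are hypothetical.
last_active_week = [1, 1, 1, 1, 1, 3, 5, 8, 14, 14]  # invented

def survival_curve(last_weeks, total_weeks):
    """Fraction of students active beyond each week 0..total_weeks."""
    n = len(last_weeks)
    return [
        sum(1 for w in last_weeks if w > week) / n
        for week in range(total_weeks + 1)
    ]

curve = survival_curve(last_active_week, 14)
print(curve[1])  # fraction surviving past week 1
```

In this toy example, half the students survive past week one, echoing the pattern in the 6.002x data; regression-based survival models then ask which behaviors and background factors shift that curve.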
Stop out rate of students throughout the course.
Participation in interactive learning communities is an important instructional component of MOOCs, and discussion boards provide a unique, fine-grained window into how students interact with each other virtually. As the graph below illustrates, only 3% of all students participated in the 6.002x discussion forum. We are in the process of coding different types of posts to understand the nature of student discussions.
Distribution of students with 0 – 100 activities on the discussion board
Our forthcoming work, foreshadowed in the RPA paper, will delve further into predictive modeling of student achievement in the class as well as student persistence. As educational researchers and instructors know, many factors interact with each other to create a student’s experience in a class, including curricular materials, pedagogical methods, and the students’ attitudes and beliefs, prior experiences, and language abilities. This complex interrelationship is challenging territory and the focus of our further modeling efforts.
As noted above, we are continuing to mine the discussion forum. Not only do we want to explore the nature of the students’ posts, but we want to understand if and how a social network among the students evolved. In addition, we have been able to identify approximately 200 MIT students who signed up for 6.002x. We are in the process of gathering qualitative information on their experiences to inform the intersection of online and residential learning. Because our team has experience and interests in many areas of education research—statistics, data mining, psychometrics, communication, policy, and international education, to name a few—the RPA paper and our forthcoming work highlight many different interesting questions that we can begin to answer with the rich data provided by MOOCs.
Before the RPA paper was released, our diverse team presented initial findings to edX as a “debriefing.” This format allowed us to share our research results much more quickly than the traditional process of academic publication allows. More importantly, this interaction provided us with direct feedback on our work. As researchers outside the edX organization, our team had a unique opportunity to talk with the edX staff members responsible for developing and implementing edX courses. Together, we can identify ways to improve the student and teacher experience in MOOCs. It is rare that rigorous research can have such an immediate impact on practice, and we look forward to sharing our next set of findings as soon as possible.
Our team brought together researchers with varied backgrounds but a shared interest in understanding the complex teaching and learning environment of MOOCs. The RPA publication is a testament to the collaborative energy and benefit of combining complementary expertise from online learning design, data mining, statistics, and, of course, education research.
TLL postdoctoral associate for education research
TLL began in 1997 as a resource for faculty, administrators, and students who share a desire to improve teaching and learning at MIT. Our goals are to strengthen the quality of instruction at the Institute; better understand the process of learning in higher education; conduct research that has immediate applications both inside and outside of the classroom; support assessment and evaluation efforts; serve as a clearinghouse to disseminate information on national and international efforts in science and engineering education; and aid in the creation of new and innovative curricula, pedagogy, technologies, and methods of assessment. Follow @mit_tll for more updates on our research.