MIT and Harvard release de-identified learning data from open online courses

minimaxir · on May 31, 2014

Working up a few quick statistics:

- Of the people that finished a course, male students had an average grade of 83.8%, female students had an average grade of 82.7%.

- The correlation between the grade of students who finished a course and the usage of supplemental materials is positive, but weak (0.16 for each variable, except for forum posts, which are complete uncorrelated with grade).

- The easiest course was HarvardX/CS50x/2012, with a perfect 100% average grade from all students who finished the course [EDIT: this course has pass/fail assignments]. The hardest was HarvardX/CB22x/2013_Spring, with an average score of 73.3% (the class, unsurprisingly, is about Ancient Greek Heroes)

- The course with the highest completion rate (# completed / # registered) is MITx/14.73x/2013_Spring at 7.48% (The Challenges of Global Poverty). The worst completion rate is HarvardX/CS50x/2012 at 0.07% (Introduction to Computer Science I, note that this is also the most registered course by a large margin)

- All classes have more male students than female students. The class with the highest Male/Female ratio is MITx/2.01x/2013_Spring at 17:1 (Elements of Structures). The class with the lowest ratio is HarvardX/PH278x/2013_Spring at 1.03:1 (Human Health and Global Environmental Change)

Let me know if you have any statistical ideas.

mtVessel · on May 31, 2014

Your gender-related analysis is a bit incomplete. You should state that you're only including records for which gender is specified.

- Overall, 20.4% of the records have an unknown or unspecified gender, for which the average grade is 84%.

- Every class has between 6-29% with unspecified genders. In four of the courses, there are more unknowns than the difference between male and females.

A more interesting analysis (to me) is completion and grade vs. engagement, which they're clearly trying to track by including data about student events. I'm not sure how to approach this. On first glance, it seems that for some courses, it's equally likely a student in the top 10% of engagement (going by # days student interacted with site), will have achieved either a spectacularly high or low grade.

Maybe it's time I take one of those Data Science courses, myself.

watwut · on May 31, 2014

On gender: are those meaningful differences? Does it make sense for me to ponder about why those with unspecified gender have better results?

nl · on May 31, 2014

Not really. The differences aren't very large, and there are pretty obvious causes.

Clearly the big variations in gender enrollment rate (and the likely different pass rates) and presumably the different pass rates is almost certain to be the biggest cause.

minimaxir · on May 31, 2014

A 1% difference is not likely to be statistically significant.

cpks · on May 31, 2014

Regarding easiest vs. hardest, while I'd tend to agree with the conclusion, the logic is just plain wrong. Final course scores depend more on grading structure than anything.

Many of the best courses use mastery learning. You can keep trying until you get it. This usually leads to 100% averages, actually quite well-deserved, since all students learn everything.

Heroes was a course which encouraged different levels of participation. It's probably the best MOOC ever run, but explicitly designed so you'd get out what you put in, and to accommodate all levels. Many students chose to put less in.

minimaxir · on May 31, 2014

The course has 1287 graduates, all with a perfect score, so I believe your conclusion is correct. Edited OP.

ghaff · on May 31, 2014

Opinions differ I guess. I signed up for that class with a fair bit of anticipation and couldn't make it through the first lecture. It just didn't engage me at all. But a lot of people apparently felt differently.

danso · on May 31, 2014

FYI, just in case you don't want to go through the (short) signup process to see what's in the data.

This is what the first 10 lines of the CSV file look like:

https://gist.github.com/dannguyen/4d372986b74ff087927a

Here's the output of `wc -l`

     641139 1092627 70165566 HMXPC13_DI_v2_5-14-14.csv

Here's a mirror of the file [9.6MB]: http://danwin-files.s3.amazonaws.com/data/nf/HMXPC13_DI_v2_5...

svedlin · on May 31, 2014

There's been a lot of discussion about the "completion rates" in online courses. A study in January reported that "only 4 percent of people who register for MOOCs actually finish them" but that "MOOCs still have considerable impact" because "nearly two-thirds got at least something out of the experience." [1]

One possible solution to this - instead of measuring progress against a single completion date, split courses up into a continuous series of milestones or smaller units. A student could cover 1 or more units.

While it's true some courses are considered prerequisites for others, the requirements could be made more granular (e.g.: course B unit 4 requires course A units 5 and 6). Discovering these dependencies could potentially be automated by text data mining the course material.

Courses are designed around a traditional college semester and reflect the amount of material that can be reasonably covered in that time period. However, that constraint shouldn't necessarily be the benchmark for all study programs.

[1] http://hechingerreport.org/content/harvard-mit-despite-low-c...

contingencies · on May 31, 2014

My problem with MOOCs is immediacy: I'm interested now and I don't want to delay that over x weeks, beginning at y point in the future. Hell, I often don't know what country I'm going to be in, let alone whether I'll have free time and an internet connection. So for my lifestyle, the relatively simple mapping from traditional tertiary course formats across to MOOCs is fundamentally flawed one, though I believe they are improving these days by offering access to all materials immediately. Another thing is downloads ... I just want everything, please. I'm often offline, as I believe many developing country learners may be. I don't want to have to register, log in, then painfully click through everything bit by bit. I want bittorrent with early-to-late material download priority. (Otherwise, maybe someone should start developing a converted and open courseware format to share on PirateBoxes? http://piratebox.cc/)

incision · on May 31, 2014

I'd wager that you're in the small minority here.

I've gone through a number of MOOC courses and structure, schedule, TA interaction and shared experience seem pretty important to most enthusiastic / best performing course takers.

Some courses, like CS50 are already completely self-paced. Others run quite frequently, restart every month.

Being able to have it all in terms of TA/instructor support and an arbitrary schedule seems to be the direction Udacity is going with subscriptions.

I do agree that being to easily grab some/all course materials at once would a nice option. However, I've completed a majority of the work for my courses offline and only one had its coursework structure in a way that necessitated a frequent connection to complete.

adricnet · on May 31, 2014

Well, the problem there is a lack of available $QUALITY courseware content as there are tools to do this sort of thing (Moodle, XO Server) and free content (moodle.net, Khan Academy, scorm modules) available in small quantities for some topics and languages.

I would advance the theory that the art and science of developing useful and effective courseware is new and we (humans) haven't at all figured out how to do it yet.

Various people I know are exploring this from different directions with their projects, EG:

* AST is running with the free BBST courses

* while the original authors of that free content have built a company around expanding and maintaining the curriculum commercially.

* a friend is putting together work for hire 1000x level class material as part of her instructor position

The economics of motivating class creation, much less improving $QUALITY or availability under free culture licences are ... well quick the thorny thicket.

ghaff · on May 31, 2014

I think a lot of people find the structure helpful (and it's the only real way to make things like discussion boards work). That said, it's really a tradeoff. Case in point, there's a sabermetrics course running that I'm sort of interested in and, had it started when scheduled a month ago, the timing would have been pretty good. As it is, I'm about to take off on a lot of travel and it's probably not realistic to do anything other than audit the couse.

hershel · on May 31, 2014

Does anybody have anidea what's the completion rates for students who paid(for certification/something else) ?

incision · on May 31, 2014

I don't recall if either identifies that information specifically, but there's a whole collection of browsable stats and collected papers here [1] and another paper here [2].

I expect any certificate related stats will change significantly in the near future there's as each MOOC seems to be de-emphasizing or getting rid of 'honor' certificates.

1: http://odl.mit.edu/insights/

2: http://www.rpajournal.com/dev/wp-content/uploads/2013/05/SF2...

ghaff · on May 31, 2014

That's my perception as well. I've noticed some edX courses in particular pushing pretty hard on verified/paid certificates and eliminating the option of a free certificate. I can't say I'm especially surprised. MOOCs have struggled with their business models and having a free certificate option just further reduces the incentive for people to pay for a verified certificate. I notice that the upcoming Linux Foundation edX course, for example, only gives options to audit and to pay for a verified certificate.

ghaff · on May 31, 2014

The only numbers I've seen are from the Spring 2013 Gamification course. 68% of the signature track students earned a verified certificate. (That's not quite what you asked but it's probably the only meaningful way to measure completion.) Other statistics associated with the class were fairly typical of MOOCs so I assume that's a reasonable ballpark to assume for MOOCs generally.

cpks · on June 1, 2014

I've done a few verified certs. There's a question of causality. I only get on the verified track after I'm relatively confident I'll finish the course. Those are the same courses I would have gotten an honor code in if verified wasn't available.