We'll have to agree to disagree. I think the visualization you linked to is much harder to read, because the visual weight accorded to each edge is a non-linear (and somewhat arbitrary) function of the 'true' weight. It also doesn't scale well with number of vertices.
Whereas with the chord diagram, your eye is naturally drawn to the big arrows, and you can easily follow them. It's also bidirectional in a more straightforward way.
Data wrangling. So I wrote my own "DataFrame" -- we have an official one coming to Mathematica 10, too.
Also, binning. There is a nice theory for multidimensional binning and aggregation [that I haven't seen anyone describe explicitly so far]. So I wrote primitives. They play nicely with plotting, statistics, etc. That'll also be in Mathematica 10.
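For illustration, here is a rough sketch in Python of what a multidimensional binning/aggregation primitive might look like. The function names and the regular-bin design are hypothetical, not the actual Mathematica 10 API:

```python
from collections import defaultdict

def bin_index(value, lo, width):
    """Map a value to a regular bin index."""
    return int((value - lo) // width)

def multidim_histogram(rows, specs):
    """Aggregate rows into a sparse n-dimensional histogram.

    rows  -- iterable of tuples, one coordinate per dimension
    specs -- per-dimension (lo, width) bin specifications
    """
    counts = defaultdict(int)
    for row in rows:
        key = tuple(bin_index(v, lo, w) for v, (lo, w) in zip(row, specs))
        counts[key] += 1
    return dict(counts)

# Example: bin (age, friend_count) pairs into 10-year x 100-friend cells.
data = [(23, 150), (27, 180), (34, 90), (25, 420)]
hist = multidim_histogram(data, specs=[(0, 10), (0, 100)])
```

A sparse dictionary keyed by bin-index tuples plays nicely with plotting and statistics because marginals are just sums over a subset of the key.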
1. DataFrames themselves? Well, I think they'll get interesting when they can 'know' about high-level entities like cities, countries, zip codes, ip addresses, etc. Basically, everything that Alpha knows and can compute about, we want Mathematica to know and compute with.
2. I used Go because I am very productive in Go and like a lot of things about it. Goroutines are neat. Java is fine, it's just very boilerplatey, and I'm not practiced enough at it to get past that. And I don't see why we can't develop a GoLink as well.
3. Probably not the whole stack, at least in the beginning. But we'll get there. We want to make it really easy to spider websites and so on.
How long did the whole piece take to put together, and what's the rough break-down of time spent on each component (data wrangling, finding useful sorts, visualizations, write-up)? Thanks!
Full-time, around 6 weeks. The breakdown is hard to say.
I wasted a lot of time trying to do things the "traditional" way by loading into SQL, querying, etc, but it was actually much faster to process things in memory (I have a 16 gig machine). Intensive stuff was parallelized in Go and used ordinary filesystem with directory prefix tries for performance.
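The directory-prefix-trie idea can be sketched like this (a hypothetical Python version; the actual implementation was in Go): shard records into nested directories keyed by a hash prefix, so no single directory accumulates enough entries to slow down the filesystem.

```python
import os
import hashlib

def shard_path(root, key, depth=2, width=2):
    """Return a nested directory path for `key`, using the first
    `depth` chunks of `width` hex characters of its SHA-1 hash as
    the trie path. Keeps any one directory from holding too many files."""
    h = hashlib.sha1(key.encode()).hexdigest()
    parts = [h[i * width:(i + 1) * width] for i in range(depth)]
    return os.path.join(root, *parts, key)

# e.g. shard_path("data", "user_12345") -> "data/xx/yy/user_12345",
# where xx/yy depends on the hash of the key.
p = shard_path("data", "user_12345")
```

With depth 2 and width 2 you get 256 x 256 buckets, which is plenty for millions of files.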
Writeup was mostly SW. He's worked on it maybe an afternoon a week for the last month.
I really enjoy visualizations and can iterate extremely fast (e.g. ChordPlot took half an hour). Don't know why M is not the de facto standard for dataviz people. Tweaking takes a long time, though, and the design team iterated with me on getting things looking really nice.
All in all, most of my time was spent building tools to easily create multidimensional histograms. The nice thing is that those tools are clearly useful enough we'll integrate them into Mathematica, so the cost is somewhat amortized.
NLP took a few weeks of Etienne's time... once again, amortized. Most of that is wrangling, really, and building tools to understand the deficiencies of your training set. Naive Bayes works surprisingly well, the magic is in the tooling and "human intelligence" you iterate with.
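A minimal sketch of the kind of Naive Bayes text classifier being described (illustrative Python only; the actual tooling and features were of course more elaborate):

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes text classifier with add-one smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = doc.split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, doc):
        def log_score(label):
            total = sum(self.word_counts[label].values())
            score = math.log(self.class_counts[label])
            for w in doc.split():
                # Laplace smoothing so unseen words don't zero out a class.
                score += math.log((self.word_counts[label][w] + 1) /
                                  (total + len(self.vocab)))
            return score
        return max(self.class_counts, key=log_score)

# Toy training set with made-up topic labels.
nb = NaiveBayes().fit(
    ["great trip to paris", "loved the beach vacation",
     "new phone released", "chip benchmarks out"],
    ["travel", "travel", "tech", "tech"])
```

The model itself is a few lines; as the comment says, the real work is in inspecting misclassifications and iterating on the training set.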
1. I noticed the interactive slider, embedded into the webpage. That's not what vanilla Mma 9 can do. Is there a simple way we can do the same without the CDF plugin (e.g. a package I'm not aware of) or is this future functionality?
2. The graph with the migration data looks nice. Mma 9 can't do an edge layout like this (curved edges) by default. Is this again custom code (custom Graphics or custom EdgeRenderingFunction) or is it future functionality?
3. There's the part with the frequency of various graph motifs (the number of edges, triangles, and other weird shapes in graphs). How did you count these? Was it done using Mma? Some of these motifs are easy to count (there are simple expressions in terms of the adjacency matrix), but some others, like the (1-2, 2-3, 1-3, 3-4) subgraph, are not so easy.
2. Custom Graphics. We don't (yet) support doing anything interesting with graph weights, which this relies heavily on. I think this is a good candidate for M10 -- name would probably be ChordPlot. I came up with a "GraphForm" construct that allows you to patch various graph properties into various visual parameters (size, color, edge weight, etc). That turns out to be quite useful.
3. Tally[list, IsomorphicGraphQ]. Isn't that cool?
4. Awesome! Nice code. We plan to create ImageCloud and WordCloud functions for M10. WordCloud will be specialized for representing word frequencies and so on. ImageCloud will be the general case: accept a list of images [potentially with transparency], and then find a nice layout given desired sizes. So much cool dataviz will be possible with this! Like country flags...
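The Tally[list, IsomorphicGraphQ] trick above can be sketched in plain Python: reduce each small graph to a canonical form (brute force over vertex relabelings, so only sensible for tiny motifs) and tally the canonical forms. This is an illustration of the idea, not the actual Mathematica mechanism:

```python
from itertools import permutations
from collections import Counter

def canonical(edges, n):
    """Canonical form of a small undirected graph on vertices 0..n-1:
    the lexicographically smallest edge list over all vertex relabelings."""
    best = None
    for perm in permutations(range(n)):
        key = tuple(sorted(tuple(sorted((perm[a], perm[b]))) for a, b in edges))
        if best is None or key < best:
            best = key
    return best

def tally_isomorphic(graphs):
    """Count graphs up to isomorphism -- the analogue of
    Tally[list, IsomorphicGraphQ] in Mathematica."""
    return Counter(canonical(edges, n) for edges, n in graphs)

# Three 3-vertex motifs: a path, the same path relabeled, and a triangle.
motifs = [([(0, 1), (1, 2)], 3),
          ([(0, 2), (2, 1)], 3),
          ([(0, 1), (1, 2), (0, 2)], 3)]
counts = tally_isomorphic(motifs)
```

The two paths collapse onto one canonical form, so the tally comes out as two distinct motifs: the path (twice) and the triangle (once).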
How would you effectively compete with distributed computing frameworks such as Pregel, MapReduce, and Dremel, when Mathematica is primarily used as a desktop application for in-RAM datasets? I know that Mathematica supports various parallelism options (such as multicore and grid), but frankly, gathering real information requires much deeper probing: far higher numbers of people, graph clustering/centrality on billions of nodes and edges, etc. Mathematica's core routines seem to provide multicore implementations, but distributed algorithms require you to implement your own code on top of Mathematica, meaning you'll never see the full performance/behavior of the finely tuned Mathematica implementation.
There is already HadoopLink. LibraryLink allows you to write C or C++ that gets dynamically linked into the kernel at runtime (no restart required), which gives you freedom to create your own threads and do your own thing [and crash the kernel]. A lot of kernel development happens that way now.
You can even synthesize C code from Mathematica (there is a symbolic subset of C in it already) and have Mathematica run the appropriate build process for you, so things can get pretty interesting with that alone.
Out-of-core processing of large datasets is already on the roadmap for Mathematica 10. We plan to have a domain-specific language to describe and work with external [or in-memory] datasets in an efficient way, translating as appropriate to the native database query languages. Our 'native' format will be HDF5.
Ultimately, though, I think we'll rely on code generation to compile Mathematica to LLVM or transpile it to Go, so that we can distribute chunks of computation out to a cluster using M as command-and-control.
The idea would be that you can create and test large processing pipelines from inside Mathematica and then distribute them across a cluster in an ad-hoc way, then visualize the progress, track errors, and analyze the results. Notebooks are really good for that kind of lightweight UI.
This isn't a new idea, but in a language as dynamic as Mathematica, I think it could be especially powerful. Of course, it is also tricky because type inference would be a big part of making this idea possible in a dynamically typed, symbolic language like Mathematica. But not impossible, I don't think. And functional languages already have demonstrated advantages in this type of situation -- take stream fusion in Haskell.
The out-of-core data processing is something that was sorely needed, and I'd been wishing for it for a long time. One of the big drawbacks of Mathematica for data processing was that it's only convenient to use if all the data can be read into memory.
What you're saying about distributing a computation on a cluster sounds very interesting. I used Mathematica for a hybrid Mathematica/C++ calculation (LibraryLink) where the complexity was handled by Mathematica and the (simple) heavy lifting by C++. I used the standard parallel tools to run it on a cluster, which means that communication was done through MathLink.
Another possible problem with my solution (LibraryLink, then Mma parallelization) was that it required a Mma license for as many kernels as I was running, even though most of them were only running the C++ code. But that's easy to fix on WRI's side.
When it comes to "questions I would ask [the data]": which interests are likely shared by friends? (Can you make a correlation table, or a graph out of it?)
And: is the correlation of one's interests with friends' interests the same as the correlation of one's interests with themselves?
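A correlation table of the kind asked about could start from raw co-occurrence counts across friendships. A minimal Python sketch (all names and data here are made up for illustration):

```python
from collections import Counter
from itertools import product

def shared_interest_table(friend_pairs, interests):
    """Count, for each (interest_a, interest_b) pair, how often one friend
    has interest_a while the other has interest_b -- the raw counts behind
    a 'which interests go together across friendships' table."""
    table = Counter()
    for u, v in friend_pairs:
        for a, b in product(interests.get(u, ()), interests.get(v, ())):
            table[tuple(sorted((a, b)))] += 1
    return table

interests = {"alice": {"hiking", "jazz"}, "bob": {"hiking"}, "carol": {"jazz"}}
table = shared_interest_table([("alice", "bob"), ("alice", "carol")], interests)
```

Normalizing these counts by the marginal frequency of each interest would turn them into something correlation-like, and the strong off-diagonal entries would make a natural graph.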
That's a good idea! Though if I remember correctly, interests aren't canonicalized, so it might be pretty messy. And I'm not sure if people fill them in non-ironically any more.
Well, it is always "people who clicked on FB that they like X", regardless of whether it is a shallow or deep interest; genuine, ironic, random, or "because my friends like it and I want to be as cool as them", etc.
If more fine grained, I wouldn't be surprised to see ties between seemingly exclusive things... e.g. in this http://meta.stackoverflow.com/questions/157976/map-of-all-se... (not interesting, but participation in particular Q&A sites) "christianity", "judaism" and "islam" are in the same category (as opposed to people apathetic to that topic).
That's not precisely correct. In the old days, we typed in our favorite bands/books/music. The attempted canonicalization created some amusing interpretations of song as band, book as author, band as book, etc.
It has also made it exceptionally annoying to get feed updates from the 900 bands, actors, movies, songs, etc. that I said I liked and now have to "unlike".
I think you could correlate people within a certain confidence, but because of the nature of the data, you would have to expect a surprisingly large dissimilarity of interests within a clique over a certain account age based on this noise. Not 90%, but higher than the real value.
Sure, I expect data to be extremely noisy, but I am thinking about looking at very robust things (e.g. this guy is in rock-like perhaps-indie music), not making it too fine-grained.
If you look under "Method", there are a bunch of different methods to use that I'm told correspond to various landmark papers in the field. If you know about community detection, you'll recognize which methods correspond to which papers, but if you don't, why do you care? At least, that's our philosophy for documentation, but I'm not sure I entirely agree with that philosophy.
Quite a bit -- I've been analyzing my own data for years now.
I did some network analysis stuff like this for Twitter long before it was built into Mathematica (rant: someone at Twitter needs to make a 'graph query' API call so that it doesn't take 3 hours to get a single graph of your own network).
I would link to the relevant posts at taliesinb.net, but Posterous is down 7 days ahead of schedule.
I think it can bring out real world insights. You just have to be very cautious and not leap to conclusions because they seem to tell an interesting story. Although it is somewhat disturbing how "gendered" the wall post topic distributions are.
I'll have to check. Certainly there can be no harm in releasing some of the more aggregated data (i.e. that was behind the plots). Perhaps tweet at me so I don't forget -- @taliesinb
I found this interesting. What I would love to have seen, however, is a probe into the dynamics. You did a nice abstraction over time, measuring property X as age varied. I would have loved to have seen the manner in which topics and ideas spread over your network.
For instance: If an event occurred in New York, say, how long would it have taken to spread to San Francisco? If there were no progression, topic times would center around the same time. This would indicate that people were getting their information from national, not local sources (e.g. the evening news), then talking about it on facebook. On the other hand, if a local topic was spread on facebook alone, we should see some sort of progression.
It's possible that this progression could take more interesting forms besides geolocation, but that might require a more extensive network. A simple experiment would work like this: A few thousand people who are not friends but have a similar interest (say an interest in Elizabeth Warren) post independently a video of her. This particular esoteric interest is unlikely to be valued a priori by their friends, but perhaps they are compelled to repost the information. What's the threshold of "esotericness" such that it won't "go viral?" Is there a way to predict virality as a function of how popular it is to begin with? Is there no actual progression across the network, but rather a small bump in topic expression, until it is picked up by larger media sources at which point the entire network is inundated with people reposting Elizabeth Warren recaps from HuffPo et al?
The reason this is interesting is that it sheds insight into the role of social networks: are we fundamentally disposed toward central sources like the NYTimes, or is facebook a fundamental sharing mechanism? That is, do I post on facebook just to have my views expressed, validated, and challenged, so that they might change the world over a few years? Or do I post on facebook to have my views propagate across the world much more quickly?
Finally, a question: How did you estimate the power law? I know how difficult it is to do this (e.g. not linear regression on a log-log scale). Did you compare the power law fit to other, similar distributions, like lognormal? Preferential attachment is indeed a beautiful theoretical result, because it implies the existence of power law degree distributions. Unfortunately, many networks are not as well represented by power laws as by alternative distributions, which casts doubt on the preferential attachment hypothesis as is. (Also, many sampling methods give rise to fictive power laws). That said, a fat tail can still be interesting.
You're right, that would be very interesting. The most obvious way we could have done this is by looking at the spread of our app itself as people started to use it. Unfortunately, we only started recording anonymized stats for the second release, so we've somewhat missed the boat there.
To do it with links and general "memes" would be technically much harder, because we'd have to periodically rescrape walls of all the donors to see time evolution. It was somewhat out of scope of the blog post, given all the more basic stuff we could do instead.
I'd be surprised if Facebook didn't already do an analysis of this when they "cracked down" on app virality a while back.
Bit.ly's Hilary Mason might have looked at this question too, and I'm sure it has been done to death with Twitter, though the demographic info is much sparser there.
2. This not being a scientific paper, we estimated it by drawing a line on the log-log CDF. Barring the noise that "deparadoxing" the friend's friend count distribution induces on the low end of the distribution, it was very linear over two decades. We didn't think the exact number was all that interesting, so we didn't spend any more effort than that. Facebook's anatomy paper probably has a very accurate number.
I'd heard about the fictive power law stuff. What makes me even more skeptical is that FB friends are probably a poor proxy for 'true' friends. You'd be better off looking at number of friends as defined by some cross-commenting threshold.
About the "fictive power law" thing: this is THE paper to read: http://arxiv.org/abs/0706.1062 (it's an easy read, explaining what the maximum likelihood method is, etc.). Despite what they say, fitting the log-log CDF usually gives pretty good results when done right (fitting the PDF does not).
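The maximum-likelihood estimator that paper describes is a one-liner for the continuous case. A Python sketch, checked against synthetic data drawn by inverse-transform sampling:

```python
import math
import random

def powerlaw_alpha_mle(xs, xmin):
    """Maximum-likelihood estimate of the exponent alpha for a
    continuous power law p(x) ~ x^-alpha over x >= xmin."""
    tail = [x for x in xs if x >= xmin]
    n = len(tail)
    return 1 + n / sum(math.log(x / xmin) for x in tail)

# Synthetic check: draw from a known power law via the inverse CDF,
# x = xmin * (1 - u)^(-1/(alpha - 1)), then recover alpha.
random.seed(0)
alpha_true, xmin = 2.5, 1.0
samples = [xmin * (1 - random.random()) ** (-1 / (alpha_true - 1))
           for _ in range(20000)]
alpha_hat = powerlaw_alpha_mle(samples, xmin)
```

In practice you'd also need to estimate xmin (the paper uses a KS-statistic scan) and compare against alternatives like the lognormal before claiming a power law.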
I wonder if the higher friends count of Brazilian users is caused by the previous use of Orkut, where it was popular to try to have as many friends as possible.
The attitude of companies like Facebook, Google, and Twitter is: if the product isn't addictive or useful to billions of people, it's not worth doing.
Hence there are vastly more resources dedicated to assimilating eg photos and games into their ecosystem, than into something computationally innovative.
This is IMHO a huge mistake, since they could instead be introducing simple forms of programming that take you on a continuous curve from using the product to developing for it. There is a huge hunger in the masses for better forms of programming.
This point will probably become obvious if the Wolfram Language is successful.
I'm sure Facebook's Data Science team does a lot of interesting things internally. They do in fact have some interesting papers [0] and [1], though obviously with more of an 'academic' feel than the blog post.
Nice! Do you know if Gephi can do something similar to the summarization that we did using cluster diagrams? The whole "ball of hair" problem doesn't have any other real solution, I don't think (well, unless you use edge clustering, but that doesn't help in-group connections).
I'm 90% certain that it can, but I haven't worked with it in a while. I remember a setting in the display options that simply combined all the dots of one color into one large group.
Cool! I wonder how one could combine the best of both worlds... what we're really talking about here is a hierarchy of graph plots in which you can drill down to each node = graph at a lower level.
Wow- this is awesome! It's really cool how people's friend distribution by age is a convolution of their age and the age of the general facebook population. It's also scary in a way to see a snapshot of how I'm likely to change in the future with regards to my clusters of friends, my relationship status, and what I'll talk about.
The traditional way to plot the assortativity by age is using a scatter plot / heatmap. This is similar to what they did for country homophily on p12 of the Facebook anatomy paper. The result would be a plot with a prominent diagonal, illustrating that "same attracts same".
That aside, imo, Facebook is an incredibly idiosyncratic "app", which makes almost no sense. And yet, it gave us so many opportunities for interesting discussions, like the insights in this blog post. Nice job.
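The age-age heatmap described above amounts to a symmetric count matrix over friendship pairs. A minimal Python sketch with made-up data:

```python
from collections import Counter

def age_heatmap(friend_pairs):
    """Count friendships by (age of ego, age of friend). A strong
    diagonal indicates age homophily ('same attracts same'). Each
    undirected friendship is counted in both directions so the
    resulting matrix is symmetric."""
    counts = Counter()
    for a, b in friend_pairs:
        counts[(a, b)] += 1
        counts[(b, a)] += 1
    return counts

pairs = [(20, 21), (20, 20), (45, 47), (21, 20)]
heat = age_heatmap(pairs)
```

Plotting `heat` as an intensity map over the two age axes gives the scatter plot / heatmap in question.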
Yeah, we tried a couple. Those heatmaps I think are quite hard to read, because it is natural to want to take marginals, but you can't easily do that visually.
I think this whole "octile plot" thing turned out quite nicely. It's in a sense a way of 'slicing' the CDF into 8 even strips and projecting them onto a single axis. It's quite intuitive to read, too. Facebook seems to use it too for some of their papers.
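Computing the strips for such an octile plot boils down to extracting seven evenly spaced quantiles per x-bin. A Python sketch of the core step:

```python
def octiles(values):
    """The seven cut points that slice a sample's CDF into 8 even strips,
    using linear interpolation between order statistics."""
    xs = sorted(values)
    n = len(xs)
    cuts = []
    for k in range(1, 8):
        pos = k / 8 * (n - 1)
        lo = int(pos)
        frac = pos - lo
        hi = min(lo + 1, n - 1)
        cuts.append(xs[lo] * (1 - frac) + xs[hi] * frac)
    return cuts

# For the plot, compute these cut points within each x-bin (e.g. each age)
# and draw the 7 resulting curves on a single axis.
cuts = octiles(range(9))  # 9 uniformly spaced points, 0..8
```

On a uniform sample the cuts come out evenly spaced, which is a handy sanity check.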
One thing that bugs me is how comments are linked to "interest". There are many topics that interest people (passive consumption), that do not necessarily translate into engaging in a conversation with others publicly.
As a marketing term - sure, that would be a good indicator of interest. Since this article is more scientific than marketing-oriented, I would clarify what some of the metrics mean (or don't mean).
You're right. A more ambitious approach that might get a bit closer to people's "real interests" would be to follow posted links and topic-model the contents of those links.
How much of the "friends with zero friends" effect is simply because that information is blocked? If my friends "donated" their data, I would show as having 0 friends if I've blocked that information to apps.
Actually, NONE of the people in our dataset had zero friends. The x-axis starts at 1, not 0. The point is that resampling to remove the friendship paradox shows that there are many more people with single-digit friends than we expected.
Given mcintyre1994's comment, I think this still explains the same situation. People with single-digit friends are simply people who have friends blocked to apps but have multiple friends who've donated data.
Introducing "data science for Facebook" in 2013 is ... odd.
All the more so because Jeff Hammerbacher is often credited with coining the term "data science", and he started doing it at -- that's right -- Facebook.
Very nice looking graphs, but running "Wolfram Alpha Personal Analytics for Facebook" for my own profile comes with a rather nerve-wracking warning:
Wolfram Connection would like to access your public profile, friend list, email address, custom friends lists, News Feed, relationships, birthday, status updates, checkins, education history, hometown, current city, photos, religious and political views, videos, likes and your friends' relationships, birthdays, education histories, hometowns, current cities, photos, religious and political views and videos.
You have to opt in to being a data donor for us to store any of it.
Otherwise we just record basic anonymized statistics -- like number of friends, sex, age, etc... and throw all the detailed stuff away. Our privacy policy has more: http://www.wolframalpha.com/fbfaqs.html
We also encrypt with public keys like there's no tomorrow.
The Mathematica system makes some beautiful, informative graphs, and presumably users can make those graphs with a minimum of fuss and bother. It's technically very nice.
Yet, in the entire blog post, is there one insight that wasn't a priori obvious? Maybe the bits about migration.
The progression of interests over time was non-obvious to me. For instance the explosion of interest in travel in the 20s, and the temporary dip in interests like philosophical quotes also in the 20s.
There's a lot of stuff in the post; I wouldn't dismiss it just because it's TL;DR.
They said they'll use it to support their 'personal analytics' programme [0], which is free via wolframalpha.com - I don't see how this data would help with Mathematica or anything else they charge for?
Even though WA is closed source and for-profit, I view them (and the company) as a kind of fellow scientist, not an evil big corporation (like Oracle, Microsoft, etc.).
If anyone would like to ask questions about what we did, I'd be happy to answer them.
There's still lots more interesting stuff to do, but it was enough for a blog post. Suggest away if you think we missed something obvious!