More

jonathanmayer · on March 31, 2023

Context: I teach at Princeton and study social media and recommendation systems.

From a very quick skim of the repositories, this appears to be quite limited transparency. The documentation gives a decent high-level overview of how Tweet recommendation works—no surprises—and the code tracks that roadmap. Those are meaningful positive steps. But the underlying policies and models are almost entirely missing (there are a couple valuable components in [1]). Without those, we can't evaluate the behavior and possible effects of "the algorithm."

[1] https://github.com/twitter/the-algorithm-ml

eterevsky · on April 1, 2023

I work on Google Assistant Suggestions and I don't think it's very practical to open-source an algorithm like that including the models and the underlying data. Both of them can live in separate services and be frequently updated.

I am assuming that open sourcing the code aims to increase transparency about the business logic of the ranking decisions. At the same time you don't want spammers to be able to easily run experiments against a cloned version of your system.

bilekas · on March 31, 2023

> But the underlying policies and models are almost entirely missing (there are a couple valuable components in [1]). Without those, we can't evaluate the behavior and possible effects of "the algorithm."

Haven't gone through yet, but yeah, if that's the case, all this is, is a glorified framework to plug your own in.. Not exactly what was promised.

tpmx · on March 31, 2023

Did you also skim the accompanying (or rather, main) repo, https://github.com/twitter/the-algorithm ?

From a quick clone and line-count, it has:

  235 kLOC .scala
  136 kLOC .java
  22  kLOC .py
  7   kLOC .rs

So I don't think you did, since you posted so quickly and that's a LOT of code.

I also haven't skimmed this code except very superficially, but perhaps you should since you're out there making statements with your Princeton credentials.

(I posted this comment with the heads-up a few minutes after your comment above and then expanded it as you didn't respond.)

Lord_Zero · on April 7, 2023

I think you misunderstood. He's saying the training models are not there.

kadavy · on March 31, 2023

For example, MostRecentCombinedUserSnapshotSource seems to be influential (such as for calculating "tweepcred"), but we can't see how it's calculated.

eecc · on March 31, 2023

Wouldn’t that make them easy prey of “spam SEO”. However, given the framework isn’t it still possible to guess the models?

makeitdouble · on March 31, 2023

The spam SEO issue should be dealt/thought about _before_ engaging in the whole adventure, and having to guess how it could work if decently implemented properly defeats the "open source" spirit of it.

More credits would be given if the very idea of open sourcing the algorithm hasn't already been discussed to death with predictions of the difficult points and how it probably won't happen in any sane way.

eecc · on April 2, 2023

And them be pilloried for not doing it or not fast enough. Damn if you do, damn if you don’t.

I’m starting to think the broblem with Elon is mostly personal, he’s just a proxy and default wrong.

(not that I approve of his behaviors, but I can’t enjoy this whole mobbing that he’s getting; not that he cares this I’m not worried he’s getting traumatized in any way? it’s just how it’s become an identitarian trait for a certain group that irks me.)

jimkleiber · on April 2, 2023

Makes me wonder if a way to override people SEO hacking the algorithm is to create a market of open-source algorithms that each individual can choose and then it's not trying to hack THE algorithm but having to hack many and not knowing which algorithm an individual is using.

muddi900 · on April 2, 2023

You don't have to target each 'algorithm' all at once. You can target them one at a time. Hell you can run A/B test single out the easiest targets.

jimkleiber · on April 2, 2023

Yes but right now there is 100% of the users using the one algorithm (or chronological). If one doesn't know what percentage or which people are using which algorithm, it becomes harder to know which ones to try to hack to have the biggest result.

modeless · on March 31, 2023

What about these? https://huggingface.co/Twitter

simonw · on March 31, 2023

Those look older to me. They all have last updated dates for October and November 2022.

EastSmith · on March 31, 2023

FB open source algo looks much better, right? /s

zhte415 · on March 31, 2023

Is it valid to focus tracking a Dem/Rep split when that split is an exclusionary design for many Americans? Or is it not exclusionary in your belief? I'm curious of a social science perspective.

Ignoring the global nature of Twitter for a moment.

meghan_rain · on March 31, 2023

So why did they opensource it?

daveguy · on March 31, 2023

So they could pretend to be open. It's the "Open"AI model. Open-washing?

cubefox · on March 31, 2023

This is a very cynical take. They should be commended for publishing recommendation code at all, which no other major social network does.

SequoiaHope · on March 31, 2023

Well if they say “we will open source the algorithm” and then what they really open source is a little bit of slightly relevant code that doesn’t allow us to understand the algorithm, then what we can deduce is that they are trying to weasel out of public commitments.

I can’t say for sure if that happened, but if they made a clear promise and then did something else, it’s perfectly reasonable to call that out.

OJFord · on April 1, 2023

Devil's advocate though: imagine you were to open source (probably with quite a short deadline) some 'algorithm' used in whatever you work on, but the rest should stay private; how would you go about that?

I don't think it's easy, there's inherently some interface(s!) where it's a hand-wavey 'get the thing from the private bit', and defining that sensibly is hard, and if you try to do it well will probably lead to a lot of meetings, scope creep, etc. - and as far as that goes it's not easy anyway, since it's highly technical and implementation-specific yet also a management/policy decision to make.

anyonecancode · on April 1, 2023

It depends on what your goal in open sourcing is. Are you looking to provide a base for others to build software on, and to provide a way for others to contribute back to your code? Then publishing the code makes sense.

Are you looking to build public trust in you and your organization? Then dumping a bunch of code with no context isn't going to help much, as it's not code but behavior that builds or destroys trust.

Are you looking to lean into a polarized partisan environment, pushing a narrative where its you and your supporters against an unfair group of "others"? Then a big splashy move high on symbolism and low on substance that will inspire lots of high profile, divisive media coverage is a great way to go.

jjeaff · on April 1, 2023

If you were doing it in good faith, you wouldn't need to publish the actual code. Most likely you should publish an article and a flowchart explaining how the algorithm works. Publishing a partial chunk of code just creates a story that supporters who don't understand can parrot that "they opened their algorithm".

oneeyedpigeon · on April 1, 2023

Exactly. Publishing what they have is the worst of both worlds - hopefully people will create flowcharts based off it, though, although it sounds like there will still be a low level of accuracy.

5e92cb50239222b · on March 31, 2023

I still hear reverse-FUD about nvidia supposedly fully open-sourcing their Linux driver, when in reality they opened a tiny kernel portion of it that allows the main proprietary blob to connect to necessary kernel interfaces. You have to call out this bullshit when you see it.

mananaysiempre · on March 31, 2023

Wait, what? AFAIU what you say is true, except for the part where the “main proprietary blob” does not run on the CPU. This isn’t as glorious as an actual open-source driver would be, but it does have meaningful advantages—e.g. you now have a ghost of a chance of implementing Nvidia GPU support on a non-Linux kernel, by uploading the GPU-side blob and rewriting the CPU-side shim as required. Or is the blob license-restricted from being used line that?

mort96 · on April 1, 2023

The "main proprietary blob" they're talking about is the userspace portion of the driver; the portion which does all of the heavy lifting. That definitely runs on your CPU. The only part they open-sourced is the kernel portion of the driver, which just exists to facilitate communication between the userspace driver and the hardware.

philote · on March 31, 2023

Hey, we can get even more cynical. Why should we trust that this code is even similar to what they run in production currently?

concordDance · on April 1, 2023

I can't imagine deliberately special casing Elon's account in something they made from scratch to fool people.

rakoo · on March 31, 2023

Let's have reasonable goals, shall we ? "Their shit doesn't stink as bad as others'" is nothing commendable, especially after souch publicity.

jrochkind1 · on April 1, 2023

I say "why not both". Even if they are doing it only for good PR, we encourage it by giving them praise, because we should encourage things we want. (While remembering that they are not our friend, they are an entity we should pressure, and the way we pressure is by giving praise when they do things we like, and critcisim when they do not).

LastTrain · on April 1, 2023

I’d give them more credit if they’d been honest and kept it secret then lie to my face and pretend they didn’t?

raiyni · on March 31, 2023

They should be commended for open sourcing something they don't understand because they fired all of the people whom understood it? Elon admitted as much.

misiti3780 · on March 31, 2023

[flagged]

oneeyedpigeon · on April 1, 2023

Because the way he acts gives people every right to. I agree that he may be misrepresented, but if he is, then he has to shoulder at least some of the blame.

mejutoco · on April 1, 2023

The question is: are they right?

guelo · on March 31, 2023

Any time a billionaire buys a media company it's bad for the health of democracy.

Mordisquitos · on March 31, 2023

Not necessarily. What if the media company was bad for the health of democracy, and the billionaire's incompetence destroys the company's social standing and thus its ability to do more damage (even in the billionaire's own interests)?

nonbirithm · on March 31, 2023

Yeah, have to wonder how many people, if they had the money, would want to buy out Twitter just to wipe it out. Doesn't a huge chunk of HN hate Twitter and wish it were dead?

(Regardless I think that would be useless in the long run, since the millions of stranded users will still want another Twitter-like platform. And Twitter imploding without a designated archive will wipe out a tremendous amount of digital history.)

A lot of his decisions look pretty incompetent in the surface, like how could he not see how charging for verification devalue the system to whoever has the money?

Instead it could just be an intentional ploy to completely devalue Twitter disguised as incompetence. He can justify firing employees and charging for API access/verification as money-saving strategies, even if they're terrible strategies that have little chance of succeeding. And he could make enough people believe he's an idiot who makes things up as he goes rather than someone specifically driven or apathetic enough to run Twitter into the ground. Not to mention he was forced to buy them after changing his mind. Almost feels like a "so that's what happens" response.

I wonder how higher powers would be able to distinguish fake incompetence from real incompetence. Would they care how Twitter as a private company ends up if it's the case that it implodes from its own legitimately bad business decisions? It reminds me of how employers won't directly fire employees for discriminatory reasons, instead they make the employees' lives miserable so they're compelled to leave on their own, thus they escape scrutiny.

ClumsyPilot · on March 31, 2023

This is basically at the level of "9/11 was an inside job to bring down WTC 1, but WTC2 was destroyed in an unrelated but simultaneous terrorist attack"

tablespoon · on March 31, 2023

> Yeah, have to wonder how many people, if they had the money, would want to buy out Twitter just to wipe it out. Doesn't a huge chunk of HN hate Twitter and wish it were dead?

> (Regardless I think that would be useless in the long run, since the millions of stranded users will still want another Twitter-like platform.

If there's not an obvious successor, right when its shutdown, a lot of those people might get their habit broken and find something better to do. I know Mastodon was held up as a successor, but it's unclear to me if that's actually capable of scaling to that level.

dmix · on March 31, 2023

Mastodon is way too flawed to be anything but a niche tool for tech people and activists. I highly highly doubt such a system can cross the chasm. That doesn’t mean that’s a bad thing though.

bobsil1 · on April 1, 2023

Or, he’s as incompetent as he looks.

pastacacioepepe · on March 31, 2023

Can you name one relevant media company owned by someone from the working class?

scythe · on March 31, 2023

If you personally own a media company, you are by definition bourgeoisie. But see:

https://en.wikipedia.org/wiki/Media_cooperative#List_of_medi...

paulddraper · on March 31, 2023

Which is why HN was so incensed about Bezos buying the Washington Post.

simondotau · on March 31, 2023

And when a highly scrutinised, highly visible billionaire buys it off a different bunch of billionaires which you know little about?

misiti3780 · on March 31, 2023

i wasnt referring to him buying twitter, i was referring to him saying he was going to open source the recommendation engine and then doing it.

i agree billionaires owning media companies is huge problem

mulmen · on March 31, 2023

Do you believe billionaires can do good? Is their existence an existential threat to democracy?

hooverd · on March 31, 2023

Yes. There are plenty of philanthropic billionaires. Yes. That much money buys a destabilizing amount of influence.

systemvoltage · on March 31, 2023

Billionaires are billionaires not by literally storing cash. The rest of the society values their contributions and creations in the companies/corporations they run. Sure, they have some liquidity but the entire concept of resentment towards billionaires is essentially equal to resentment for the betterment of the world. There are some exceptions but for the most part, in a well oiled market, you can't just become a billionaire by fucking over people. See Adani and how it turns out for him: https://www.ft.com/content/5c0b6174-e66d-4fa5-89a5-6da1d69ab...

v0idzer0 · on April 1, 2023

Every major media company is owned by a billionaire

cubefox · on March 31, 2023

[flagged]

swores · on March 31, 2023

It's because there was close to zero newsworthy information in them, just nonsense being disseminated by wannabe-journalists.

r3trohack3r · on April 1, 2023

I encourage you to watch the C-SPAN recordings of the senate sessions where they brought in Twitter employees and journalists to cover what was in the Twitter files.

From your comment it sounds like you’ve been consuming the 30s soundbytes from those hearings and the misinformation spreading around the internet.

A long list of 3 letter agencies were compiling lists of citizens and journalists and sending them to social media companies to review for ToS violations.

There is a very real threat to civil rights here. When this cannon swings around and points back at LGBTQ, racial equality, stopping the war on drugs, etc. this is going to be “not pretty.”

And the hearings covering them were unbelievably shameful. Senators talk passed the guests in the room. Refused to abandon their “sick burn” scripts regardless of where the conversation went. Insulted their guests. Went in random directions of questioning that had little to do with the root problem…

At the core of this, 3 letter agencies (seemingly across the board) have decided that it’s acceptable to ask social media companies to prevent citizens from communicating on their platforms by selectively directing the attention of their moderation teams towards individuals. Whether this is legal, or a violation of 1st amendment rights, is for sure an open question.

Only one senator directly addressed that and only briefly by saying “maybe they’re trying their best” - a statement that doesn’t exempt anyone involved from following the law.

Is the government allowed to censor citizens by weaponizing their ToS for selective enforcement and, if the government can do that, where is the line drawn? How specific are they required to be? Can a platform ban all political speech and then only selectively enforce requests from the government without doing their own moderation? How far can we launder the 1st amendment through a public-private collaboration of enforcing ToS?

Honestly I’m not sure what the hearings were really meant for, the government is unlikely to hold itself accountable. At this point I do believe the ball is in the citizen’s court to bring suit against the agencies named in the Twitter files like we did with the presidential surveillance program.

jjeaff · on April 1, 2023

The government requesting that the tos of a private company be upheld seems rather mild to me. Did we get the reasons for the requests in the released files? Were they trying to reduce foreign propaganda or public health misinformation or something else important?

variant · on April 1, 2023

You like your government trying to tell a private company what's true and untrue?

daveguy · on April 1, 2023

More than I like a private company telling me what's true and untrue.

v0idzer0 · on April 1, 2023

You clearly are oblivious as to what they contain

misiti3780 · on April 1, 2023

[flagged]

dang · on April 2, 2023

Please don't break the HN guidelines like this. It's not what this site is for, and destroys what it is for.

What's worse, if you have a true point, then posting like this actually discredits the truth and gives people a reason to reject it. That isn't in your interest and in fact hurts everyone.

https://news.ycombinator.com/newsguidelines.html

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...

mort96 · on April 1, 2023

"Way too negatively"? We're talking about one of the world's most influential people who uses their power to randomly accuse innocent normal people of being pedophiles. There is no portrayal too negative.

hanniabu · on March 31, 2023

This is like FB open sourcing the compiled frontend code you can see yourself using inspect.

If we commend them for this we're helping promote and encourage this faux open source virtue signaling

cubefox · on March 31, 2023

No, that's very different.

correlator · on March 31, 2023

There is clearly a lot of information to share. It's worth considering this could be step 1 of n as opposed to assuming the worst possible intention.

yurodivuie · on March 31, 2023

It's healthy to have a normal amount of cynicism. They released it for a reason. "The goal of our open source endeavor is to provide full transparency to you, our users, about how our systems work."

Why be transparent (or try to appear transparent)? To convince people to trust your platform (or to recruit - which seems to be another goal of the post). Why would Twitter want or need to do this now? Well, there is a bit of context. This disclosure doesn't exist in a vacuum.

mirkules · on March 31, 2023

I love this take. Doomed if you do, doomed if you don't.

drstewart · on April 1, 2023

[flagged]

blitzar · on April 1, 2023

I agree, which is why I wonder what your motivation is to defend Twitter. You're posting about this for a reason. If I were a social media company, I'd probably have paid agitators to defend them.

jstummbillig · on March 31, 2023

If we are willing to not assume some borderline "it's what they want you to think" conspiracy play, obviously there was always going to be a lot of highly interested and qualified people taking a very close look at this and, at some point, there was always going to be very definitive conclusion of what's the deal with what they released.

If your play was "it's some source code, hence people will think we are open, and that should be really good for us", that would make you a very special kind of idiot in this space.

joshspankit · on March 31, 2023

That was one of Elon’s core statements when he first talked about buying Twitter. If he had gotten it out sooner there would be an easier link between the two, but if you want more context just go read the old tweets and articles from the Twitter vs Elon days.

kzrdude · on March 31, 2023

If we can't build anything with this, is it "source"?

bilekas · on March 31, 2023

"Does not include batteries"

justapassenger · on March 31, 2023

You must be new to Musk's business practices.

avanti · on March 31, 2023

It's no secret that Twitter, like any other social media platform, is driven by user engagement and ad revenues. The more time we spend on the platform, the more valuable it becomes for them. With this new open-source algorithm, they're essentially crowdsourcing improvements to their system to better serve us the content we crave.

this move could be seen as a strategic PR play to boost their public image amidst the growing concerns around algorithmic bias and lack of transparency. By inviting the community to collaborate and address these issues, they're not only shifting some of the responsibility onto the users but also deflecting potential criticism.

bradly · on March 31, 2023

Because they let go many of the engineers working on it?

carstenhag · on April 1, 2023

Noone has mentioned this before - I don't know if it's really related, but afaik the European Union is thinking about requiring social media platforms to be more transparent when it comes to recommendations etc. If you can already say "hey we have a lot already online!" then maybe the laws will become less strict.

llx2 · on April 1, 2023

bc he have no devs anymore and thinks the community will fix it for free

w0m · on March 31, 2023

PR and it was already leaked last week.

anigbrowl · on March 31, 2023

helsinkiandrew · on April 1, 2023

> But the underlying policies and models are almost entirely missing... Without those, we can't evaluate the behavior and possible effects of "the algorithm

And neither can spammers find and test the cracks and edge cases that would allow them to break the system, that does sound reasonable to me. If they were public there would be an arms race between spammers/those wishing to game the system and Twitter engineers.

ivalm · on April 1, 2023

Then don’t pretend to release “the algorithm.”

helsinkiandrew · on April 1, 2023

They’re explaining how it works without giving the specifics. Much like the US military explains how the nuclear deterrent works without disclosing detailed plans and control codes.

novok · on April 1, 2023

It's an open algorithm, but it's not open data! (joking)

robopsychology · on March 31, 2023

[flagged]

culi · on March 31, 2023

imagine thinking you need to read every file in a project to understand the architecture and which pieces are important for specific functionality you're looking to understand. Have you ever picked up a bugfix ticket for some code you didn't write?

pnt12 · on March 31, 2023

It's fast to read stuff when you have the domain knowledge. The weights won't be a 5kb Scala file: they'd probably be a big binary file, which is easy to search it github/locally after cloning.

Otherwise, if they are provided, someone in the thread will surely point to them.

elorm · on March 31, 2023

You missed this in your rush to display your newly acquired sarcasm101 skills:

  "Skim": To read quickly or cursorily, to glance over, or to omit details in order to get the gist of something.

robopsychology · on March 31, 2023

Context: I studied at Oxford

Fair point, I missed that when I skimmed OPs comment

raggi · on March 31, 2023

class project, 200 students, 1500 LoC each. Time for grading.

there are contexts in which this may be well practiced.

tpmx · on March 31, 2023

We should really all just bow in awe as we are clearly inferior.

robopsychology · on March 31, 2023

Princeton has a Code Reading 101 that all postdocs/professors must take, however in exchange for the Secrets of Speed Reading you must acknowledge every message with where you learnt those skills.

fanagra32 · on March 31, 2023

[flagged]

acdha · on April 1, 2023

The context is relevant for indicating that they’ve familiar with the problem and have thought about these issues in depth. It’s also useful for not being accused of hiding their identity if someone thinks they have an unmentioned agenda. Argument from authority is bad when it’s of the form “I am an expert, therefore you shouldn’t question this claim”, not when it’s used to provide an identity to a previously-unknown name while also providing a cogent argument and supporting evidence.

ngrilly · on March 31, 2023

What did you expect?

SequoiaHope · on March 31, 2023

I don’t know if the parent’s expectations matter here. This is more about making sure others don’t misunderstand the meaning here.

ngrilly · on April 1, 2023

Good point. I didn't see it like that. Thanks!

bobobob420 · on March 31, 2023

Can i audit your classs for free?

jonathanmayer · on Feb 9, 2023

Context: I teach at Princeton and used to work at the FCC.

Several comments suggest systematically comparing FCC data to what ISP websites say about availability. My research group did this! Here's the paper:

https://dl.acm.org/doi/10.1145/3419394.3423652

And here's a followup project by investigative journalists at The Markup:

https://themarkup.org/still-loading/2022/10/19/dollars-to-me...

TomSwirly · on Feb 10, 2023

So in other words, all the telecoms are systematically lying to the FCC in order to steal money from taxpayers, but there is no actual enforcement nor penalties for doing so, so they continue to do this with impunity.

(I remember when Clinton gave away a huge giveaway to the telecoms so they would provide fibre everywhere, and then they did fuck all and just laughed.)

In a just society, they would quickly be presented with a estimated bill for the largest amount that they could possibly have stolen - I'm sorry, let me repeat this word "stolen" - from the US taxpayer, PLUS massive penalties, and then they would be required to prove how much they actually stole if they wanted to reduce the cost.

Plus complete discovery of all their records should be required, with a view toward criminal prosecution of their executives.

As it is, they have absolutely no reason not to cheat, lie and defraud if they think they can make money at it.

tablespoon · on Feb 9, 2023

> And here's a followup project by investigative journalists at The Markup:

> https://themarkup.org/still-loading/2022/10/19/dollars-to-me...

That article is pretty bad. It doesn't once mention DSL or that technology's inherent technical limitations that can result in widely variable speeds (IIRC, your bandwidth is determined by the length of the wire between your house and the central telephone office). Then it spends a lot of time talking about race, which is likely creating a misleading impression that lends itself to outrage.

tguvot · on Feb 9, 2023

DSL bandwidth is determined by the length of the wire between DSL modem and DSLAM which can be in the cabinet on the curb (and I believe it usually is).

Also it depends on what version of DSL standard we are talking. I personally started with adsl2 which was 12/3, upgraded eventually to VDSL2 which did 150/10 and latest standard is G.Fast which can give 1 Gbit/s aggregate uplink and downlink at 100m

kevin_thibedeau · on Feb 10, 2023

It also depends on the condition of the wire. 50 year old paper insulated POTS wiring will struggle for just a few Mbps even if it's a short hop to the CO. Thank god the telcos were able to collect the USF surcharge to pay for the necessary upgrades.

tguvot · on Feb 10, 2023

good point. interesting if there are any estimations "out there" about age of POTS wiring.

Retric · on Feb 9, 2023

The odds of having a cable under 100m to a DSLAM basically rounds to 0.

At an highly optimistic 1 mile you’re already down to 20Mbps and most people are significantly further than that. Remember it’s the physical distance of the cable between the actual devices that matters and that’s not straight.

tguvot · on Feb 10, 2023

depends on location. and on greed of the telco. i used to live within 100m of dslam. in cities with dense construction/etc it's achievable. also, with g.fast it's 1000 aggregated at 100m. for longer distances numbers are 200 m 600mbit, 300m-300mbit, 500m - 100mbit. numbers are not too bad even for suburb. For comparison, high speed mm wave 5g needs to be deployed any 100-200m in order to get proper speed/penetration.

admittedly, even g.gast it's not as good/scalable as DOCSIS or Fiber, but it could be used and deployed "back in a day" as perfectly good solution. and even today it's not that bad for majority of population, if properly deployed

as anecdote , i saw like 20 years ago privately deployed/managed DSL systems in kibbuz. wonder what they have now

Retric · on Feb 10, 2023

Again you can’t directly compare distances between 5G and DSL because the wire isn’t taking the shortest path through 3D space between teleco equipment and your modem.

Also a single 600-700MHz 5G tower can cover hundreds of square miles with 5G service with up to 250 megabits per second. 2.5-3.5GHz can still hit several miles with up to 900 megabits per second, and 24-39GHz towers can cover a mile radius at up to 3Gbps. Real world performance depends on many many factors, but DSL performance can be similarly degraded from it’s theoretical maximum.

tguvot · on Feb 10, 2023

i compared with mm wave 5g because it requires same density of deployment as DSLAMs for proper performance, if not higher.

> Real world performance depends on many many factors,

like how many UEs are sharing spectrum. Which is usually a lot

> but DSL performance can be similarly degraded from it’s theoretical maximum.

totally.

mint2 · on Feb 10, 2023

dsl can get that fast…?! Why is even today the current best offering in many mountain communities in CA is 5/1 ?

and why is 5/1 also the only AT&T offering in pockets of high tech Irvine CA?

tguvot · on Feb 10, 2023

because at&t doesn't feel like upgrading equipment. too much effort. in general they made a strategic decision to invest mainly in wireless. you can also throw into mix words like "absence of government regulation" and "regional monopoly".

FireBeyond · on Feb 10, 2023

On multiple areas, AT&T (I believe) successfully petitioned the FCC to not count a bunch of low end DSL and similar services, on the basis that they were "obsolete" and lowering averages.

To be clear, they were still actively selling those services, and in some cases, it was the only option, but they just didn't want them to count.

philsnow · on Feb 10, 2023

I'd like to read the paper but not $15 want to read it. Apparently ACM wants to charge $5 or $10 for this one article even if I had a membership.

Is this one of those situations I hear about where researchers would be super happy to provide copies of papers that ACM etc keep behind paywalls, if you just ask?

viraptor · on Feb 10, 2023

There's a helpful raven at the hub of science for that ; )

https://sci-hub.ru/https://doi.org/10.1145/3419394.3423652

zxexz · on Feb 10, 2023

They most likely would send you a copy, I have a 100% success rate with that. Though I should also mention that paper is present on scihub at this very moment.

mistrial9 · on Feb 10, 2023

that video presentation is heroic! what a riot to see the numbers sliced that way; and you are clearly cautios in the estimations. well played

runnerup · on Feb 10, 2023

> Here's the paper:

It's super-paywalled. Can you upload a copy of it somewhere?

jonathanmayer · on Jan 10, 2022

I previously served as CTO of the FCC Enforcement Bureau. A couple thoughts on the regulatory dimensions of this report.

* This could be a Federal Trade Commission problem. T-Mobile, like all major ISPs, has made public representations about upholding net neutrality principles [1]. These voluntary commitments were part of the Trump-era FCC's rationale for repealing net neutrality rules. Breaching the commitments could constitute a deceptive business practice under Section 5 of the Federal Trade Commission Act.

* This could also be a Federal Communications Commission problem. When repealing the Obama-era net neutrality rules, the Trump-era FCC left in place a set of transparency requirements [2]. Making an inaccurate statement about network management practices can be actionable under that remaining component of the FCC's net neutrality rules.

I haven't seen a comment from T-Mobile, so to be clear, that's just based on the report.

[1] https://www.t-mobile.com/responsibility/consumer-info/polici...

[2] https://www.ecfr.gov/current/title-47/chapter-I/subchapter-A...

inetknght · on Jan 10, 2022

> Making an inaccurate statement about network management practices can be actionable under that remaining component of the FCC's net neutrality rules.

Who would be responsible for bringing about that action and, if they don't bring about action, what can regular people do about it?

saguntum · on Jan 11, 2022

Thank you. Is there a form where one could file a complaint with the FCC to inform them of this? I'm not sure that this would be widely reported.

I am also curious if the reports about content filtering being required to deactivate the feature are accurate, and if so, what the default status of that feature is on TMobile's network.

jonathanmayer · on Dec 21, 2021

Hi, I previously served as CTO of the FCC's Enforcement Bureau, where I worked on then-Chairman Wheeler's Robocall Strike Force. I'd like to offer a few observations that might be of interest.

* T-Mobile, like the other carriers, is offering a numerator and not a denominator. These call filtering services are plainly valuable, but it's difficult to evaluate how effective they are based on current public evidence.

* It isn't a coincidence that the top robocall destinations include locations that are popular for retirement. These scams disproportionately target and take advantage of older customers.

* Call authentication (STIR/SHAKEN) is helping, and will continue to become more effective. The FCC did not push carriers to rapidly adopt call authentication during the last administration; Congress eventually stepped in with the TRACED Act, and the FCC has since made STIR/SHAKEN a top priority.

ipython · on Dec 21, 2021

From anecdotal evidence (n=1) the call blocking feature on T-Mobile is about 70% effective. Unfortunately I don’t know of an api to pull my full phone and spam shield records but I estimate I received about 2,000 calls over the past three months. About 90% of those were spam/scam calls. Of those, T-Mobile identified and blocked about 70% of them.

It is reassuring to see the stir/shaken “checkmark” on my iPhone call log indicating that the call has been authenticated. Unfortunately as you say it’s not very effective yet.

I’ve noticed that there are carriers/voip gateway providers who are proactive on shutting down spam emanating from their networks and others who are not. Not affiliated but the list here seems to be accurate: https://scammerblaster.com/the-ultimate-method-of-scammer-pa...

lifeisstillgood · on Dec 21, 2021

wait what. You recieved 540 spam calls over a quarter? (2000.9.3)

Holy crap. That's six a day. I would have thrown my phone away.

ipython · on Dec 21, 2021

Correct. I think my high water mark was about 30 calls in one day. I would receive a call while messing with another, so I would sometimes conference them together for hilarity to ensue.

lifeisstillgood · on Dec 22, 2021

Spam as in "want to use our website building agency?" or "want to buy this incredible new coin/NFT/pump dump stock?"

ipython · on Dec 22, 2021

At some point my phone number was sold as part of a list of “old people”. So my calls consisted of a mix of Medicare supplement plans, Medicare scams (“free” diabetic supplies, something about chronic pain, and my favorite which was “a five year renewal” of my Medicare card. All they needed was all my PII, doctors name and Medicare card number!)

I also received a lot of other scam calls targeting older folks: namely callers impersonating Social security administration officials who scare you into sending thousands of $$$ to them so you avoid getting arrested - you’re told that your SS benefits are suspended and you’ll be charged with a crime because your SS# was associated with some vague crime in the “southern border of Texas”…

It’s honestly sickening to see in real time how these low lives fleece innocent people and it makes me furious. I do what I can to try and shut them down but I’m sure it’s just a drop in the bucket and they just pop back up with a different voip provider in a few days anyway.

They can be very persistent and they will track your “identity” for years. I had invented a persona back in 2015 and forgotten about it. Someone called several dozen times - very aggressively - asking for that persona. I had fun messing with him but it was scary having him pull up personal details from over 7 years ago even if it was totally fabricated.

abdabab · on Dec 21, 2021

One time I received 30ish calls everyday for a few days and each one of them was from a different number but with same prefix. It bothered me that I couldn’t block that prefix.

UncleMeat · on Dec 21, 2021

That’s about where I’m at. A slow day is two spam calls. A bad day is 10.

shkkmo · on Dec 21, 2021

It seems ridiculous to me that I regularly receive calls that are clear indicators of illegal activity but that nobody is being held accountable.

Why is there no way to find the people who are making these calls and why are the phone companies not liable for allowing these calls to be made without accountability?

hedora · on Dec 21, 2021

The phone companies standardized on a hopelessly insecure protocol in 1975, and have no financial incentive to fix it.

If the FCC mandated a $1/spam call fine for cell phone providers (automatically paid as an unbounded rebate to subscribers), I suspect they would fix it in under 12 months.

More reading on the protocol (Signaling System 7) is here:

https://en.m.wikipedia.org/wiki/Signalling_System_No._7

The fundamental issue is that is assumes 100% of global telephone exchanges are trustworthy.

sennight · on Dec 21, 2021

> The phone companies standardized on a hopelessly insecure protocol in 1975...

I vaguely remember an interview with somebody involved in early ARPANET standardization efforts stating pretty definitively that the prevailing direction for network protocols was source based routing. Anybody who has ever had to write an email address parser has seen vestiges of this (multiple @, ! and : symbols). Supposedly a representative from the NSA helpfully "suggested" they abandon that line of thinking and just mimic the PSTN's approach of trusting the next hop to do the routing.

I wonder how accidental it is that SS7 was implemented in such a plainly insecure manner.

ipython · on Dec 21, 2021

It’s a lot of work and honestly the telcos don’t care. Even if and when you do find them, what can you do? They’re calling from halfway around the world - so “impersonating a us government employee” is not a law you can enforce on a citizen of another country.

shkkmo · on Dec 21, 2021

Why can't the telcos be held liable for routing these calls? If you get scammed and could sue the phone company, they'd very quickly find real solutions.

AnimalMuppet · on Dec 21, 2021

I suspect it's that pesky "rule of law" thing. We need to change the laws to make them liable.

KoftaBob · on Dec 21, 2021

Most of the scam/spam calls originate from overseas, while using American phone numbers:

"Five U.S. states, Costa Rica, Guatemala, India, Mexico and the Philippines are where most robocalls originate."

I imagine it's much more complicated to prosecute robocallers that live overseas, as you're now dealing with having to extradite people.

imglorp · on Dec 21, 2021

Around 99% of calls I receive, total, are spoofed to my local area and exchange. The discrimination tech is clearly not being used. Sprint/TMO.

msh · on Dec 21, 2021

A quick fix could be that you require that the phone number matches the country the call comes from.

Won't solve everything but maybe a little bit.

codeddesign · on Dec 21, 2021

Then every time I travel overseas I cannot use my phone or a US phone number? What about living close to Canada, Mexico, Caribbean…etc and you pick up international towers?

It’s an easier fix, but not really a solution.

The reality is that everyone wants fairness but no one really wants government regulation (Russia is a great example of this where your phone number is essentially treated like an assault rifle. Registered, monitored, and geo-tracked).

msh · on Dec 25, 2021

But that would be roaming, not a us number originating from a non us phone line.

annoyingnoob · on Dec 21, 2021

Prior to VoIP it was easier to trace the source of a call. With VoIP, the call could come from anywhere. Also, that VoIP service may have been resold several times and the end of that chain might look like a shady foreign entity with fictitious names. You kill one shady reseller and 3 more pop up.

imglorp · on Dec 21, 2021

Can I sue my carrier for breach of contract?

They are providing me a phone but most callers are spoofed and it can't be answered any more in the way a reasonable person would expect a phone to be useful.

jonathanmayer · on Aug 16, 2021

T-Mobile has had recurring data security deficiencies. I know because I served as CTO of the FCC's Enforcement Bureau, before returning to academia.

In 2017, the FCC determined that T-Mobile had violated federal law in a data breach involving customer credit information [1]. There was reportedly no fine because Congress has imposed a strict one-year statute of limitations on FCC enforcement actions.

In 2020, the FCC charged T-Mobile with again violating federal law in failing to protect customer location information [2]. The FCC proposed a $91.6M fine, widely criticized as insufficient at the time [3-4]. I don't believe the FCC has finalized or collected that penalty.

There have been several other incidents, including in 2018 [5], 2019 [6], early 2020 [7], and late 2020 [8].

I hope there has not been a new data breach. But if there has been, this is the latest in a pattern, and the incentives have to change.

[1] https://www.nexttv.com/news/fcc-admonishes-t-mobile-breach-1...

[2] https://www.fcc.gov/document/fcc-proposes-916m-fine-against-...

[3] https://docs.fcc.gov/public/attachments/FCC-20-27A4.pdf

[4] https://docs.fcc.gov/public/attachments/FCC-20-27A5.pdf

[5] https://www.theverge.com/2018/8/24/17776836/tmobile-hack-dat...

[6] https://www.bleepingcomputer.com/news/security/t-mobile-disc...

[7] https://www.bleepingcomputer.com/news/security/t-mobile-data...

[8] https://www.bleepingcomputer.com/news/security/t-mobile-data...

studentrob · on Aug 16, 2021

Thank you for that context. It seems like breaches are happening every month now. What do you think needs to happen to ensure these gigantic companies secure data? I can imagine (a) new legislation enabling bigger, swifter fines or (b) anti-trust action. Do you think we should prioritize one over the other, do both, or something else?

silisili · on Aug 16, 2021

I left TMo in 2018, when their 'forgot password' link sent me my actual password, via email.

denysvitali · on Aug 16, 2021

Related: https://web.archive.org/web/20180429221742/https://twitter.c...

asdfaoeu · on Aug 16, 2021

A relevant quote from there "What if this doesn't happen because our security is amazingly good?"

aloisdg · on Aug 16, 2021

oh boy...

junon · on Aug 16, 2021

I remember this happening in real time. People were losing their minds over it. I really hope that PR rep got fired, they have no business doing anything related to telecommunications.

EveYoung · on Aug 16, 2021

This reads like a parody account.

art-vandelay · on Aug 16, 2021

Absolutely agree that the incentives have to change!

What does the FCC consider to be "reasonable measures to protect the confidentiality of its customers data"? Is there a document somewhere that outlines the best practices they expect you to follow?

I might be able to better convince my employer to prioritize security work if I had something like that to point to.

simfree · on Aug 16, 2021

So the only fines that T-Mobile has paid are for the rural call call completion issues then?

Crazy that they can get away with regional and nationwide voice outages, SSNs and TINs repeatedly being leaked en masse, and the only fines they get are for rural call completion...

https://www.fcc.gov/document/settlement-t-mobile-rural-call-...

jonathanmayer · on Aug 8, 2021

(Context: I teach computer security at Princeton and have a paper at this week's Usenix Security Symposium describing and analyzing a protocol that is similar to Apple's: https://www.usenix.org/conference/usenixsecurity21/presentat....)

The proposed attack on Apple's protocol doesn't work. The user's device adds randomness when generating an outer encryption key for the voucher. Even if an adversary obtains both the hash set and the blinding key, they're just in the same position as Apple—only able to decrypt if there's a hash match. The paper could do a better job explaining how the ECC blinding scheme works.

jobigoud · on Aug 8, 2021

> only able to decrypt if there's a hash match

This is one of the concerns in the OP, have an AI generate millions of variations of a certain kind of images and check the hashes. In this case it boils down to how common false positives neural hashes are.

NTroy · on Aug 8, 2021

Yes, this ^^^^^^

> The proposed attack on Apple's protocol doesn't work.

With all due respect, I think you may have misunderstood the proposed attack @jonathanmayer, as what @jobigoud said is correct.

amelius · on Aug 8, 2021

There may be another attack.

Given some CP image, an attacker could perhaps morph it into an innocent looking image while maintaining the hash. Then spread this image on the web, and incriminate everybody.

GistNoesis · on Aug 8, 2021

Yes perceptual hashes are not cryptographically secure so you can probably generate collisions easily, (i.e. a natural looking image which has a attacker-specified hash).

Here is a proof of concept I just created on how to proceed : https://news.ycombinator.com/item?id=28105849

AwaAwa · on Aug 8, 2021

Sounds like a fantastic way for law enforcement to get into your phone with probably cause. Random message you a benign picture from some rando account with a matching hash. Immediate capture for CP, data mine the phone, insert rootkit, 'so sorry about the time and money you lost - toodles'.

mbrumlow · on Aug 8, 2021

Don’t warrants have to name why ?

Like a warrant for CP can’t be used to collect evidence on another cases for say tax fraud.

HWR_14 · on Aug 8, 2021

Warrants do have to name why, and where. However, anything they find along the way is fair game. If they open your trunk to find drugs and see a dead body, then the dead body is still admissible. (Assuming that the opening the trunk for drugs is okay.)

cube00 · on Aug 8, 2021

It'd be interesting to see how the way common images are reused (for example in memes by only adding text) would be enough to change that hash. If it wasn't enough it could spread very quickly.

Of course I'd dare not research or tinker with it lest I'll be added to a list somewhere such is the chilling effect.

I guess in that case they'd delete that single hash from the database because they'd still have an endless (sadly) supply of other bad image hashes to use instead.

macintux · on Aug 8, 2021

> Then spread this image on the web, and incriminate everybody.

You'd still have to generate several images and persuade people to download multiple of them into their photo roll. And as I understand it there's yet another layer of Apple employees to review the photo metadata before it ever makes its way to law enforcement.

RHSeeger · on Aug 8, 2021

That does seem like an interesting protest vector, though. Generate a bunch of images that match CSAM images but are mundane. Then have everyone download them and send them to their cloud. Someone then needs to spend resources determining that the images are _not_ actual matches. Basically, a DDOS attack on the functionality.

macintux · on Aug 8, 2021

Indeed, that thought occurred to me as well.

It's a risky bet, though: if somehow that intermediate layer fails and you find yourself locked up and accused of storing/disseminating CSAM material, it's not like the civil rights era when your friends and neighbors (and hopefully employers) will understand you've been arrested for a peaceful protest.

esyir · on Aug 9, 2021

The smarter, if potentially less ethical solution is to encode such images and make memes with them. One of them going viral is likely to flag an enormous number of people along the way.

tomjen3 · on Aug 8, 2021

>several images and persuade people to download multiple of them into their photo roll.

I believe such images are called "Dank Memes" these days.

jonathanmayer · on June 25, 2021

> with the only privacy guarantees being that the data is encrypted during transport, and a "promise" that they will run internal audits to make sure private data isn't released from their servers.

There's much more than that, including: privacy and security review before a study launches, a data minimization requirement, a sandboxed data analysis environment with strict access controls, and IRB oversight for academic studies.

> IMO this seems to provide worse privacy than even Google and Micro$oft's telemetry, which at least use differential privacy to make sure that each individual's privacy is somewhat protected (the data you send is randomised so even if the aggregator is compromised by a malicious third party (e.g. NSA) individuals have some degree of plausible deniability).

The vast majority of Google and Microsoft telemetry does not involve local differential privacy. Google, in fact, has almost entirely removed local differential privacy (RAPPOR) from Chrome telemetry [1].

We've been examining the feasibility of local differential privacy for Rally. The challenge for us—and why local differential privacy has limited deployment—is that the level of noise makes answering most (often all) research questions impossible.

[1] https://bugs.chromium.org/p/chromium/issues/detail?id=101690...

FlyingLawnmower · on June 25, 2021

Have you thought about using central/global differential privacy (which tends to have much less noise) on the "high level aggregates" or "aggregated datasets" that persist after the research study ends?

E.g. from the FAQ: "We do intend to release aggregated data sets in the public good to foster an open web. When we do this, we will remove your personal information and try to disclose it in a way that minimizes the risk of you being re-identified."

It's a little worrying to think that this disclosure process might be done with no formal privacy protection. See the Netflix competition, AOL search dataset, Public Transportation in Victoria, etc. case studies of how non-formal attempts at anonymization can fail users.

jonathanmayer · on June 25, 2021

> Have you thought about using central/global differential privacy (which tends to have much less noise) on the "high level aggregates" or "aggregated datasets" that persist after the research study ends?

Yes. Central differential privacy is a very promising direction for datasets that result from studies on Rally.

> It's a little worrying to think that this disclosure process might be done with no formal privacy protection. See the Netflix competition, AOL search dataset, Public Transportation in Victoria, etc. case studies of how non-formal attempts at anonymization can fail users.

I've done a little re-identification research, and my faculty neighbor at Princeton CITP wrote the seminal Netflix paper, so we take this quite seriously.

skybrian · on June 25, 2021

Interesting. I can see that RAPPOR seems to be deprecated in favor of something else called ukm (Url-keyed metrics) but not why this change is being made. Is there somewhere I can read more about it?

jonathanmayer · on June 25, 2021

I am not aware of any public announcement or explanation. Which is... probably intentional, since Google is removing a headline privacy feature from Chrome.

skybrian · on June 25, 2021

How did you learn about it? By studying the code?

jonathanmayer · on June 25, 2021

Our team looked closely at the Google, Microsoft, and Apple local differential privacy implementations when building Rally. It helped that we have friends who worked on RAPPOR.

skybrian · on June 25, 2021

Did you end up using differential privacy in Rally? What's the thinking behind this?

jonathanmayer · on June 25, 2021

> This is a luxury many researchers that work outside of these big tech companies don't have, which creates a scientific power imbalance.

The power imbalance goes far beyond science. Independent research is foundational for platform accountability. An example: when I was working on the Senate staff, before I started teaching at Princeton, a recurring challenge was the lack of rigorous independent research on platform problems. We were mostly compelled to rely on anecdotes, which made oversight and building a factual record for legislation difficult.

9wzYQbTYsAIc · on June 25, 2021

I’m curious as to your take on independent scholarship, outside of the domain of academia?

Would appropriately rigorous independent scholarship be considered as a trustworthy source within your sphere?

jonathanmayer · on June 25, 2021

> Would appropriately rigorous independent scholarship be considered as a trustworthy source within your sphere?

Definitely. Academia doesn't have a monopoly on excellent technology and society research. The Markup's data-driven investigative journalism, for example, is outstanding.

jonathanmayer · on June 25, 2021

> Presumably, the users will be well-endowed and tax-advantaged institutions who could have just bought the information from data-aggregators anyway.

Nope. This is an important point: the type of crowdsourced science that Rally enables is something that researchers couldn't do before. (With the exception of a very small number of teams who made massive investments in building single-purpose crowdsourcing infrastructure from the ground up.)

ysavir · on June 25, 2021

Could you provide more detail on what makes it novel?

jonathanmayer · on June 25, 2021

Common research methods have significant limitations. Web crawls, for instance, usually don't realistically simulate user activity and experiences. Lab studies often involve simplified systems that don't generalize to the real world. Surveys yield self-reported data, which can be very unreliable.

Rally studies, by contrast, reflect real-world user activity and experiences. In science jargon, Rally enables field studies and intervention experiments with excellent ecological validity.

ysavir · on June 25, 2021

Thanks for clarifying! Makes sense.

A few follow up questions:

1. Do you expect the opt-in nature of these studies to impact their findings?

2. To compensate for the voluntary nature of the studies, do you think researchers in general will still be incentivized to find data sources that are less respectful of people's privacy and don't require an opt-in to the study?

jonathanmayer · on June 25, 2021

> 1. Do you expect the opt-in nature of these studies to impact their findings?

The Rally participant population is not representative of the U.S. population—these are users who run Firefox (other browsers coming soon), choose to join Rally, and choose to join a study. In research jargon, there's significant sampling bias.

For some studies, that's OK, because the research doesn't depend on a representative sample. For other studies, researchers can approximate U.S. population demographics. When a user joins Rally, they can optionally provide demographic information. Researchers can then use the demographics with reweighting, matching, subsampling, and similar methods to approximate a representative population. Those methods already appear throughout social science; whether they're sufficient also depends on the study.

> 2. To compensate for the voluntary nature of the studies, do you think researchers in general will still be incentivized to find data sources that are less respectful of people's privacy and don't require an opt-in to the study?

Rally is designed to provide a new research capability that didn't exist before. I don't expect a substitution effect like that.

ysavir · on June 25, 2021

Got it. Thanks Jonathan!

cycomanic · on June 25, 2021

Regarding 2. that would run afoul of many ethics boards at universities. Generally they require that (informed) consent has been given to take part in the study.

cpeterso · on June 25, 2021

> Rally studies, by contrast, reflect real-world user activity and experiences. In science jargon, Rally enables field studies and intervention experiments with excellent ecological validity.

Rally users are all opt-in. How does that impact the design of a Rally study and the conclusions you can draw from it?

9wzYQbTYsAIc · on June 25, 2021

Academic research in the social sciences is rigorously based on the concept of informed consent (i.e., opt-in), in the first place.

There would be no change in terms of research design and the ability to draw scientific conclusions.

edit: also, see https://news.ycombinator.com/item?id=27633212 for details on research design considerations when conducting social science.

analognoise · on June 25, 2021

Except as noted elsewhere, Mozilla also gets the data to "improve products and services" right?

So it sounds like a nice shiny cloak for...exactly the kind of data collection nobody actually likes.

Yay for extra steps?

yellowfish · on June 25, 2021

Mozilla has been known to be pretty iffy when it comes to 'opt in' ( the mr. robot tie in .. etc )

Nicksil · on June 25, 2021

>Mozilla has been known to be pretty iffy when it comes to 'opt in' ( the mr. robot tie in .. etc )

Did the instance you're referencing state it was opt-in then turn out to not be opt-in?

grumblenum · on June 25, 2021

Princeton can't buy data from aggregators? Wikipedia says they have a $26.6B endowment.

jonathanmayer · on June 25, 2021

Princeton research collaborator here. Glad to answer questions about Rally.

> What "data"? Browsing history? Identity? Something else?

That depends on the Rally study, since research questions differ and studies are required to practice data minimization. Each study is opt in, with both short-form and long-form explanations. Academic studies also involve IRB-approved informed consent. Take a look at our launch study for an example [1].

> Why? What's in it for them? Since when was giving our data to third parties a good idea? There is literally no motivation presented here.

The motivation is enabling crowdsourced scientific research that benefits society. Think Apple Research [2], NYU Ad Observatory [3], or The Markup's Citizen Browser [4]. There are many research questions at the intersection of technology and society where conventional methods like web crawls, surveys, and social media feeds aren't sufficient. That's especially true for platform accountability research; the major platforms have generally refused to facilitate independent research that might identify problems, and platform problems often involve targeting and personalization that other methods can't meaningfully examine.

[1] https://rally.mozilla.org/current-studies/political-and-covi... [2] https://www.apple.com/ios/research-app/ [3] https://adobservatory.org/ [4] https://themarkup.org/citizen-browser

thombles · on June 25, 2021

These "This Study Will Collect" and "How We Protect You" sections are really good. It probably wouldn't convince me personally to sign up, but it's as comprehensive as I would expect. It's a shame that these comments didn't make it into the blog post.

mushufasa · on June 25, 2021

I think that the motivation of 'enabling citizen science' is not a very strong one. You will get very, very skewed results, moreso than typical WEIRD, if you conduct studies on the people for whom that is sufficient motivation.

A stronger motivation would be providing a product or service that tangibly adds value to someone's life.

After reading this, I have no idea how Rally would provide any tangible benefits to me.

rglullis · on June 26, 2021

Exactly. It is so weird to see all this marketing speak that makes it sound like users can get to benefit from something, but in the end this is just something that gets people to work and provide data for free to multi-billionaire universities.

We don't any more studies or research to know that the best privacy policy is to not collect any data in the first place.

lucideer · on June 25, 2021

I know you mean well but I think you completely missed the above commenters point.

You've replied here with answers to address their (our?) potential concerns, but the commenter never said they had concerns about the project itself, rather that this particular blog post doesn't "sell" or explain the value add well. That's feedback on the project's communication strategy, not on what it's actually doing.

> > Why? What's in it for them? Since when was giving our data to third parties a good idea? There is literally no motivation presented here.

> The motivation is enabling crowdsourced scientific research that benefits society.

You seem to be confusing "theys". The question is what motivates participants, not what motivates researchers.

9wzYQbTYsAIc · on June 25, 2021

> You seem to be confusing "theys". The question is what motivates participants, not what motivates researchers.

Contrarily, you seem to be confusing “theys”, yourself.

There exist participants that are motivated by participating in research that benefits society.

Just like there exist individuals motivated by lending their computing resources to the various @Home research efforts.

thayne · on June 25, 2021

But if the participants are limited to people who are motivated solely by participating in research, wouldn't that add significant bias to that research?

9wzYQbTYsAIc · on June 26, 2021

Indeed, sampling bias is a large concern.

Nonetheless, much of psychology research conducted in the US has made do with ridiculous sampling bias - the US college student is anecdotally considered to be the most-studied population in the world.

yissp · on June 26, 2021

Doesn't the field of psychology have pretty serious issues with the replicability of their experimental results?

9wzYQbTYsAIc · on June 26, 2021

Indeed, anecdotally, if not empirically, that is the case. Nonetheless, psychology is a highly operationalized field.

In other words, every thorough study begins with an assessment and revision of the consensus language being used to describe reality.

On that front alone, psychology is one of the most hard sciences around.

Deep learning is directly attributable to psychology research, for what it is worth.

chobytes · on June 25, 2021

Personally I don't think that researchers have any more business doing this kind of surveillance than Google and company do.

The idea that this will benefit society seems naive to me. I feel like it will only serve to legitimize the practice by putting ostensibly trustworthy faces on the packaging.

splistud · on June 25, 2021

Not just surveillance, but conducting research within corporate platforms. Therefore, they would have access to my data and a corporation's engine. If I think that google knows too much about me, do I get to opt-in whether that hyper-knowledge is shared to researchers (because I won't).