Really? It's bad enough when common words get trademarked, but they didn't even call themselves "Southwest Monkey". They called it "SWMonkey".
Without extra context (knowing what the website is for by reading the description), I'd have no clue that "swmonkey" has anything to do with airlines.
Anyhow, that's a separate issue. Southwest was clearly less concerned about trademark infringement than they were with the business model. The contents of the email clearly indicate that they want to disallow scraping with their ToS, and want to enforce that.
Oopsie then, "Southwest Monkey" does look like a trademark issue on top of the scraping issue. Certainly doesn't help.
That said, the guy with the skiplagged website (same idea: use publicly available flight pricing to produce tickets that the airlines see as undesirable) prevailed against the pressure from the airlines, so the situation doesn't look too bleak.
I know of a company that scrapes craigslist in bulk. The owner insisted that he's had no problems. Importantly, his company is not based in the US. Maybe he just ignores their C&D letters.
"Everyone has a right to access public data. And we believe that whether you are accessing that data by typing in a URL in a web browser, through a CURL request, an RSS feed, a cached copy, or having someone read it to you aloud, does not change your right to access public data. "
I am not a lawyer and this is not legal advice, but I believe this is, sadly, dead wrong. The infamous Computer Fraud and Abuse Act contains a provision barring not just unauthorized access to computer systems but also accessing such systems in a manner that exceeds authorized access. In other words, if you break terms of service on a website, you may be in violation of the CFAA.
The law is often wrong and written by the more fortunate in society.
The data is publicly available. The reasons this should not be an issue are self-evident.
Google scrapes trillions of sites every second of every day. Where's the outrage in that, sir? Or the legalities? Oh right, the law doesn't apply to them. Just small indie devs.
I don't hide behind legal speak and lawyers. I stand behind the truth of the matter. I'd say any legal argument against non-malicious scraping is dead wrong on moral and ethical grounds. Lawyers and powerful corporations will always try to stamp out the little guy to protect their precious trademark or data because their intellects are too dull and mediocre to compete with new entrants or innovations, so they call and cry about it to their lawyer instead. It's easier.
Ya, I think the site should be perfectly legal (at least in terms of the scraping).
For what it's worth, I'm pretty sure that if you ask Google to stop indexing your site, it will comply. With robots.txt you can even ask them in an automated fashion.
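For anyone who hasn't dealt with robots.txt, here's a rough sketch of how a well-behaved crawler checks it before fetching a page, using Python's standard urllib.robotparser. The robots.txt contents, the example.com URLs and the "SomeOtherBot" name are all made up for illustration; a real crawler would download the file from the site it is about to visit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Googlebot everywhere, block everyone else
# only under /private/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler asks before every fetch and skips anything disallowed.
googlebot_ok = parser.can_fetch("Googlebot", "https://example.com/fares")
other_ok = parser.can_fetch("SomeOtherBot", "https://example.com/fares")
other_private_ok = parser.can_fetch("SomeOtherBot", "https://example.com/private/x")
print(googlebot_ok, other_ok, other_private_ok)
```

Keep in mind robots.txt is only a request: it works because crawlers like Google's choose to honor it, not because anything enforces it.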
"Legal speak and lawyers" are how we hold our society together in a relatively peaceful and just fashion. Yes, we end up with bad laws, like CFAA, and some days I think the U.S. will just collapse in on itself. But it beats all the alternatives that have been seriously tried. Please remember, lawyers not only try and enforce the CFAA, but are the ones challenging it as well!
They're providing you a service by coding it up, hosting it, dealing with customer service requests and Southwest's lawyers, etc. Seems fair to charge $3 to provide the service of saving you money. Also seems fair to try outcompeting them by providing that service for free at your own expense.
So, the Southwest TOS state that you can't scrape or programmatically access their data. Isn't that the end of the argument unless they find a way to manually access it for customers?
I would think so. That's why you don't see SW in Google Flights or any other program. SW wants you to only get it from their website. I'd imagine if there was more to the discussion Google would fight it.
Is visiting a website now some sort of contract? I have no contractual relationship with Southwest, what do I care for their TOS?
In the EU all TOS are basically just a reiteration of standard consumer law. You can put whatever you want into yours, but none of that nonsense survives a legal challenge.
Thank you. Well said. I'm not sure why you're getting downvoted.
People think that lawyers and terms can control all aspects of life. It's beyond ridiculous that a TOS that most college graduates couldn't understand is legally binding for every word in the contract. I mean, shit, you can put just about anything in your terms and, who knows, there might be a lawyer good enough and a judge stupid enough to enforce it.
It's only ok if you're a big corporation like Google. Indie devs have to go play in our little sandbox and be careful not to piss off the big boys.
"I am not a lawyer and this is not legal advice, but I believe this is, sadly, dead wrong. The infamous Computer Fraud and Abuse Act contains a provision barring not just unauthorized access to computer systems but also accessing such systems in a manner that exceeds authorized access. In other words, if you break terms of service on a website, you may be in violation of the CFAA."
Given they're charging for this service, maybe there's a way to make it work economically by having low cost workers (e.g. Mechanical Turk or even a bank of workers in certain countries) perform the checks manually (you could even create a browser extension that automates the form filling, but leaves a human to push the button and parse the results). At least by offering it as a paid service, they have some sums they can run to see if manual would work.
That's a neat idea. I imagine a service worker could also be used to do checks in the background, unless SW wants to ban every IP that hits their site... :-D
After a while you could also keep stats to determine when prices are most likely to go down, and then only get the user's attention at those times.
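To make the stats idea concrete, here's a minimal sketch, assuming you have logged a list of (timestamp, price) checks for one fare. The function name, the hour-of-day bucketing and the sample prices are all my own invention for illustration, not anything the service actually does.

```python
from collections import defaultdict
from datetime import datetime


def drop_rate_by_hour(samples):
    """Given (datetime, price) pairs sorted by time, return a dict mapping
    hour of day to the fraction of checks at that hour that saw a lower
    price than the previous check."""
    checks = defaultdict(int)
    drops = defaultdict(int)
    for (_, prev_price), (when, price) in zip(samples, samples[1:]):
        checks[when.hour] += 1
        if price < prev_price:
            drops[when.hour] += 1
    return {hour: drops[hour] / checks[hour] for hour in checks}


# Made-up observations for a single fare.
samples = [
    (datetime(2017, 11, 15, 9, 0), 200.0),
    (datetime(2017, 11, 15, 15, 0), 180.0),
    (datetime(2017, 11, 16, 9, 0), 190.0),
    (datetime(2017, 11, 16, 15, 0), 170.0),
]
rates = drop_rate_by_hour(samples)
best_hour = max(rates, key=rates.get)
print(rates, best_hour)
```

With enough history you would only ping the user around the hours with the highest drop rate. Four data points obviously prove nothing; the point is just the bookkeeping.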
My brother-in-law sank a million dollars of his own money into a similar real-time airline pricing service, all of which was thrown away when his now defunct company got banned for scraping.
There are only a few blessed companies that are allowed to scrape airline data (not surprisingly, big players in the market). If you haven't been granted permission and don't comply with their cease and desist you'll be sued and/or have your scraper IPs blacklisted.
"The problem I see with their argument is that they are making this information public."
I love this idea! However, I think that unfortunately this argument doesn't hold up, and wouldn't in court. The information is available via their site, but that does not make it public.
If the info is on the web it's available to be consumed by humans or robots in a polite manner.
Data wants to be free.
Them trying to enforce their terms is another matter. Maybe it would hold up in court. Stupider things have. I mean they should start telling us how to breathe air next. Because ya know if it's in your terms then it must be legally binding.
> The information is available via their site, but that does not make it public.
Could one offer fare information if there's no reference to Southwest? Is having a website full of listings like "flightno: 123, flight_datetime: 2018-03-04-1045, price: 234.56, price_datetime: 2017-11-15-1130" something Southwest could successfully block?
If you got the info by scraping Southwest's site, I don't think it will matter if you present it without reference to Southwest. You're in trouble for how you got it, not how you present it.
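For reference, the anonymous listing format floated above is easy to model. This is just a sketch: the FareListing class and parse_listing function are hypothetical names, and I'm assuming the timestamps are year-month-day-HHMM, which is how the example reads.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal

# Assumed timestamp layout, e.g. "2018-03-04-1045" meaning 10:45 on 4 March 2018.
STAMP = "%Y-%m-%d-%H%M"


@dataclass(frozen=True)
class FareListing:
    flightno: str
    flight_datetime: datetime
    price: Decimal  # Decimal avoids float rounding on money
    price_datetime: datetime  # when the price was observed


def parse_listing(line: str) -> FareListing:
    """Parse one 'key: value, key: value' listing line into a record."""
    fields = dict(part.strip().split(": ", 1) for part in line.split(","))
    return FareListing(
        flightno=fields["flightno"],
        flight_datetime=datetime.strptime(fields["flight_datetime"], STAMP),
        price=Decimal(fields["price"]),
        price_datetime=datetime.strptime(fields["price_datetime"], STAMP),
    )


listing = parse_listing(
    "flightno: 123, flight_datetime: 2018-03-04-1045, "
    "price: 234.56, price_datetime: 2017-11-15-1130"
)
print(listing)
```

Note that nothing in the record names the airline, which is exactly the question being asked. It doesn't change how the data was obtained.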
This is mis-reported. This isn't a general ruling on whether scraping sites in violation of the explicit instructions of the site owners is allowed.
The judge granted an injunction prior to the trial proper, permitting HiQ to continue scraping LinkedIn in the lead-up to the trial. He did this because HiQ credibly argued that if it couldn't scrape LinkedIn it would go out of business before the court had even determined whether what it was doing was legal. These sorts of injunctions are a procedural matter and quite normal.
it was a public domain image, licensed under creative commons.
Unless it was CC0 (which is not what was linked), Creative Commons licenses are not public domain licenses. Someone still holds the copyright and you have to abide by the license terms. Secondly, CC and "public domain" are about copyright, not trademarks. It's totally possible to infringe someone's trademark even with a public domain image.
Good for you. I completely agree with your sentiment re: scraping. It's ok if Google or another big company does it but god help you if you do as an indie dev. Thank you for sharing this. I commend you.
I've scraped millions of records from all kinds of companies big and small, politely of course, and I will definitely continue to do so at my discretion for ideas. In your case I would've scraped southwest without hesitation.
I'd make sure to distance yourself from the trademark as much as possible. Maybe even remove "sw" from the domain name but otherwise I don't see how they have a case.
TOS trying to enforce anti-scraping measures is a joke and makes a mockery of the judicial system.
"And we believe that whether you are accessing that data by typing in a URL in a web browser, through a CURL request, an RSS feed, a cached copy, or having someone read it to you aloud, does not change your right to access public data."
I absolutely do have the right to access the data. However, it's not clear why you should have the right to access the data and then monetize redistributing it to me.
The landing page (https://www.swmonkey.com/) still has the word "Southwest" used several times, and the picture of what is obviously a Southwest plane. I hope common sense prevails and this (obviously very useful) service doesn't die because the founder can't control his ego.
Founder here: There is one case where I use the name SouthwestMonkey instead of swmonkey. I realize that that was a bad naming decision. All the other references to Southwest are to the airline and not this service.
That would be one way to comply with the letter of the TOS. There will be some issue of getting enough traction from users that obey the TOS to make the site useful, though.
I think crowdsourcing would open up SWmonkey to 'attack' from SW, where SW would flood the channel with conflicting info at a rate that would make the SWmonkey site useless to legitimate customers.
Also, most of the companies that offer fare comparisons do not show SW flights... Some of those companies have the tech chops to implement 'fixes' that would comply with the SW site TOS and the massive legal departments to defend their position/approach in any and all courts in the world.
That's got a "boil the ocean" problem. It's probably easier to imagine it on a per-flight basis; how many people in a set of people who may look at a given flight are going to be swmonkey users? You have to get that number above a certain threshold before that would work, but it's hard to get that number above a certain threshold because nobody will want to use the service until you're there.
Plus, with the way airline pricing works now, just because Alice sees one price doesn't mean Bob will see it ten minutes later. The faster the prices change, the higher the percentage of people you need to have crowdsourcing.
And I've just assumed that "crowdsourcing" looks like "install a browser extension". If it involves "typing numbers into another site" you can expect a participation rate indistinguishable from 0%. I've also assumed Southwest remains oblivious to this and never takes any actions to counter what they consider an undesirable use, which is also unrealistic.
Or maybe they could provably crowdsource some of the information and say that the fraction that is bot / crowdsourced is proprietary and irrelevant. Then send a copy of the HiQ Labs v. LinkedIn case. https://www.bloomberg.com/news/features/2017-11-15/the-bruta...
Does putting a human in the loop make scraping okay? Can there be a button on the web page : "please click here to run a price check for one of your fellow travelers" that initiates a web page being fetched, attached to a round-robin queue of fares that are being monitored?
I don't think so. All they would have to do is change their terms and conditions to prohibit round robin schemes and then send me a new cease and desist...
Lost faith in the crowd here after reading these comments against web scraping. Isn't this HACKERnews!? This is the same witch hunt mentality that ultimately led to the prosecution of Aaron Swartz and look how that turned out.
The anti-scrapers: You should all be ashamed of yourselves. Where's that spirit of innovation and naughtiness that PG always talks about in his posts?
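Legal questions aside, the round-robin part of that idea is simple enough. Here's a rough sketch, assuming each button click from a visitor pulls the next monitored fare off a rotating queue; the FareQueue class and the fare identifiers are hypothetical, and the actual page fetch is left out.

```python
from collections import deque


class FareQueue:
    """Round-robin queue of monitored fares: each click gets the next one."""

    def __init__(self, fares):
        self._queue = deque(fares)

    def next_check(self):
        """Return the fare at the front and move it to the back of the line."""
        if not self._queue:
            return None
        fare = self._queue[0]
        self._queue.rotate(-1)  # front item goes to the end
        return fare


queue = FareQueue(["flight-123", "flight-456", "flight-789"])

# Four simulated button clicks: the fourth wraps around to the first fare.
clicks = [queue.next_check() for _ in range(4)]
print(clicks)
```

Every fare gets checked once per lap, so how fresh the prices are depends entirely on how many visitors are willing to click.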
Realistically they would sue you in your country. Worst case, they have you prosecuted for violating the Computer Fraud and Abuse Act and demand extradition (most countries have extradition treaties with the US).
If you distribute the scraping load to a browser plugin you could probably avoid the ToS violation. And aren't there APIs that would give you this data? I love counter-gaming tech like this. I hope you rebrand and keep going.
You have little argument there, and you should change that because to a 3rd party it could look like a service offered by Southwest itself.
The part about scraping, etc, is more complicated and you may want to fight that.
See: http://blog.icreon.us/advise/web-scraping-legality and https://arstechnica.com/tech-policy/2017/08/court-rejects-li... as two quick examples I found on previous cases that have gone to court.