Hacker News: SamReidHughes's favorites

The first contract program I wrote was when I was 17 years old. I was working at a service station changing oil. Some guy came through who wanted to stand around while I did the work (it happens), so we struck up a conversation.

Turns out he was a bookkeeper and had just purchased an Apple IIe and wanted to use it for his clients. I knew nothing about accounting, he knew nothing about computers, so it seemed like a good match :)

Four weeks of spending free afternoons at his shop, and it was ready to go. He was happy and I had 200 bucks in my pocket. Life was good.

Almost 20 years later, I get a call from him. He says the program isn't working so well and he wants to upgrade. I'm like WTF? Does anybody in the universe still even have a working Apple II anymore? Why would he keep using something like that for 20 years?

He told me that as computers modernized, it became a bit of a status symbol to have an older-looking system spewing out reams of reports. His customers, who were mostly small construction companies and such, got the feeling of stability and security from something that was unchanged.

It is a very strange feeling to get a call about code you wrote a long, long time ago. If I had had any sense, I would have realized from the experience that programming is normally an extremely tiny part of actually making a business work. But it took me many more years to figure that one out.


When I played Marco Polo as a kid, I would cheat. When I was “it”, I’d yell out “Marco!” like you’re supposed to, but I also opened my eyes underwater a little bit to see where the other kids were.

One day a friend brought over special goggles. They were covered in black vinyl to block your vision completely. That day we all found out that every one of us cheated the same way. When playing with the new goggles, suddenly it was: “this game sucks” and: “let’s do something else”.

It was eye-opening (sorry).

Cheating on emissions tests strikes me the same way. “All the other car companies seem to be meeting their targets, how can we keep up?”

Well, it turns out that to be as good as the other players, you have to cheat just like they do.



I do a significant amount of scraping for hobby projects, albeit mostly of open websites. As a result, I've gotten pretty good at circumventing rate-limiting and most other controls.

I suspect I'm one of those bad people your parents tell you to avoid - by that I mean I completely ignore robots.txt.

At this point, my architecture has settled on a distributed RPC system with a rotating swarm of clients. I use RabbitMQ for message-passing middleware, SaltStack for automated VM provisioning, and Python everywhere for everything else. Using some randomization and a list of the top n user agents, I can randomly generate about 800K unique but valid-looking UAs. Selenium+PhantomJS gets you through non-CAPTCHA Cloudflare. Backing storage is Postgres.
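The actual top-n list and generator aren't shown, so here is only a minimal sketch of the idea in Python: hold a few real-looking UA templates fixed and randomize the version numbers, rejecting duplicates. The OS tokens, version ranges, and function names below are hypothetical stand-ins for the author's list.

```python
import random
from typing import List, Optional

# Hypothetical component pools. The real system starts from a list of the
# top-n observed UAs; these entries only stand in for that list.
OS_TOKENS = [
    "Windows NT 10.0; Win64; x64",
    "Windows NT 6.1; Win64; x64",
    "Macintosh; Intel Mac OS X 10_11_6",
    "X11; Linux x86_64",
]
CHROME_MAJORS = range(45, 55)  # assumed 2015-2016 era Chrome major versions
BUILD_RANGE = (2400, 2900)     # inclusive; not kept consistent with the major version
PATCH_RANGE = (0, 150)         # inclusive

# Number of distinct strings this generator can produce at all.
CAPACITY = (
    len(OS_TOKENS)
    * len(CHROME_MAJORS)
    * (BUILD_RANGE[1] - BUILD_RANGE[0] + 1)
    * (PATCH_RANGE[1] - PATCH_RANGE[0] + 1)
)


def chrome_ua(os_token: str, major: int, build: int, patch: int) -> str:
    """Render one Chrome-style UA string from its varying parts."""
    return (
        f"Mozilla/5.0 ({os_token}) AppleWebKit/537.36 "
        f"(KHTML, like Gecko) Chrome/{major}.0.{build}.{patch} Safari/537.36"
    )


def random_uas(n: int, seed: Optional[int] = None) -> List[str]:
    """Return n distinct, valid-looking UAs; a seed makes the output reproducible."""
    if n > CAPACITY:
        raise ValueError(f"can only generate {CAPACITY} distinct UAs")
    rng = random.Random(seed)
    seen = set()
    uas: List[str] = []
    while len(uas) < n:
        ua = chrome_ua(
            rng.choice(OS_TOKENS),
            rng.choice(CHROME_MAJORS),
            rng.randint(*BUILD_RANGE),
            rng.randint(*PATCH_RANGE),
        )
        if ua not in seen:  # keep only unique strings, in generation order
            seen.add(ua)
            uas.append(ua)
    return uas
```

Starting from a larger list of real UAs as templates (as the comment describes) scales the same approach into the hundreds of thousands without the strings looking synthetic.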

Database triggers do row versioning, and I wind up with what is basically a mini Internet Archive of my own, with periodic snapshots of a site over time. Additionally, I have a readability-like processing layer that rewrites the page content in hopes of making the resulting layout actually pleasant to read, with pluggable rulesets that determine page element decomposition.
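The Postgres triggers themselves aren't shown. As a rough, self-contained illustration of trigger-based row versioning, this sketch swaps in SQLite (via Python's built-in `sqlite3`) so it runs without a server; the table, trigger, and function names are hypothetical. A `BEFORE UPDATE` trigger copies the old row into a history table whenever the stored content actually changes.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE pages (
    url     TEXT PRIMARY KEY,
    content TEXT NOT NULL,
    fetched TEXT NOT NULL
);
-- One row per superseded version of a page.
CREATE TABLE pages_history (
    url     TEXT NOT NULL,
    content TEXT NOT NULL,
    fetched TEXT NOT NULL
);
-- Before a page row is overwritten, archive the old version,
-- but only if the content really changed (not just the fetch time).
CREATE TRIGGER pages_version
BEFORE UPDATE ON pages
FOR EACH ROW
WHEN OLD.content IS NOT NEW.content
BEGIN
    INSERT INTO pages_history (url, content, fetched)
    VALUES (OLD.url, OLD.content, OLD.fetched);
END;
"""


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the versioned schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store(conn: sqlite3.Connection, url: str, content: str, fetched: str) -> None:
    """Update a page in place (the trigger archives the old row) or insert it."""
    cur = conn.execute(
        "UPDATE pages SET content = ?, fetched = ? WHERE url = ?",
        (content, fetched, url),
    )
    if cur.rowcount == 0:  # first time we've seen this URL
        conn.execute(
            "INSERT INTO pages (url, content, fetched) VALUES (?, ?, ?)",
            (url, content, fetched),
        )


def history(conn: sqlite3.Connection, url: str) -> List[str]:
    """Return archived contents of a page, oldest first."""
    rows = conn.execute(
        "SELECT content FROM pages_history WHERE url = ? ORDER BY rowid", (url,)
    ).fetchall()
    return [r[0] for r in rows]
```

Because the versioning lives in the database, every writer gets snapshots for free; the scraper only ever issues plain updates.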

At this point, I have a system that is, as far as I can tell, definitionally a botnet. The only thing is, I actually pay for the hosts.

---

Scaling something like this up to high volume is really an interesting challenge. My hosts are physically distributed, and just maintaining the RabbitMQ socket links is hard. I've actually had to do some hacking on the RabbitMQ library to let it handle the various ways I've seen a socket get wedged, and I still have some reliability issues in the SaltStack-DigitalOcean interface where VM creation gets stuck in an infinite loop, leading to me bleeding all my hosts. I also had to implement my own message fragmentation on top of RabbitMQ, because literally no AMQP library I found could reliably handle large (>100K) messages without eventually wedging.
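The fragmentation layer itself isn't shown. A minimal, broker-agnostic sketch of the idea in Python: split a large payload into numbered fragments that share a message id, publish each one as its own small message, and reassemble on the consumer side regardless of arrival order. The 64 KiB chunk size and the field names are hypothetical, and the actual publishing through an AMQP client is left out.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional

CHUNK_SIZE = 64 * 1024  # hypothetical; keeps each message under the ~100K trouble zone


@dataclass(frozen=True)
class Fragment:
    msg_id: str  # shared by every fragment of one logical message
    index: int   # 0-based position of this fragment
    total: int   # number of fragments in the message
    data: bytes


def fragment(payload: bytes, chunk_size: int = CHUNK_SIZE) -> List[Fragment]:
    """Split a payload into fixed-size fragments tagged with a common message id."""
    msg_id = uuid.uuid4().hex
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
    if not chunks:  # an empty payload still travels as one (empty) fragment
        chunks = [b""]
    return [Fragment(msg_id, i, len(chunks), c) for i, c in enumerate(chunks)]


class Reassembler:
    """Buffer fragments (possibly out of order or duplicated) until a message is complete."""

    def __init__(self) -> None:
        self._pending: Dict[str, Dict[int, bytes]] = {}

    def add(self, frag: Fragment) -> Optional[bytes]:
        """Store a fragment; return the full payload once all fragments have arrived."""
        parts = self._pending.setdefault(frag.msg_id, {})
        parts[frag.index] = frag.data  # a redelivered duplicate just overwrites itself
        if len(parts) < frag.total:
            return None
        del self._pending[frag.msg_id]
        return b"".join(parts[i] for i in range(frag.total))
```

A production version would also need a timeout that discards incomplete messages whose remaining fragments were lost, otherwise `_pending` grows without bound.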

There are other fun problems too, like the fact that I have a ~700 GB Postgres database, which means you have to spend time considering your DB design and doing query optimization too. I apparently have big data problems in my bedroom (my home servers are in my bedroom closet).

---

It's all on GitHub, FWIW:

Manager: https://github.com/fake-name/ReadableWebProxy

Agent and salt scheduler: https://github.com/fake-name/AutoTriever


It's absolutely possible to run an email server in 2016 and I encourage anyone capable to do so!

Email is one of the bastions of the decentralised Internet and we should hang on to it.

Every day more and more people are moving to Gmail/Hotmail/Outlook. While I do understand the reasons, it puts more and more power into the hands of these providers, and the little guy (us) gets more screwed (e.g. marked as junk by default by them :< ).

Having said that, here's my checklist for successfully delivering email:

- make sure your IP (IPv4 and IPv6) is clean and not listed in any RBL; use e.g. http://multirbl.valli.org/ to check

- make sure you have a correct reverse DNS (PTR) entry for said IP, and that the PTR hostname's A record is also valid and points back to that IP

- make sure your MTA does not add your client's IP to the message headers (i.e. X-Originating-IP); messages can be blocked based solely on a "dodgy" X-Originating-IP (see e.g. https://major.io/2013/04/14/remove-sensitive-information-fro... )

- set up SSL/TLS properly in your MTA; so many providers give away free certs nowadays

- SPF, DKIM, DMARC: set them up, properly; this site comes in handy for checking yourself: https://www.mail-tester.com/

- do not share the IP of your email server with a web server running any sort of scripting engine; if it gets exploited in any way, sending spam is usually what the abusers will do

- last but not least, and while I loved qmail and vpopmail: use Postfix or Exim. They are both a better fit for 2016, more configurable, and have much, much larger user bases, and as such bigger communities and more documentation.
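Two of the checks in the list above can be scripted with the Python standard library. This is a minimal sketch, not a replacement for the linked tools: `dnsbl_query_name` builds the name a DNSBL lookup queries (the address reversed, then the blocklist's zone prepended to; `zen.spamhaus.org` is only an example zone), and `forward_confirmed_rdns` checks that the PTR hostname resolves back to the same IP. Both function names are hypothetical, the lookups are injectable so the logic can be tested offline, and since `socket.gethostbyname_ex` only returns IPv4 addresses, this version of the reverse DNS check is IPv4-only.

```python
import ipaddress
import socket
from typing import Callable, List, Tuple

HostInfo = Tuple[str, List[str], List[str]]  # (hostname, aliases, addresses)


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name for ip under a blocklist zone.

    IPv4 uses reversed octets; IPv6 uses reversed nibbles (same layout as ip6.arpa).
    """
    addr = ipaddress.ip_address(ip)
    rev = addr.reverse_pointer  # e.g. "10.2.0.192.in-addr.arpa"
    suffix = ".in-addr.arpa" if addr.version == 4 else ".ip6.arpa"
    return rev[: -len(suffix)] + "." + zone


def forward_confirmed_rdns(
    ip: str,
    reverse_lookup: Callable[[str], HostInfo] = socket.gethostbyaddr,
    forward_lookup: Callable[[str], HostInfo] = socket.gethostbyname_ex,
) -> bool:
    """True if ip has a PTR record whose hostname resolves (A record) back to ip."""
    try:
        hostname, _, _ = reverse_lookup(ip)
        _, _, addresses = forward_lookup(hostname)
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return False
    return ip in addresses
```

To actually test a listing, resolve the query name (e.g. with `socket.gethostbyname`): by convention an answer in 127.0.0.0/8 means the IP is listed, and a lookup failure means it isn't. Multi-RBL sites like the one linked above repeat this across many lists at once.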

HTH


There are so many to pick, but here are a few.

Music
- Lady Gaga: I was not a fan until I heard her performances and story
- Jewel
- Gwen Stefani

Comedians
- Louis C.K.
- Doug Stanhope
- Eric Andre
- Hannibal Buress
- Chris Rock

Celebrities
- Quentin Tarantino
- Arnold Schwarzenegger
- Danny Trejo (Machete)
- Mike Tyson
- JJ Abrams
- Jerry Seinfeld
- John Goodman

