Hacker News | lil_tee's comments


Very nice! Thanks for making this.

I am curious, how difficult was it to parse XML? I want to extract footnotes from 10-Q and 10-Ks but I was a little bit intimidated by XBRL.


Thanks. The 13F XML is not too complicated; here is the relevant section of code in case it's helpful: https://github.com/toddwschneider/sec-13f-filings/blob/main/...

I haven't worked with XBRL, though, so I'm not sure whether it's more involved.
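
For anyone who wants a feel for it without opening the repo, here is a rough Python sketch of reading a 13F information table. This is not the code from the linked project. The sample document, its element names, and the namespace are my own illustration of the layout, so check them against a real filing before relying on any of it. The helper names `local` and `parse_holdings` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only: the namespace and element names are assumptions
# about the 13F information table layout, not copied from a real filing.
SAMPLE = """<informationTable xmlns="http://www.sec.gov/edgar/document/thirteenf/informationtable">
  <infoTable>
    <nameOfIssuer>EXAMPLE CORP</nameOfIssuer>
    <titleOfClass>COM</titleOfClass>
    <cusip>000000000</cusip>
    <value>1234</value>
    <shrsOrPrnAmt>
      <sshPrnamt>100</sshPrnamt>
      <sshPrnamtType>SH</sshPrnamtType>
    </shrsOrPrnAmt>
  </infoTable>
</informationTable>"""


def local(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_holdings(xml_text):
    """Return one dict per <infoTable> row, keyed by leaf element name."""
    root = ET.fromstring(xml_text)
    holdings = []
    for node in root.iter():
        if local(node.tag) != "infoTable":
            continue
        # Keep only leaf elements; containers such as shrsOrPrnAmt are skipped
        # and their children are flattened into the same dict.
        row = {
            local(child.tag): (child.text or "").strip()
            for child in node.iter()
            if len(child) == 0
        }
        holdings.append(row)
    return holdings


for holding in parse_holdings(SAMPLE):
    print(holding["nameOfIssuer"], holding["cusip"], holding["sshPrnamt"])
```

Matching on the local tag name instead of the full namespaced name keeps the sketch working even if the namespace string differs from what I assumed here.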


Thanks


FWIW, nytimes.com accounts for ~4% of all posts that appear on the HN homepage: https://toddwschneider.com/dashboards/hacker-news-trends/?q=...


That's quite a lot! One of every 25, so probably at least one on the front page at any time? Are there domains that account for considerably more than 4%? Would be interesting to see a top ten list.
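
A back-of-the-envelope check on the "at least one at any time" guess, as a small Python sketch. It assumes 30 stories on the front page and treats each slot as an independent 4% draw, which is a simplification rather than how the ranking really works.

```python
share = 0.04  # nytimes.com share of front-page posts, from the parent comment
slots = 30    # assumption: number of stories shown on the HN front page

# Expected number of nytimes.com stories on the page at a given moment.
expected = share * slots

# Probability that at least one of the slots is a nytimes.com story,
# under the (simplifying) assumption that slots are independent.
p_at_least_one = 1 - (1 - share) ** slots

print(f"expected stories on the front page: {expected:.1f}")
print(f"chance of at least one at a given moment: {p_at_least_one:.0%}")
```

Under those assumptions that comes to about 1.2 stories on average and roughly a 70% chance of seeing at least one, so "probably" seems fair but "always" would be too strong.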


nytimes.com is the #3 domain of late, or really #2 behind only github.com if you exclude ycombinator.com (all of the Shows, Asks, etc.)

Top 10 domains by # of items on front page since 1/1/18:

   rank |     domain      | count 
  ------+-----------------+-------
      1 | github.com      |  2041
      2 | ycombinator.com |  1911
      3 | nytimes.com     |  1818
      4 | bloomberg.com   |  1028
      5 | medium.com      |   826
      6 | techcrunch.com  |   735
      7 | theguardian.com |   666
      8 | github.io       |   615
      9 | bbc.com         |   558
     10 | arstechnica.com |   493
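
For the curious, a ranking like this can be approximated from a list of story URLs. The Python sketch below is not the query behind the table above. The function names and sample URLs are made up for illustration, and taking the last two labels of the hostname is a rough stand-in for proper registered-domain handling (it would get something like bbc.co.uk wrong).

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Rough domain: the last two labels of the hostname (heuristic only)."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else host


def top_domains(urls, n=10):
    """Return the n most common domains as (domain, count) pairs."""
    return Counter(domain_of(u) for u in urls).most_common(n)


# Hypothetical sample input, not real front-page data.
urls = [
    "https://github.com/toddwschneider/nyc-taxi-data",
    "https://www.nytimes.com/section/technology",
    "https://example.github.io/project/",
    "https://github.com/toddwschneider/stocks",
]

for rank, (domain, count) in enumerate(top_domains(urls), start=1):
    print(f"{rank:>2} | {domain:<15} | {count}")
```

Note that this heuristic lumps every `*.github.io` site together, which matches how github.io shows up as a single row in the table.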


While that is true, NYT posts are more highly upvoted than most other front-page items.


Thanks, agreed this is a good idea and I added query param support here: https://github.com/toddwschneider/stocks/pull/4


TLC just released 6 additional months of trip-level data (Jul-Dec 2015): http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtm...

GitHub repo updated to process additional data: https://github.com/toddwschneider/nyc-taxi-data


Number of bikes in active use does not seem to change with the seasons: https://github.com/toddwschneider/nyc-citibike-data/blob/mas...

The summer 2015 increase corresponds to the permanent expansion in August 2015; I'm not sure why there was a dip in Q1 2015


The chart says "Unique Bikes Used Per Day". Q1 2015 was a very cold winter, and I can imagine the system being so under-utilized that there were many bikes that sat unused through an entire day.
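
To make the distinction concrete: a "unique bikes used per day" figure counts distinct bike IDs that appear in at least one trip that day, so a bike that sits in a dock all day drops out of the count even though it is still in the fleet. A minimal Python sketch of that metric, with a hypothetical `(start_time, bike_id)` trip layout and made-up sample rows:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample trips as (start_time, bike_id); not real Citi Bike data.
trips = [
    ("2015-02-01 08:15:00", 101),
    ("2015-02-01 09:40:00", 101),  # same bike again, counted once for the day
    ("2015-02-01 17:05:00", 202),
    ("2015-02-02 12:30:00", 202),  # bike 101 sat unused on this day
]


def unique_bikes_per_day(trips):
    """Map each ISO date to the number of distinct bikes with a trip that day."""
    bikes_by_day = defaultdict(set)
    for start_time, bike_id in trips:
        day = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S").date().isoformat()
        bikes_by_day[day].add(bike_id)
    return {day: len(bikes) for day, bikes in sorted(bikes_by_day.items())}


for day, count in unique_bikes_per_day(trips).items():
    print(day, count)
```

In the sample, the fleet is two bikes on both days, but the second day's count falls to one because only one bike was ridden.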


Sure, there are undoubtedly lots of examples of businesses that opened in desolate areas and created new taxi activity where there had been none previously; I just happened to focus on Williamsburg for the post.

Another idea that I didn't get around to doing was to look at concert venues and measure taxi traffic around particular concerts to see if it would correlate to bands' overall popularity


Yep, all on a 2012 MacBook Air. Data size was over 400GB with indexes

Simple queries on indexed columns of the trips table take a minute or two; more complicated queries that require a full sequential scan can take up to a few hours


All on internal storage? What was the max internal storage then, 512GB? Sounds like that's pretty tight, barely possible if you aren't doing anything else that takes up much space.

I wonder if it would be easier to put Postgres on a server, maybe AWS or local, and just run the queries from the laptop. Or maybe run them in a tmux session on the server, so you can let a long query run without having to keep the laptop up.


Yes, the database is all on the machine's local 512 GB hard drive. I did store the downloaded flat text files to an external drive and loaded them into the db from there.


My dataset does not include livery cabs

FiveThirtyEight has some additional for-hire vehicle data in their GitHub which they obtained via FOIL request: https://github.com/fivethirtyeight/uber-tlc-foil-response/tr...

Intuitively, I would think livery cabs have lost significant market share to Uber, but I don't actually know


I did, though I asked permission from the Commissioner first


Mostly to keep the app simple and still deployable on the free Heroku plan. Sure, it could handle multiple locations, but then it would quickly outgrow the free tier


Next up: a one-tap mobile app that deploys an instance of this based on current or specified coordinates.

