Hacker News | dd367's comments

Kid Pix seems like not an astute name.


This software was named decades and decades ago - long before all of that was a mainstream topic.


Long before people on social media went looking for outrage in simple things.


Sorry, what mainstream topic would make kid pix not astute?


Pedophiles hiding in plain sight. Some incredibly powerful.


Takes a book in English, finds the words that are "rare" (by the measure IDF >= 15 or -ln(frequency) > 14) and lists the definition in a dictionary and the places it occurs in the original text. The final output is a static HTML file. You can view a live demo for a subset of words from a book here: https://deedy.github.io/vocabuliwala/
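A rough sketch of the rarity filter described above, assuming the -ln(frequency) > 14 form of the measure. The `rare_words` helper, the `freq` table (word to relative frequency in some reference corpus), and the `unseen` fallback are illustrative assumptions, not the project's actual code:

```python
import math
import re
from collections import defaultdict

def rare_words(text, freq, threshold=14.0, unseen=1e-9):
    """Map each rare word to the sentences it appears in.

    `freq` is a hypothetical table of relative word frequencies.
    Words missing from it fall back to `unseen`, so they count as rare.
    """
    hits = defaultdict(list)
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for sentence in sentences:
        for word in set(re.findall(r"[a-z]+", sentence.lower())):
            rarity = -math.log(freq.get(word, unseen))
            if rarity > threshold:
                hits[word].append(sentence)
    return dict(hits)
```

The dictionary lookup and HTML rendering would sit on top of the returned mapping.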

Some example output demonstrating some of the key features of Vocabuliwala:
- Dictionary lookup
- Secondary dictionary lookup: looking up dictionary meanings which themselves need explanation
- Difficulty rating
- Show appearances in the input book

Who is this for?
- A casual reader of a book
- A teacher trying to teach a book
- Students trying to learn vocabulary for a standardized test

Curious to get the community's feedback and see how I can improve this!


I was a TA for a graduate level class at one of the top universities in the US and I've had some interesting encounters with plagiarism.

I. The time I got caught for "plagiarizing". In an intro systems class, I, a CS major, and my roommate, who wanted to minor in CS, were working together and I was "showing him the ropes". He was an intelligent student and we never worked together on the homeworks aside from general verbal discussions on what the solution could be. He used a Windows laptop, and for one of the assignments his C code wasn't compiling because he was missing some libraries. With a deadline approaching, he told me he couldn't figure it out and asked me to compile it for him and send him back the binary. I did so, but in a rush I mistook my HW folder for his (we'd downloaded this as a part of the assignment, and the folder structure was identical) and sent him my binary instead. Both of our solutions worked. Obviously, we got "caught" in the most naive way: our binaries had the same MD5 hash and the CMS flagged us. We were both confused at first, and then we realized what happened and explained it to the professor. The proof was simple - just compile my roommate's code and run it. However, he marked the assignment as a 0 for both of us. We still both got As (because you could drop one homework) and while some may claim this was a gentle slap on the wrist, it felt unjust. We clearly made a dumb mistake and shouldn't have been punished at all, especially when we knew how rampant actual plagiarism was.
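For anyone curious how a check like that might work, here is a minimal sketch. I don't know what the CMS actually ran; `find_identical_submissions` and its input format (student name to submitted bytes) are made up for illustration:

```python
import hashlib
from collections import defaultdict

def find_identical_submissions(submissions):
    """Group students whose submitted files are byte-for-byte identical.

    `submissions` is a hypothetical mapping of student name to file bytes.
    """
    by_digest = defaultdict(list)
    for student, blob in submissions.items():
        # Identical bytes always produce the same MD5 digest.
        by_digest[hashlib.md5(blob).hexdigest()].append(student)
    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

A check like this only catches exact copies. Recompiling the same source with different flags or on a different machine would usually change the hash.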

II. The time I caught students for "plagiarizing". As Kevin points out in his post, there aren't really any incentives to catch students for cheating. As a TA, I get no benefit, and moreover, there's a cost. No one wants to be known as THAT TA who busts kids for using "a little help". Keeping that in mind, I was usually very lenient when it came to cheating. I'd noticed signs, but there was never enough proof to warrant the effort of calling someone out. However, at one point it went too far. Two students who were partners for the "projects" had submitted nearly identical solutions for a complex Graphics homework assignment. They got the answer right, but I looked into their work and they both said "(9/5) / (4/3) == (4/7) / (5*9) = 1/3". I don't remember the exact values, but it was two steps of nonsense numbers and then a correct answer. I ended up reporting the case, mostly because I felt like my intelligence as a TA had been insulted. Are you seriously going to submit random numbers with a correct solution hoping I won't see? In any case, it didn't go anywhere.

III. Discovering a cheating ring. At our university, one of my good friends and project partners told me there was an "enormous Asian cheating racket" - not to call out any specific race, I'm Asian too. I wasn't surprised - to be blunt, it made sense. We're very grade oriented with tiger parents. Then I learnt the extent of it. There were apparently Chinese forums and "outsourcers" you could send your homework problems to and they would solve them and send them back. In addition, there were special shared systems like DC++ where you could discover answers to homeworks for different classes at my university as well as Prelims, Midterms and Finals contributed by students of previous years. I was in shock. Students would leave exam halls to go to the bathroom just to look at these answers mid-exam. But was I gonna tattle? No.

IV. The reality at universities. Not just in CS, but in every other subject, almost everybody cheats. Excuses that go around are: "I've worked on it with someone else" "Oh the TA in office hours told everybody the exact same solution" "What? Cheating? me?" "Maybe he/she took it from me, I didn't do it"

And look, people aren't stupid. We all know how cheating works. You get a homework assignment, and you re-write the sentences in your own language. You get some code from someone else and you define some useless functions with 1-2 lines of code. Or you arbitrarily re-organize lines of code. You rename all the variables. You re-organize your functions. You create some unnecessary classes.

There were students who would distribute 10 homework assignments among 10 people (in groups of 2), and have one do the assignment (use office hours, friends, google, whatever) and the other literally re-write the assignment in LaTeX 9 different ways for the others to use. No one would ever really have to do the work.

The well known key to cheating is plausible deniability - if there's enough evidence you didn't do it, you didn't do it.


And it's an even bigger problem with MEng/MS students. These are usually unfunded cash cow programs even at top universities. They accept fairly mediocre students from China and India and the class is usually 80% Chinese/Indian. A generalization, of course, but they have 0 intellectual curiosity. They are here to pay $50k-60k for 1 or 2 years, make sure they have as close to a 4.0 as possible and then go get a tech job where they will make $150k/yr, and where few if any of their skills from class will be needed.

And I can speak for Indians: CS education in India, aside from the IITs, the IIITs, BITS and some NITs, is dismal. Cheating is rampant there, and they're much better versed in the art because it's much harder to cheat and get away with it in India - you can't bring phones to your exam or freely go to the bathroom mid exam, for example.


Interesting article! The metric you use does eliminate triviality, but it sometimes surfaces very obscure (and arguably uninteresting) words, such as calumnies, ivoriness, coprophagist, etc. That's what you describe as "Webster's Second jargon that nobody knows".

It would be interesting if you could adapt your metric to account for the general prevalence of each word in English. Scan a giant subsection of, say, Wikipedia, and assign a frequency to each of the 234,000 words in a map, giving unseen words a vanishingly small frequency, and then use the sum or product of the frequencies of each of the anagrams to bring out some truly interesting ones!
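Something like this rough sketch is what I have in mind. It uses the product variant, computed as a sum of logs to avoid underflow. The function names and the `floor` value for unseen words are my own placeholders:

```python
import math
import re
from collections import Counter

def word_frequencies(corpus):
    """Relative frequency of each lowercase word in the corpus text."""
    counts = Counter(re.findall(r"[a-z]+", corpus.lower()))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def familiarity(words, freq, floor=1e-12):
    """Sum of log-frequencies, i.e. the log of the product of frequencies.

    Unseen words get the tiny `floor` frequency (an arbitrary choice).
    """
    return sum(math.log(freq.get(word, floor)) for word in words)

def rank_anagram_sets(anagram_sets, freq):
    """Order anagram sets so those made of common words come first."""
    return sorted(anagram_sets, key=lambda s: familiarity(s, freq), reverse=True)
```

With the product, a single obscure word drags the whole set down, which is the behaviour you want if the goal is anagrams where every word is recognizable.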


I would strongly argue that coprophagist is a very interesting word, and should be less obscure. But then again, that might just be my juvenile sense of humor.


Calumnies and coprophagist aren't particularly obscure. I've come across both.

I think you have to discriminate between slightly obscure or archaic words that anyone familiar with a reasonable range of the literary canon would know, and truly uninteresting words that even a highly educated and well-read person wouldn't know.

There are better corpora than Wikipedia that could be used for this purpose, like the British National Corpus:

http://www.natcorp.ox.ac.uk/


Great read. I think the title "On Reddit, the earlier you comment, the louder your voice." or something to that tune would've made for a more impactful headline!


fixed.


Thanks!


My bad, fixed.


thx @dd267, not bad by the way, this is an amazing post. Did this post by @minimaxir inspire this work? ~ http://minimaxir.com/2014/10/hn-comments-about-comments/


Not really :P I wish I'd seen it before. I only learnt about it when minimaxir commented on this thread.


That's a super interesting thought. You should consider that the sum total of popularity of topics on HN up till today can't be used in hindsight as a predictor. It would be interesting to see if we merely looked for past spikes in keywords and used that to govern investment decisions. Even then, I fear that for every "bitcoin" and "apple", there may be other technologies and companies (especially smaller startups) that didn't work out so well, although I hypothesize a net positive.

Despite it being public data, because the information circulated on HN is at the core of technology, it could prove valuable to investors with limited knowledge of it (and might well be worth packaging and selling, haha).
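To make the "past spikes in keywords" idea concrete, here is a toy sketch. The trailing-mean rule, the `window` and `factor` defaults, and the `find_spikes` name are all assumptions for illustration, not something I've validated against real data:

```python
def find_spikes(counts, window=3, factor=2.0):
    """Return indices where a count exceeds `factor` times the trailing mean.

    `counts` is a list of keyword mention counts per period (e.g. per month).
    """
    spikes = []
    for i in range(window, len(counts)):
        baseline = sum(counts[i - window:i]) / window
        if baseline > 0 and counts[i] > factor * baseline:
            spikes.append(i)
    return spikes
```

Whether a spike flagged this way says anything about later returns is exactly the open question.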


I'd like to postulate that the average disposable income of an active hn user is probably one of the highest among forums of the same class as hn (metafilter, reddit, digg, etc). (There have been historical self-reported polls, e.g. https://news.ycombinator.com/item?id=6464725 - 44% of respondents are in the top 10% income, ~25% are in the top 5% and ~4% are in the top 1%)

I'd also like to postulate that if you were to segment the market into "early adopters", hn would have a larger share of this segment than other forums in the same class with an equivalent or greater volume of traffic.

If this postulation is correct, then effectively hn is "trendsetters with money" ... a good group to listen to.

I don't have data to back these claims up, but intuitively I feel they are pretty safe.

This of course doesn't give any indication of market velocity. I've done a number of investments based on HN at the wrong velocity - I presumed the stock had been undervalued because of hn content, when in fact, the market had YET to undervalue it. I forecasted a distant chance of success given an undervalued stock (in this case blackberry) - knowing that they were going to do an android with a physical keyboard <eventually>, and I invested upon this speculation --- well before the market doubted the future of the company.

As a result, I bought it way early and it fell precipitously and is only rebounding slightly now. So no, this isn't a magic sauce to time the events or how they will affect the market price, just perhaps one to forecast their eventuality.


Math is hard.


Oh damn, super cool stuff. I wish I'd seen this before. Looks like I replicated a lot of your work, but yeah trends seem to have stayed the same.

