
I've had good fun playing around with this, it's certainly made NLP more approachable.

One issue though is that it seems to choke with certain characters.

For instance, given the character £ it complains with this error message:

    >>> TextBlob("£")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/eterm/nlp/local/lib/python2.7/site-packages/text/blob.py", line 340, in __repr__
        return unicode("{cls}('{text}')".format(cls=class_name, text=self.raw))
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 10: ordinal not in range(128)



Yeah, ditto. I created a new virtualenv with Python 3 and those problems disappeared. Before that I hacked around a bit and added `from __future__ import unicode_literals`, which alleviated the issue (but then ipython had problems with `repr(blob)`). I finally just gave up and ran `mkvirtualenv textblob --python=python3` (on Ubuntu 12.10).
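To see why the traceback points at byte 0xc2 in position 10, here is a rough Python 3 sketch that does explicitly what Python 2 does implicitly. It assumes a UTF-8 terminal, so the "£" literal arrives as the two bytes 0xc2 0xa3; the variable names are only illustrative and are not taken from TextBlob.

```python
# Python 3 sketch of the Python 2 failure, using explicit bytes.
# Assumption: the terminal is UTF-8, so "£" is stored as b'\xc2\xa3'.
raw = "£".encode("utf-8")

# Roughly what the format() call in __repr__ builds as a Python 2 byte string.
formatted = b"TextBlob('" + raw + b"')"

failure = None
try:
    # Python 2's unicode(some_bytes) decodes with the ASCII codec by default.
    formatted.decode("ascii")
except UnicodeDecodeError as err:
    # Record where decoding stopped and which byte caused it.
    failure = (err.start, formatted[err.start])

# Decoding with the codec the bytes were actually written in works fine.
decoded = formatted.decode("utf-8")

print(failure)
print(decoded)
```

The ten characters of `TextBlob('` come first, which is why the first byte of £ sits at position 10. Python 3 strings are already Unicode, so no implicit ASCII decode happens there.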


Ah, I'm still using Python 2, which might be what's causing my problems. For now I'll just try to work around it by stripping the character out of my source data.
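If you do go the route of removing the offending characters, a minimal Python 3 sketch could look like the following. `strip_non_ascii` is a hypothetical helper name, and the approach is lossy: every non-ASCII character is silently dropped, not only £. On Python 2 the input would first need decoding from bytes to `unicode`.

```python
def strip_non_ascii(text):
    """Drop every character outside the ASCII range (hypothetical helper)."""
    # errors="ignore" discards characters the ASCII codec cannot encode.
    return text.encode("ascii", errors="ignore").decode("ascii")


cleaned = strip_non_ascii("That cost me £5, sadly.")
print(cleaned)
```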

(My source data is my own HN comments. It's funny doing sentiment analysis on them, seeing how objective or subjective it thinks my posts are, as well as whether I'm generally cheery or miserable.)

(My end game is to produce an HN reader which only shows positive comments and news to reduce the amount of reading I do. ;))

It gets it mostly right, apart from the occasional hiccup. One is the following passage, which stood out as my most subjective post (1.0 on subjectivity!): "Factorisation is unique, the addition of 3 primes is not.<p>e.g. 29 can be written 5 + 11 + 13 or 3 + 3 + 23<p>So even if it were a difficult operation to reverse addition of 3 numbers, it would be made easier by collisions."

I'm left stumped as to why nltk thinks this is not just subjective but a 1.0, completely subjective post!


Maybe you're just confusing the facts being stated ("even if it were a difficult operation to reverse addition of 3 numbers" implies it is easy, which is true as it's just an oblique restatement of the commutative property) with the language being used: "difficult", "easier", "even if", etc.

I have no idea how sentiment analysis works though.
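For what it's worth, one common approach is lexicon-based: each known word carries a subjectivity score, and the text's score is the average over the words that were found. The sketch below is a toy version of that idea, not the actual TextBlob or nltk code; the lexicon entries and their values are invented for illustration.

```python
import re

# Hypothetical lexicon: word -> subjectivity in [0, 1].
# The entries and values are made up for this example.
TOY_LEXICON = {
    "unique": 1.0,
    "difficult": 1.0,
    "funny": 1.0,
    "occasional": 0.0,
}


def toy_subjectivity(text, lexicon=TOY_LEXICON):
    """Average the subjectivity of the lexicon words found; 0.0 if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


passage = (
    "Factorisation is unique, the addition of 3 primes is not. "
    "So even if it were a difficult operation to reverse addition "
    "of 3 numbers, it would be made easier by collisions."
)
score = toy_subjectivity(passage)
print(score)
```

With a scorer like this, two matched words that both carry a high score are enough to push a purely factual passage to the maximum, since every other word is ignored.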



