Hacker News | jsrn's comments

Does software count as an 'item' for you? If yes, for me (and I guess many others here) it would be various JetBrains products, plus Kotlin (even if I did not pay for that one directly) and OCR software from ABBYY. Also not paid for, but used daily: nginx.


JetBrains are Czech, not Russian.


IIRC they're headquartered in CZ, but the developers were mostly in St. Petersburg, and Kotlin is named after a Russian military island.


Regarding this: "I wish Perl would do an inplace edit of a file without creating a backup, though."

    $ perldoc perlrun
(or http://perldoc.perl.org/perlrun.html)

says:

> If no extension is supplied, and your system supports it, the original file is kept open without a name while the output is redirected to a new file with the original filename. When perl exits, cleanly or not, the original file is unlinked.

Which system are you using? On macOS and Linux, I get no automatic .bak file when not providing a backup suffix, i.e. it behaves the way you want on these systems.

Update: Apparently, anonymous files are not supported on Windows (http://stackoverflow.com/a/30539016), which would explain the behaviour you describe.
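
For completeness, a minimal example (the file name data.txt is hypothetical): without a suffix after -i, the edit is done in place with no backup file, while -i.bak keeps one:

    $ perl -i -pe 's/foo/bar/g' data.txt      # in place, no backup
    $ perl -i.bak -pe 's/foo/bar/g' data.txt  # in place, keeps data.txt.bak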


Here is the command with jq instead of json_pp:

    $ curl -sS 'https://www.google.com/complete/search?client=hp&hl=en&xhr=t&q=aurora' | jq . | gsed -nE '/<\/?b>/{s```g;s`"|,|^ *``g;p}'
(see my other comment about gsed vs sed)


You don't need sed, you can do it all in jq:

    curl -sS 'https://www.google.com/complete/search?client=hp&hl=en&xhr=t&q=aurora' | jq -r '.[1] | .[][0] | sub("</?b>"; ""; "gi")'


Here's the above modified to work with BSD sed:

  curl -sS 'https://www.google.com/complete/search?client=hp&hl=en&xhr=t&q=aurora' | jq . | sed -nE '/<\/?b>/{s```g;s`"|,|^ *``g;p;}'
(Edited to remove previous advice about inserting a newline)


Nice!

To run this on macOS, I had to use the GNU version of sed. I installed it with

    $ brew install gnu-sed
It is then called as 'gsed' instead of 'sed'.

As an avid Perl programmer, I already had json_pp in my $PATH. For everyone else, it is here: https://metacpan.org/pod/JSON::PP

You can install it with cpanm:

    $ cpanm JSON::PP
If you don't have cpanm, you can install it with

    $ curl -L https://cpanmin.us | perl - App::cpanminus
The modified command line from above then becomes:

    $ curl -sS 'https://www.google.com/complete/search?client=hp&hl=en&xhr=t&q=aurora' \
        | json_pp | gsed -nE '/<\/?b>/{s```g;s`"|,|^ *``g;p}'
Here is a little Bash function to encapsulate this:

    $ function c() { curl -sS "https://www.google.com/complete/search?client=hp&hl=en&xhr=t&q=$1" | json_pp | gsed -nE '/<\/?b>/{s```g;s`"|,|^ *``g;p}'; }
Which then allows you to use it like this:

    $ c hacker
    hacker news
    hacker typer
    hackerrank
    hackerman
    hackers movie
    hacker fare
    hackerone
    hackers cast
    hacker pschorr
For spaces in your query, use a '+':

    $ c New+York
    new york times
    new york and company
    new york giants
    new york post
    new york daily news
    new york weather


Improved version using jq. This properly URL-encodes the query parameter and uses a much simpler sed command. Additionally, the API returns UTF-8-encoded data when a user agent is specified. Requires curl >= 7.18.0.

    function c() {
        url='https://www.google.com/complete/search?client=hp&hl=en&xhr=t'
        # NB: user-agent must be specified to get back UTF-8 data!
        curl -H 'user-agent: Mozilla/5.0' -sSG --data-urlencode "q=$*" "$url" |
            jq -r .[1][][0] |
            sed 's,</\?b>,,g'
    }
Example:

    $ c ':)' ':('
    ) ( meaning
    ) ( emoticon
    ) ( ͡° ͜ʖ ͡°)
    ) ( emoticon meaning


You can use BSD sed by just inserting a semicolon into the sed pattern:

  sed -nE '/<\/?b>/{s```g;s`"|,|^ *``g;p;}'
(Edited to remove previous advice about inserting a newline)


Invented in 1998 by Perl hacker Abigail:

http://neilk.net/blog/2000/06/01/abigails-regex-to-test-for-...

(check out Abigail's other JAPHs if you like stuff like this)
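
If that link is about Abigail's well-known prime-testing regex (the URL above is truncated, so this is an assumption), it is usually quoted as a one-liner along these lines:

    $ perl -le 'print "prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/' 11
    prime
The idea: the number is written in unary (n ones), and the regex matches strings whose length is 0, 1 or composite, so a failed match means the number is prime.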


Yep, you are correct :)


Regarding your (2.) - just to clarify this for others: With an enterprise provisioning profile, you can deploy your app to an arbitrary number of devices. These devices do not need to be known beforehand via UDID (as with normal OTA provisioning). You can just upload the app to a server and ask people to download it. So the devices do not technically have to belong to the same organization (even if this is perhaps Apple's intent). E.g., a friend of mine uses an enterprise account, which they acquired for their university, to deploy an app for a study (for data gathering). The subjects who install the app and enter their data are not all part of the university. Apple was informed about this use case and did not have anything against it.

But AFAIK Apple centrally checks the validity of the profile with each download, and thus they are certainly able to detect whether you use your enterprise profile to circumvent the App Store.


What would you use as input for the deep network?

I have only worked with text classification methods where I chose the features myself. As I understand it, a deep network still has (like a 'traditional'/non-deep ANN) a fixed number of inputs in its input layer, i.e. one would have to process each input text somehow before feeding it into the network (to make the input sizes equal). Is there a usual way to do this without doing feature-extraction?


If using raw bag of words or n-grams, why not hash the strings to maybe 2^13-1 slots, with something like MurmurHash3 or with multiple hashes to prevent collisions, then use that sparse vector as input to a deep learning model?
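
A minimal sketch of that hashing trick in Perl (using the core Digest::MD5 module as a stand-in hash function, since a MurmurHash3 binding may not be installed; the token list is hypothetical):

    use Digest::MD5 qw(md5);

    my @tokens = qw(the quick brown fox the fox);    # hypothetical bag-of-words tokens
    my $slots  = 2**13 - 1;                          # fixed input dimension of the network
    my %vec;                                         # sparse vector: slot index => count
    for my $tok (@tokens) {
        my $idx = unpack("N", md5($tok)) % $slots;   # map the token to one of $slots buckets
        $vec{$idx}++;
    }
Collisions are possible, which is why multiple hash functions (as suggested above) can help.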


Thanks, that makes sense.

So the parameter would be the number of slots (== number of input units of the deep NN).

And the transformation of the text into bag-of-words / n-grams would not be considered feature engineering, or at least only 'low-level' feature engineering; the higher-level features would be learned by the deep network.

I guess one could go lower level still and even do away with bag-of-words / n-grams: limit the text size to e.g. 20000 characters, represent each character with a numerical value (e.g. its ASCII code point when dealing with mainly English texts) and then simply feed this vector of code points to the input layer of the deep network. Given enough input data, it should learn location-invariant representations like bag-of-words / n-grams (or even better ones) itself, right?



> because those who can benefit from this kind of basic regexp examples are also those who will not understand the limitations.

I share your sentiment. The article is certainly useful for learning regexps. But IMHO it should also point to the correct way of doing things - often, the correct way is to use a module, and the resulting code is not much longer than the code in the article. For email address validation:

    use Email::Valid;
    print (Email::Valid->address('john@example.com') ? 'valid' : 'invalid');
As a one-liner:

   perl -MEmail::Valid -E"say (Email::Valid->address('john@example.com') ? 'valid' : 'invalid');"
Other than installing the Email::Valid module with a

    cpanm Email::Valid
it is not much longer than the example in the article.

https://metacpan.org/module/Email::Valid


Good point. I updated the article with a note about Email::Valid.


I guess you guys just read the code and not the text:

`Notice that I say "looks like". It doesn't guarantee it is an email address.`

And yeah, you can find the full regular expression in the back of one of O'Reilly's Perl books (the regular expression handbook I believe).

It's nice to see perl code from time to time, even if it's just one line :)


Point taken - what I primarily wanted to show is that the correct way is not much longer than the "looks like" solution. Yes, the regexp is long, but it is nicely encapsulated in the Email::Valid module.


Yeah - I've read only the first sentence. I think many of those who will find that article via Google and even use that code will also not pay much attention to that weak disclaimer. Also, these were not the only problems with his code - see the comments on that page (in particular: http://www.catonmat.net/c/35784).

But the more important point is that an article that sounds so authoritative should be of much higher quality.


But I doubt someone new to regexps would understand what "looks like" means in that context. For instance they might think "ok, so something like 'abc@efg.xyz' matches, even though it's not a real email address." They might not think to consider that a full sentence like "hey, I'll see you tomorrow @ 2. Can't wait!" also matches.

That said, perl one-liners are certainly useful, so thanks to the OP for putting this together. I just think it would add a lot of value to include examples of where one is likely to go wrong.
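
For illustration, a sketch of such a false positive, assuming a naive pattern along the lines of /.+@.+/ (the article's actual regex is not quoted in this thread):

    $ perl -E 'say "looks like an email" if q{hey, I will see you tomorrow @ 2.} =~ /.+@.+/'
    looks like an email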


Also, don't miss Andrei Tarkovsky's movie adaptation of Roadside Picnic: "Stalker".


Tarkovsky is actually one of my favorite directors of all time! I love Stalker, Solaris and The Mirror!


Stalker is perhaps the best movie of all time.

