Instead of using Python, here is a solution that requires only sh, curl, sed, sort, uniq and grep.
This solution uses a generous 87-second delay between retrievals of the Amazon pages. There are 328 films listed as "great movies" on rogerebert.com, so the script, named "1.sh", needs about 8 hours to complete (328 x 87s is roughly 7.9h), e.g., while you are at work or asleep. No cookies, no state, no problems.
Usage: sh 1.sh > 1.html
Open 1.html in a browser: it shows whether each "great movie" is available on Prime Video or only in some other format, such as Blu-ray, DVD, Multi-format or Hardcover. A link to the item on Amazon is provided.
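The script itself is not reproduced here, but the core loop could look something like the following minimal sketch built from the same tools. The rogerebert.com link pattern, the Amazon search URL and the "Prime Video" page marker are my assumptions, not taken from the actual 1.sh:

#!/bin/sh
# Sketch only, not the actual 1.sh: the link pattern, the Amazon search
# URL and the "Prime Video" page marker are all assumptions.
curl -s https://www.rogerebert.com/great-movies |
  grep -o '/reviews/great-movie-[a-z0-9-]*' | sort | uniq |
  sed 's|.*great-movie-||' |
while read -r title; do
  sleep 87   # the generous delay between Amazon requests
  url="https://www.amazon.com/s?k=$title"
  if curl -s -A Mozilla/5.0 "$url" | grep -q "Prime Video"
  then printf '<p><a href="%s">%s</a>: Prime Video</p>\n' "$url" "$title"
  else printf '<p><a href="%s">%s</a>: other formats only</p>\n' "$url" "$title"
  fi
done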
"There has always been a place for commerce and marketing on the web."
Not really true as I remember it. The web opened up to the public in 1993. There was no commerce and marketing in the beginning. Even by 1996, while commerce and marketing may have existed (e.g., Amazon went online in 1995), their place was in the background. As I remember the early web, the foreground, the "starting point" or "portal", was something like Yahoo! You had to pick a topic (direction) that you wanted to go in. For example, if you were after music, you might end up browsing the Internet Underground Music Archive. The "front page" of the portal was predominantly non-commercial, mostly generic headings for topics. If you wanted to search out something commercial, no doubt you could, but the initial starting point was intellectual curiosity. This is, IMO, what has been lost over time with regard to web use: intellectual curiosity and the ability to actually satisfy it. (A fun tangent here is the collections of inane queries that people type into Google. These are simultaneously hilarious and disturbing.)
As an experiment, have a look at the Yahoo! page today. It is full of low-quality mainstream "news". There is zero attention to intellectual curiosity. Nothing to see here, folks, but here is the latest news. For part 2 of the experiment, run a Google search for the term "music". The results are dominated by YouTube. Every result is directly or indirectly commercial (either selling something or conducting surveillance and serving ads), except one: Wikipedia. The chances of someone new to the web not following a link to YouTube or some other Google-controlled domain would seem almost nil.
The "onboarding" process for new web users is very different today than it was in the early 1990's. Perhaps it is still possible to approach the web with a sense of awe and wonder, pondering "What is out there?" However a new web user is scant likely to end up on a non-commercial website besides Wikipedia. What is out there? Surveillance, ads and an endless supply of soon-to-be-obsolete Javascript du jour.
The old Web directories had a "Business" section where everything commercial or sale-related was listed. Then sub-levels of any other section, e.g. "Music", would include a cross-reference to the same topic in the "Business" hierarchy. But the default assumption was that you were looking for non-commercial, or at most ad-supported sites.
Why dump billions of dollars then? Nowhere else to spend it? Effective marketing?[1] Is no one asking this question?
"... and critically: there's no one to hold responsible for getting it wrong."
Could this be part of "AI"'s appeal? A dream of absolving businesses and individuals from accountability.[2]
1. "What's more, artificial research teams lack an awareness of the specific business processes and tasks that could be automated in the first place. Researchers would need to develop an intuition of the business processes involved. We haven't seen this happen in too many areas."
2. Including the ones who designed the "AI" system.
> Why dump billions of dollars then? Nowhere else to spend it? Effective marketing? Is no one asking this question?
Because whoever does achieve the next unlock - should it happen - will receive an unimaginably large windfall. This is the classic intent of venture capital. In fact, I'd suggest that AI is actually one industry where VC is doing what it does best: taking extremely risky bets with a large potential upside.
> Could this be part of "AI"'s appeal? A dream of absolving businesses and individuals from accountability.
Presently, this seems to be one of its major drawbacks. If I have an employee do something stupid, I can say that an employee did something stupid. People might wonder why they were allowed to do that stupid thing, and what we're going to do to prevent it from happening again, but the explanation of the source is satisfactory. We're fallible, and we understand the fallibility of others (generally speaking).
AI is not that at all. If my automation does something stupid, I still have all the blame, yet nowhere to pass it off to. "We don't understand why our AI did this really stupid thing" is, frankly, not a satisfying response (nor should it be). Businesses employing AI certainly are not absolved of any form of accountability, and are arguably exposed to more of it (since they're not able to pass the blame on to another fallible human, and have to take direct accountability for a system they built but don't fully understand).
Having worked in both industries, I prefer working with wet-science people. For some reason they generally have a much healthier perspective on life. Their work is humbling because it is, and will forever be, full of unsolved mysteries, not simply because it is challenging. The other folks, whether they call themselves "scientists" or "engineers" or "developers" or "coders" or whatever, are working with something that as far as I can see has no inherent connection to the natural world, other than being a production of the human mind. Perhaps that affects the perspective many of them have on life. For example, consider how common among them is the belief that all things, not simply computers, can be thoroughly understood and mastered. Note this is pure opinion, not fact, and I am generalising; there are exceptions to every generalisation.
I moved into programming from Neuroscience. The first thing programmers do when they learn this is talk to me about how their neural network does xyz.
I don’t know if it’s ignorance, naïveté, or hubris, but it’s amazing to me that these programmers think the world/universe/reality is a complex problem that could eventually be easily understood. When working with “wet” scientists I found that attitude was almost nonexistent. The complexity is just so high and there are so many unknowns that many of them are very comfortable saying “I don’t know” or “we may never know.”
One of my favorite examples to give is from when I was still in undergrad: endocannabinoid research was getting hot in the Neuroscience field because it challenged the mental model that neurons communicate in a “linear” or “feed forward” fashion. Are neural networks going to implement that? Probably not, and it’s probably not worth it because at this point it introduces unneeded complexity. Try replicating the biochem of an entire cell for each cell in a NN and you _might_ be halfway to achieving the complexity of the human brain.
I’m not saying this is impossible; all I’m saying is that I find it remarkable how quickly programmers seem to think of themselves as “experts” in outside fields, as if they’re the smartest people in the world. I will say, crusty old systems programmers tend to have more of the characteristics familiar to me from my time in the life sciences (Neuroscience and Genetics).
As someone who has dabbled, and still dabbles, in genetic algorithms and neural networks: it's always wise to keep in the back of one's mind that these systems are inspired by biology, and are not generally an attempt to replicate it[0]. It's also often useful to go back to biology with an eye for ideas to steal, but rarely are they a useful model to inform biology or biological understanding.
As an anecdote, I once had a summer project between the neuroscience and computer science departments at my university. They had data from rat brains that they'd potentiated parts of (basically, zapped some neurons so their connection weights (in NN terms) got messed up and were sending too-strong signals to their neighbours), and how that potentiation decayed over time. They got me to attempt to reproduce it in a neural network. So, I built an NN system with the ability to have neurons zapped and managed to reproduce their results. But NNs being a very abstract model of a set of neurons, there are a lot of parameters that can be twiddled. By making fairly small changes to those parameters, I managed to get the exact inverse of their results as well.
[0] this applies both to computer scientists building them, and also to biologists looking at them and going "that's a really poor attempt to be a brain, look at all the things it's missing."
I think the level of control programmers have over their domain naturally gives rise to that sort of overconfidence. You need to remember that computer systems are built on human-made abstractions to human standards and follow human-defined logic. DNA is not code; it's just a molecule that reacts with stuff, as are all the other molecules. They exist as they are and are their own system that needs to be understood; we did not create that system. Chemistry and probability and time did.
What if PubMed had something like Google's "I'm feeling lucky"?
What if we could explore PubMed by selecting a random PubMed URL instead of searching?
This script generates a random PubMed URL.
To do this we need to know the maximum PMID number in the PubMed database. The current max is included in the script and will be saved in a 9-byte file (8 digits plus a newline) named "max-PMID" when the script is run.
If run with the argument "update" it will search for a newer max PMID.
If a newer max PMID is found, the script updates the number in the max-PMID file and in the script itself.
An alternative is to use the ftp server[1] to find the max PMID; I noticed the latest ftp update was missing new PMIDs caught by this script.
If run without any arguments it selects a random PMID between 1 and the max and outputs a URL.
Uses socat and GNU sed, and needs a fifo named "1.fifo" (created by the script in update mode).
1. ftp://ftp.ncbi.nlm.nih.gov/pubmed/updatefiles/
#!/bin/sh
test -s max-PMID || echo 32446294 > max-PMID   # default max; sed -i below keeps it current
read x < max-PMID; x=$((x-1)); h=pubmed.ncbi.nlm.nih.gov
test ${#x} -eq 8 || exec echo weird max-PMID   # PMIDs are currently 8 digits
sed -i "/test/s/echo [0-9]\{8\} /echo $x /" $0
case $1 in
update) mkfifo 1.fifo 2>/dev/null; test -p 1.fifo || exec echo need 1.fifo
  # reader: the first "PMID ... is not available" title is the new max
  (grep "<title>PMID .* is not available" < 1.fifo | sed 1q | sed 's/<title>PMID //;s/ *//;s/ .*//;' > max-PMID) &
  # writer: pipeline 10000 GET requests over one TLS connection
  y=$((x+10000)); seq $x $y | sed \
    -e '$!s|.*|GET /&/ HTTP/1.1\r\nHost: '"$h"'\r\nConnection: keep-alive\r\n\r\n|' \
    -e '$s|.*|GET /&/ HTTP/1.1\r\nHost: '"$h"'\r\nConnection: close\r\n\r\n|' \
    | socat - ssl:$h:443 > 1.fifo 2>/dev/null ;;
"") awk -v min=1 -v max=$x 'BEGIN{srand(); printf "https://'"$h"'/%d/\n", int(min+rand()*(max-min+1))}' ;;
esac
I could be wrong (I am not a Prime Video user), but the result I got was that 217 of Ebert's great movies are available on Prime Video.
Instructions on how to generate 1.html are here: https://news.ycombinator.com/item?id=23508182