This is a fascinating experiment! I've just been reading the first few paragraphs of the paper ... easily readable, intended to be accessible by anyone.
In Gauss's time mathematicians would solve problems, publish the solutions in an encrypted form, and then challenge their contemporaries to solve the problems.
Here the authors of a paper on the arXiv say:
"To assess the ability of current AI systems to correctly answer research-level mathematics questions, we share a set of ten math questions which have arisen naturally in the research process of the authors. The questions had not been shared publicly until now; the answers are known to the authors of the questions but will remain encrypted for a short time."
Tao says:
"... the challenge is to see whether 10 research-level problems (that arose in the course of the authors research) are amenable to modern AI tools within a fixed time period (until Feb 13).
"The problems appear to be out of reach of current "one-shot" AI prompts, but were solved by human domain experts, and would presumably a fair fraction would also be solvable by other domain experts equipped with AI tools. They are technical enough that a non-domain-expert would struggle to verify any AI-generated output on these problems, so it seems quite challenging to me to have such a non-expert solve any of these problems, but one could always be surprised."
The title I've chosen here is carefully selected to highlight one of the main points. It comes (lightly edited for length) from this paragraph:
Far more insidious, however, was something else we discovered:
More than two-thirds of these articles failed verification.
That means the article contained a plausible-sounding sentence, cited to a real, relevant-sounding source. But when you read the source it’s cited to, the information on Wikipedia does not exist in that specific source. When a claim fails verification, it’s impossible to tell whether the information is true or not. For most of the articles Pangram flagged as written by GenAI, nearly every cited sentence in the article failed verification.
FWIW, this is a fairly common problem on Wikipedia in political articles, predating AI. I encourage you to give it a try and verify some citations. A lot of them turn out to be more or less bogus.
I'm not saying that AI isn't making it worse, but bad-faith editing is commonplace when it comes to hot-button topics.
Any articles where newspapers are the main source are basically just propaganda. An encyclopaedia should not be in the business of laundering yellow journalism into what is supposed to be a tertiary resource. If they banned this practice, that would immediately deal with this issue.
A blanket dismissal is a simple way to avoid dealing with complexity, here both in understanding the problem and in forming solutions. Obviously not all newspapers are propaganda, and at the same time not all can be trusted; not everything in the same newspaper or any other news source is of the same accuracy; nothing is completely trustworthy or completely untrustworthy.
I think accepting that gets us to the starting line. Then we need to apply a lot of critical thought to sometimes difficult judgments.
IMHO quality newspapers do an excellent job - generally better than any other category of source on current affairs, but far from perfect. I remember a recent article for which they interviewed over 100 people, got ahold of secret documents, read thousands of pages, consulted experts .... That's not a blog post or Twitter take, or even a HN comment :), but we still need to examine it critically to find the value and the flaws.
People here are claiming that this is true of humans as well. Apart from the fact that bad content can be generated much faster with LLMs, what's your feeling about that criticism? Is there any measure of how many submissions made unsubstantiated claims before LLMs?
Thank you for publishing this work. Very useful reminder to verify sources ourselves!
I have indeed seen that with humans as well, including in conference papers and medical journals. The reference citations in papers are seen by many authors as just another section they need to fill in to get their articles accepted, not as a natural byproduct of writing the article.
403 ERROR
The request could not be satisfied.
Request blocked. We can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.
If you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.
Generated by cloudfront (CloudFront)
Request ID: IFiQvbhPlrP5MaRdM5km5yAdFAEmvC_IUx2LA899aXly11zm3wAoKg==
No, apparently eu-west-1 went casters-up earlier. I wouldn't be surprised if this error is related to that.
The site came back around the same time eu-west-1 did, which, while correlation isn't causation, does look meaningfully in causation's direction and wiggles an eyebrow suggestively.
One of the talks I give has this in it. The talk includes Continued Fractions and how they can be used to create approximations. That's the way to find 355/113 as an excellent approximation to pi, and other similarly excellent approximations.
I also talk about the Continued Fraction algorithm for factorising integers, which is still one of the fastest methods for numbers in a certain range.
Continued Fractions also give what is, to me, one of the nicest proofs that sqrt(2) is irrational.
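For anyone who wants to play with this, here's a minimal Python sketch (mine, not from the talk) that computes continued-fraction coefficients and their convergents. On pi it gives 3, 22/7, 333/106, 355/113, ...; on sqrt(2) it gives the repeating expansion [1; 2, 2, 2, ...], the pattern the irrationality argument hangs on.

    from fractions import Fraction
    import math

    def continued_fraction(x, terms):
        """First `terms` continued-fraction coefficients of x."""
        coeffs = []
        for _ in range(terms):
            a = math.floor(x)
            coeffs.append(a)
            frac = x - a
            if frac < 1e-12:      # x is (numerically) an integer; stop
                break
            x = 1 / frac
        return coeffs

    def convergents(coeffs):
        """Yield the convergents p/q of [a0; a1, a2, ...]."""
        p_prev, p = 1, coeffs[0]
        q_prev, q = 0, 1
        yield Fraction(p, q)
        for a in coeffs[1:]:
            p, p_prev = a * p + p_prev, p
            q, q_prev = a * q + q_prev, q
            yield Fraction(p, q)

    # pi = [3; 7, 15, 1, 292, ...] -> 3, 22/7, 333/106, 355/113, ...
    for c in convergents(continued_fraction(math.pi, 5)):
        print(c, float(c))

    # sqrt(2) = [1; 2, 2, 2, ...]
    print(continued_fraction(math.sqrt(2), 8))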
Thanks! Do you have a version of that talk published anywhere? I tried searching your YouTube channel [1] for a few things like "golden ratio" "ratio", "irrational"... but didn't find anything.
Aw thanks, but I think the Mathologer and Numberphile videos are sufficient for me if you haven't already uploaded yours. I don't want to bother you doing extra work for little return!
I honestly should sketch out the talk anyway. I haven't seen anyone else bring together the proof that sqrt(2) is irrational, and the Continued Fraction method of factoring.
Yeah, maybe I'll hack out a sketch tomorrow, show it to a few people, and get them to tell me what's missing so I can flesh it out.
This was a long time ago, so we didn't have GPUs or fancy rendering h/ware. We addressed every pixel individually.
So a radar image was painted to the screen, and then the next update was painted on top of that. But that just gives the live radar image ... we wanted moving objects to leave "snail trails".
So what you do for each update is:
* Decrement the existing pixel;
* Update the pixel with the max of the incoming value and the decremented value.
This then leaves stationary targets in place, and anything that's moving leaves a trail behind it so when you look at the screen it's instantly obvious where everything is, and how fast they're moving.
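In modern terms it's roughly this (a hedged Python/NumPy sketch of my own, not the original code, which was ARM assembler; `frame` and `incoming` are hypothetical 1024x1024 intensity arrays):

    import numpy as np

    def update(frame, incoming, decay=1):
        """One radar update: fade the old image slightly, then overlay the new one.

        Stationary returns keep getting repainted at full brightness; anything
        that has moved leaves a slowly fading copy behind -- the snail trail.
        """
        faded = np.maximum(frame.astype(np.int16) - decay, 0).astype(np.uint8)
        return np.maximum(faded, incoming)

    frame = np.zeros((1024, 1024), dtype=np.uint8)   # what's currently on screen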
Ideally you'd want to decrement every pixel by one every tenth of a second or so, but that wasn't possible with the h/ware speed we had. So instead we decremented every Nth pixel by D and cycled through the pixels.
But that created stripes, so we needed to access the pixels in a pseudo-random fashion without leaving stripes. The area we were painting was 1024x1024, so what we did was start at the zeroth pixel and step by a prime number size, wrapping around. But what prime number?
We chose a prime close to (2^20)/phi. (Actually we didn't, but that was the starting point for a more complex calculation)
Since phi has no good rational approximation, this didn't leave stripes. It created an evenly spread speckle pattern. The rate of fade was controlled by changing D, and it was very effective.
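Concretely, the stepping scheme looks something like this (a Python sketch under my own naming -- `fade_pass`, `framebuffer` -- of what was originally done in ARM assembler):

    import math

    N = 1024 * 1024                    # total pixels, 2**20
    PHI = (1 + math.sqrt(5)) / 2

    def is_prime(n):
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    # Start from 2^20 / phi and walk up until we hit a prime.  Any odd step is
    # coprime to 2^20, so a prime step visits every pixel exactly once per
    # cycle; being near N/phi spreads those visits evenly instead of sweeping
    # in stripes.
    target = round(N / PHI)
    step = next(s for s in range(target, target + 1000) if is_prime(s))

    def fade_pass(framebuffer, start, count, decrement):
        """Decrement `count` pixels beginning at index `start`, stepping by `step`."""
        idx = start
        for _ in range(count):
            framebuffer[idx] = max(framebuffer[idx] - decrement, 0)
            idx = (idx + step) % N
        return idx                     # resume point for the next pass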
Worked a treat on our limited hardware (ARM7 on a RiscPC) and easy enough to program directly in ARM assembler.
I was stepping out with my wife for a day out and had read your reply very cursorily. That reading had left me quite puzzled -- "I would have done an exponentially weighted moving average (EWMA) over time for trails. Why is \phi important here in any form? Is \phi the weight of the EWMA?"
Now I get it: decrementing the pixels was quite peripheral to the main story.
The main story is that of finding a scan sequence that (a) cycles through a set of points without repetition and (b) does so without obvious patterns discernible to the eye.
In this, the use of \phi is indeed neat. I don't think it would have occurred to me. I would have gone with some shift register sequence with cycle length 1024 * 1024, or a space filling curve on such a grid.
This becomes even more interesting if you include the desideratum that the minimum distance between any two temporally adjacent pixels must not be small (to avoid temporal hot spots).
Finding the minimax, i.e. the minimum over temporally adjacent pairs maximized over all (1024*1024)! sequences, might be intractable.
Another interesting formulation could be that, for any fixed kxk-sized disc that could be drawn on the grid, the temporal interval between any two "revisit" events needs to be independent of the disc's position on the grid.
I think this is the road to low-discrepancy sequences and quasi-Monte Carlo.
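For comparison (not anything from the radar code), here's a tiny Python sketch of one standard low-discrepancy construction, the 2D Halton sequence, which fills the unit square evenly without visible stripes or hot spots:

    def radical_inverse(n, base):
        """Van der Corput radical inverse of n in the given base, in [0, 1)."""
        inv, f = 0.0, 1.0 / base
        while n > 0:
            inv += (n % base) * f
            n //= base
            f /= base
        return inv

    def halton_2d(count):
        """First `count` points of the 2D Halton sequence (bases 2 and 3)."""
        return [(radical_inverse(i, 2), radical_inverse(i, 3))
                for i in range(1, count + 1)]

    # Scale onto the 1024x1024 grid: an even, stripe-free visiting order,
    # though unlike the prime-step scheme it doesn't guarantee every pixel
    # gets hit exactly once.
    points = [(int(x * 1024), int(y * 1024)) for x, y in halton_2d(10000)]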
32MB RAM <-- no way. 4 and 8MB were the standard (8MB being grand); you could find 16MB on some Pentiums. So a 40MB drive and 32MB RAM is an exceptionally unlikely combo.
Nah, as the other poster said 4 or 8 MB was what was common on 486 machines. Even less on 386. Most 386 motherboards didn't even support more than 16MB.
So this depends on whether it was a 72-pin SIMM board. I don't think you could get there (easily?) on a 30-pin board, but 72-pin may have had native support for 64MB out of the box.
Yeah, IIRC my first computer, or at least the first one I really maintained, was a Pentium 2 with 32MB of RAM and a 2GB hard drive. Good ole Gateway PCs.
My Dad had one of them. The first machine I actually purchased myself was a Dragon 32 (6809 processor, 32K RAM) sometime around 1981 - I can remember everything about it, including all the terrible cassette games I bought for it and the money I spent on ROM cartridges (word processor, assembler/debugger). These days I can't even remember what's in my Steam library.
I still have my Mac 128k with external disk drive and printer. Bought new in Jan 1985 or late Dec 1984. I paid the exorbitant price to upgrade it to 512k during the first year I owned it. I think the RAM needed to be desoldered and new chips soldered in place so it needed to be returned to the store where I bought it.
Shout out to the author of the blog for writing an engaging post that accurately describes the MS experience. For me, switching is still a work in progress since I am the family troubleshooter and there are lots of things to mess with. It will happen because, so far, the ones I have switched have no complaints.
Interestingly, I can't remember any specs from about 22 years ago onward.
First modern PC (DOS/Win3.1): a 12MHz 286, 1MB of RAM, AT keyboard, 40MB hard drive. This progressed via a 486SX/33 with 4MB RAM and a 170MB drive, and at one point a Pentium 2 600 with (eventually) 96MB of RAM and a 2GB hard drive, then a P3 of some sort, but after that it's just "whatever".
First family PC was a used IBM PC XT, 8088 w/ 640KB RAM and a CGA card with an amber monochrome monitor attached. I remember getting a 14.4 modem on it, and it would freeze, so I had to force it to 9600bps. Then managed to wrangle a 486sx w/ 4MB RAM and an EGA card and display.
First decent computer I built was an AMD 5x86 133MHz with the larger cache module and a whopping 64MB RAM that I'd traded for some ANSi work. The irony is that for some things it ran circles around the Pentiums that friends had, for others it just slogged. Ran OS/2 Warp like a beast though. Ever since then, I've mostly maxed out the RAM in my systems... I went from 128GB down to 96GB for my AM5 build though, since the most I've ever used is around 75GB, and I wanted to stick to a single pair at a higher speed.
36 years ago: A Wyse branded AT clone 12.5MHz 286 with 1MB of ram, a 10MB hard drive and a Hercules graphics card (it was a decommissioned CAD machine from my dad's work).
I documented all of my early computers throughout early college, and I'm glad I did. I remember the first computers well, but without those notes, I wouldn't remember the first ten in so much detail. My first computer that was not a family computer was: UMAX 233MHz Pentium 2, 64MB RAM, 8GB HDD (was crushed when sat on by sibling).
The 5150 was just an "IBM PC" not an XT, but still... I think we're talking about the same thing.
I still have mine! 4.77 MHz 8088, 8087 math coprocessor, CGA graphics card, 5.25" floppy (360K, double-sided, double-density), 20 MB Seagate hard drive (I believe the motherboard has newer ROM chips to support that), AST SixPakPlus expansion card to bring it up to 640 KB RAM and a Parallel Port, a Serial Port, a Game Port, and a Real Time Clock (so you don't have to type in the date and time every bootup.) At one point I had a Sound Blaster as well, which was nice. The floppy drive and the hard drive each have their own controller cards so there's almost no more room for expansion! The motherboard also has the keyboard and cassette (!!!) port. I get an error code about the cassette port so I doubt it would work but I never had the equipment to try it out anyway.