Hacker News
James while John had had had ... had had had a better effect on the teacher (wikipedia.org)
114 points by DanielRibeiro on March 3, 2013 | 74 comments


A friend of mine tells a story that's about as close as real life gets to this kind of language trick. He was helping a friend in college study for her Test of English as a Foreign Language. She was working on past tenses and, sincerely attempting to explain a mistake she'd made, he told her: "If you had had 'had' here, you would have had to have had 'had' there as well." Whereupon she screamed.


As English, this stuff is totally incomprehensible and unusable. Absolutely nothing is conveyed to actual human English speakers by saying the word 'buffalo' 400 times in a row.

If it is 'grammatical' then it is grammatical by virtue of conforming to some idealized grammar. But when this grammar is so far off not just from anything people say, but anything they can actually understand, it really only means that the idea that this grammar models real English has been reduced to total absurdity.


They're tricks, but they're far from incomprehensible. In my experience, both with this sentence and the buffalo one, there's a certain mental click when one "gets it", after which the sentence makes sense and one can "feel" its grammatical structure. It's a curious and rather Chomskyan experience. Before that, of course, the notion that such a string of words might mean anything is absurd. "Getting it" is much like those 3D visual puzzles where at first you see only a noise pattern, but when you hold it at the right distance and let your eyes refocus a certain way, a picture leaps out at you.


Don't take it too literally. It's meant as a kind of linguistic koan to illustrate the concept of prosody.

http://en.wikipedia.org/wiki/Prosody_(linguistics)

Consider the difference in how one says

    He ate that
vs

    He ate that?
That's prosody. The difference in meaning isn't a consequence of the presence of the question mark, although that's what people think of as "grammatical." The presence of the question mark and the difference in meaning are both consequences of the prosodic differences between the two sentences.

This shows one of many challenges inherent in computational linguistics: "the written word" only encapsulates a small part of what it means to "speak English" or "understand English."

As a nice benefit, it makes the sort of grammarian who obsesses over the written word look (rightly) like they're missing the forest for the trees.


And further to that, there is a difference between "He ate that?" (He did what to that?) and "He ate that?" (He ate what?)


While you have a point in there, some things are still missing. In my opinion the buffalo sentence doesn't have enough prosody to ever make the verbal version intelligible without explanation. The sentence in the OP does, but it's also missing mandatory punctuation. Punctuation conveys a solid fraction of the information prosody does, and sometimes even contains information prosody doesn't.


Actually, I find it perfectly comprehensible when said aloud with the right emphasis and pacing - it's only difficult to understand when written down with no punctuation.


You may also be interested in http://en.wikipedia.org/wiki/Garden_path_sentence, which has some sentences that are clearly grammatical but not what they seem at first glance.


I'm not convinced this one is even grammatical. At minimum, omitting the semicolon should make it a run-on sentence.


The trick to all of these sentences is to omit otherwise necessary punctuation. The rare exceptions are sentences describing recursive concepts. For instance, if you have a radar detector, the police will catch you because they have a radar detector detector, which makes it imperative that you own a radar detector detector detector.

Here's another example: who polices the police? If there were any one agency in charge of that, certainly we would call them the police police. But who polices the police police? Clearly, the police police police police the police police.


There may be a lack of pronunciation but written English has clear rules demanding punctuation. The use of quoted words without quotation marks in the OP 'sentence' is ridiculous.

Buffalo buffalo and radar detector detector detector are much more valid.


Ugh, I meant punctuation. (Edited my comment to reflect that as well, but let the record stand that it originally had "pronunciation".)

My brain puts the words "punctuation" and "pronunciation" in the same hash bucket so I can never get the right one out reliably.


Oh. Well the Buffalo sentence doesn't omit punctuation, yet manages to be perfectly confusing. :)


The article title is ungrammatical without correct punctuation. With correct punctuation, it's fine - and it is, after all, talking directly about an error in the very grammatical construct it is highlighting. It's a perfectly natural sentence that could easily come about in normal discussion.

The 'buffalo' one is just nonsensical - the word 'buffalo' just isn't used that way, and even with punctuation, needs to be separately explained for people to understand it - even if they are aware of the regional dialect that uses the word 'buffalo' as a verb.


If English was a workable language, English majors would have nothing to base their theses on. The 'had had had' exercise highlights the absurd nature of English. This philosophy on English is why it stopped evolving after its peak - post English after the death of the world's greatest playwright Shakespeare (who wrote phonetically, may I add).

On the other hand it does give an insight into syntax trees and parsing.


Every language has absurdities. Genders for non-gendered things is one example.

What's really absurd about English is the contempt for diacritical marks. Other languages give you a clue as to how the word is pronounced, whereas in English, if I write 'wind', you don't know if I'm talking about air blowing or charging a mechanical clock unless you have context - which may come later in the sentence.


> absurdities: Genders for non-gendered things is one example.

This. I never understood how e.g. Spanish speakers think that a door is female or a clock is male. I mean, it's not like there are any body parts you can examine for a definitive answer, or clothing and mannerisms which let you make a pretty good guess...I never really got a satisfactory answer other than "it's usually -o or -a, but not always; really, you just have to memorize it." Seriously...WTF?

> diacritical marks

Other European languages love them. To me as an English speaker, they look like misplaced inkspots or dirt on my monitor. I never had any class in school or college that taught what they mean [1]. I blithely type "fiancee," "naive" and "Geiger-Muller," since I don't want to get out Character Map or whatever the Linux equivalent is [2], and I'm not really sure which marks to use or where to put them. I pretty much pretend they don't exist, unless they cause compiler errors [3], in which case I terminate them with extreme prejudice.

[1] I did once learn that an overline (a line above a character; I don't know if that's actually what it's called) means a long vowel sound, and an upside-down e means schwa. I haven't seen either of these used outside dictionary pronunciation keys.

[2] In my current operating system, Linux Mint, I don't even know how to get those characters other than copy-and-pasting the Unicode text somebody else has put on a webpage, or spending an hour or two sitting down with the RFCs that specify UTF-8 encoding and a hex editor. The only reason I know on Windows is that I eventually stumbled on Character Map by curiously exploring all the menus. This may give you a clue how often I deal with international text.

[3] http://news.ycombinator.com/item?id=5316875


"I never understood how e.g. Spanish speakers think that a door is female or a clock is male. I mean, it's not like there are any body parts you can examine for a definitive answer..."

But "gender," as a term in grammar, just means an arbitrary classification of words for grammatical purposes. It's only a few languages which, absurdly, map these grammatical tags to biological or cultural sex distinctions. English compounds the absurdity by retaining this grammatical distinction only in this one bizarre case.

(My favourite example of the arbitrary nature of grammatical gender is Dutch. Dutch has two genders, common and neuter, which, if you wanted to map them to sex, would mean "either male or female" and "neither male nor female", respectively.)


English used to have diacritical marks, specifically the diaeresis, as can be seen in names like Zoë or in the surname Brontë (as in the family of English authors).

The New Yorker loves the diaeresis to this day, and frequently uses it in words like "coöperate".

http://en.wikipedia.org/wiki/Diaeresis_(diacritic)


I made the realisation when I went to Vietnam, where basically you write a sentence then shake a bagful of diacritics over it. I thought "Heh, English doesn't require any of that nonsense... hey... wait a minute..."


> What's really absurd about English is the contempt for diacritical marks.

Not just English, but also Chinese. Pinyin was originally specified as having marks over the vowels:

  āáǎà ēéěè īíǐì ōóǒò ūúǔù üǖǘǚǜ
But although you see pinyin used a lot in mainland China alongside Chinese characters, you virtually never see those diacritics, just:

  a e i o u v
where v is used instead of ü.
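
For anyone curious what that flattening looks like mechanically, here is a rough sketch in Python. It assumes the four tone marks are the combining macron, acute, caron and grave, and that ü is kept until the very end so it can be swapped for v; the function name `strip_pinyin_tones` is made up for illustration.

```python
import unicodedata

# Combining macron, acute, caron, grave: the four pinyin tone marks.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_pinyin_tones(text: str) -> str:
    """Drop tone marks and write u-with-diaeresis as v (hypothetical helper)."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    without_tones = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # NFC glues u + diaeresis back into a single character before the swap.
    recomposed = unicodedata.normalize("NFC", without_tones)
    return recomposed.replace("\u00fc", "v").replace("\u00dc", "V")


print(strip_pinyin_tones("n\u01d0 h\u01ceo"))
```

The diaeresis is deliberately not in the set of marks to remove, otherwise ü would collapse into plain u instead of v.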


This can go on indefinitely, recursively.

The story behind this sentence is of two students comparing their texts, the one with "had" (version 1), the other with "had had" (version 2), yielding 11 "had"s in a row. Call that story S_0. Now imagine a story S_(n+1) where two students write texts discussing story S_n, with version 1 replaced by the consecutive row of "had"s from S_n, and version 2 replaced by a similar row of "had"s, but one longer.
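
A quick way to see how fast this grows is to build the unpunctuated sentence and count the longest run of "had". This is only a sketch: it assumes the sentence template from the article title, and the helper names (`had_sentence`, `longest_had_run`, `story_run_length`) are invented here.

```python
def had_sentence(short_version: str, long_version: str) -> str:
    """Unpunctuated sentence with the two quoted versions dropped in."""
    return (
        f"James while John had had {short_version} had had {long_version} "
        f"{long_version} had had a better effect on the teacher"
    )


def longest_had_run(sentence: str) -> int:
    """Length of the longest run of consecutive 'had' words."""
    best = run = 0
    for word in sentence.split():
        run = run + 1 if word == "had" else 0
        best = max(best, run)
    return best


def story_run_length(n: int) -> int:
    """Run length for story S_n, starting from 'had' versus 'had had'."""
    k = 1  # length of the shorter quoted version in S_0
    for _ in range(n + 1):
        shorter = " ".join(["had"] * k)
        longer = " ".join(["had"] * (k + 1))
        # Six fixed "had"s plus k + 2 * (k + 1) quoted ones: 3k + 8 in total.
        k = longest_had_run(had_sentence(shorter, longer))
    return k


print([story_run_length(n) for n in range(3)])
```

Each level roughly triples the run, since the longer version appears twice and the shorter one once.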


An example which itself suggests a possible recursion:

"Wouldn't the sentence 'I want to put a hyphen between the words Fish and And and And and Chips in my Fish-And-Chips sign' have been clearer if quotation marks had been placed before Fish, and between Fish and and, and and and And, and And and and, and and and And, and And and and, and and and Chips, as well as after Chips?"


Fish and And and And and Chips

The correct way to do this sentence is "I want to put hyphens between the words Fish, And,* and Chips in my Fish-And-Chips sign"

*The Oxford comma makes a lot more sense because it reflects how we speak, but even if you omit it, this is the correct way to present a list of items - you don't use 'and' between every one.

Edit: I was wondering which other rules I was breaking with that sentence - I'm sure there's more :)


You'd better write hyphens, as in "I want to put hyphens between the words Fish, And, and Chips in my Fish-And-Chips sign".


I'm pretty sure the correct way to write any of these grammatical puzzler sentences removes the grammatical puzzle, because grammatical puzzles are difficult and impede understanding, which is presumably the entire purpose of language to begin with.


During an English class in High School, my teacher (we'll call him Mr. Jones) came into the classroom and saw the following written on the blackboard.

    YOU SUCK JONES
Mr. Jones proceeded to make this into a lesson about the importance of commas.

    YOU SUCK, JONES
and

    YOU, SUCK JONES
have wildly different meanings.


There was a sign in front of a lake saying:

    PRIVATE PROPERTY
    NO SWIMMING ALLOWED
But by adding a few punctuation marks:

    PRIVATE PROPERTY?
    NO, SWIMMING ALLOWED.
The meaning of the sign was reversed.


I get a chuckle out of the

    SLOW
  CHILDREN
signs. Should you read that as:

  Slow Children
or

  Slow, Children?
When you see a

  DEAF
  CHILD
sign nearby, the inappropriate interpretation is reinforced.


We learned something pretty useful at the bird sanctuary: "Quiet birds have ears".


Or that episode of the Simpsons where Marge saw "SUGAR FREE DONUTS" until Apu added a comma between "sugar" and "free", and explained that it was actually sugar with free donuts.


Wouldn't the sentence 'I want to put a hyphen between the words Fish and And and And and Chips in my Fish-And-Chips sign' have been clearer if quotation marks had been placed before Fish, and between Fish and and, and and and And, and And and and, and and and And, and And and and, and and and Chips, as well as after Chips?


Wouldn't the sentence, "Wouldn't the sentence 'I want to put a hyphen between the words Fish and And and And and Chips in my Fish-And-Chips sign' have been clearer if quotation marks had been placed before Fish, and between Fish and and, and and and And, and And and and, and and and And, and And and and, and and and Chips, as well as after Chips?" have been clearer if quotation marks had been placed before Fish, and between Fish and and, and between and and between, and between between Fish, and between Fish and and, and and and and, and and and and, and and and and, and and and and, and and and And, and and And and and, and and and And, and And and and, and and and and, and and and and, and and and and, and and and and, and and and And, and And and and, and and and And, and And and and, and and and and, and and and and, And and and and, And and and and, And and and Chips, as well as after Chips?


No.

I'm going to go with no. If you'll excuse me, I need to go lie down because that sentence broke something in my understanding of English.


I think you accidentally a whole bottle.


I do not know where family doctors acquired illegibly perplexing handwriting, nevertheless, extraordinary pharmaceutical intellectuality counterbalancing indecipherability transcendentalizes intercommunication's incomprehensibleness.

word 1 = 1 letter, word 20 = 20 letters. A friend didn't like "intercommunication's", but hasn't replied on the worth of 'intercommunicationy'...
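
The claim is easy to check by machine. A small sketch, assuming that only alphabetic characters count as letters (so commas, the final period and the apostrophe are ignored); the function names are made up for this example.

```python
SENTENCE = (
    "I do not know where family doctors acquired illegibly perplexing "
    "handwriting, nevertheless, extraordinary pharmaceutical intellectuality "
    "counterbalancing indecipherability transcendentalizes "
    "intercommunication's incomprehensibleness."
)


def letter_counts(sentence: str) -> list[int]:
    """Number of alphabetic characters in each whitespace-separated word."""
    return [sum(ch.isalpha() for ch in word) for word in sentence.split()]


def grows_by_one(sentence: str) -> bool:
    """True when word i (counting from 1) has exactly i letters."""
    counts = letter_counts(sentence)
    return counts == list(range(1, len(counts) + 1))


print(grows_by_one(SENTENCE))
```

Under that counting rule "intercommunication's" comes out at 19 letters, so the apostrophe is the only thing a purist could object to.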


"John wrote 'had'. James wrote 'had had'. James' answer had a better effect on the teacher."


James's answer had had a better effect on the teacher ;)


Only if the previous two sentences contain "had written."


No, crntaylor has it right: we're supposed to understand that James wrote what he did because it had had a better effect on the teacher in the past. That's how you can justify squeezing that one last "had" in there.


Folks should first learn to correctly use "its" vs. "it's" and "lets" vs. "let's". Then graduate to correct past-perfect usage :-)


It’s is not, it isn’t ain’t, and it’s it’s, not its, if you mean it is. If you don’t, it’s its. Then too, it’s hers. It isn’t her’s. It isn’t our’s either. It’s ours, and likewise yours and theirs.

—Oxford University Press, Edpress News


Mnemonic: his, hers, its. Works for me when memory and other methods do not.


I always remember that it's "it's" because the apostrophe replaces the missing letters in contractions. E.g.

  it is => it's
The possessive of 'it' doesn't need to replace any letters, so it's just "its."


If you are using this as a mnemonic about its/it's, this is fine; but your statement of it is as a rule ("the apostrophe replaces the missing letters in contractions") is misleadingly incomplete. Apostrophes do that, but they also serve as a possessive marker in the general case ("Sam's"), which is of course why its/it's causes so much trouble in the first place.


  | If you are using this as a mnemonic about its/it's,
  | this is fine; but your statement of it is as a rule
No need to be so pedantic. We're not teaching an English course here, we're talking about mnemonics for its/it's.

If you need a general rule, how about:

  when a possessive and a contraction collide,
  the contraction wins


No need; I think the original formulation---as a mnemonic---is just fine. We don't really need a "general rule" here anyway, and to be honest, English orthography and "general rules" don't really go well together.

The only reason I posted at all is because linguistics is an area where a lot of quite intelligent people hold some extremely unexamined (and incorrect) beliefs, and there is furthermore a common tendency to propagate those beliefs as if they were fact. As a result, whenever I see someone articulating anything that is formulated like a general rule about a language (or about language in general), I try to make corrections where I can.


You took "the apostrophe replaces the missing letters in contractions" as me postulating that all apostrophes are only used for contractions, which is jumping to conclusions, IMO.

Even if that statement is taken as a general rule, I can't think of any contractions that don't use an apostrophe, and it certainly doesn't state that contractions are the only place where apostrophes are used.


No need to be pedantic?!? This is the Internet here.


And I thought that Portuguese could be ambiguous....

I don't think these sorts of phrases are valid in Portuguese.


Can you give an example of its ambiguity? I speak Portuguese but nothing ambiguous is coming to mind at this particular moment.


In my Portuguese class we were given this ambiguous phrase as a riddle:

"Maria toma banho porque sua mãe disse ela traga a toalha."

We had to make those words make sense only by adding punctuation and without changing the order of any words. Can you figure it out?

For those who don't speak Portuguese, the phrase above translates to: "Maria takes a bath because her mom said to her bring the towel." Doesn't make much sense!

The trick is that "sua," which means "her" when the following noun is feminine, is also the present third-person singular of the verb "suar," meaning "to sweat." Thus, with a few commas and quotation marks, it suddenly makes sense:

Maria toma banho porque sua. "Mãe," disse ela, "traga a toalha."

=

Maria takes a bath because she sweats. "Mom," she said, "bring the towel."

Not nearly as ambiguous as the "had had had had" example, but a similar lesson regarding the need for punctuation.

Edit: as personlurking pointed out, suar is "to sweat," not (as I put originally) "to smell." Thanks for catching that!


Oh, yes, I remember that one. Portuguese is not my first language, but I am fluent. I've been given that phrase before (and failed on the suar part despite knowing that verb). Tricky, indeed. Just a small correction, suar is to sweat.

I recall a protest sign that said "Veta Dilma!", or "Veto Dilma!" (the President of Brazil), but the protester meant to put a comma in there, as in "Veto, Dilma!" because the protest was about a bill running through congress.

Another one was "Mesmo sujo, governo quer rio Pinheiros sem cheiro" (Even though it's dirty, the government wants the Pinheiros river to be rid of the bad smell). The problem is the wording which makes it seem like the government is dirty, and not the river. Better would have been "Governo quer rio Pinheiros sem mau cheiro, mesmo que sujo" (The government wants the Pinheiros river without the bad smell, even though it's dirty.).


Functional languages provide the ability to write (and perhaps promote writing) expressions that remind me of this sentence.

Just because you can do without temporaries does not mean you should.


How so? The problem comes from having a highly context-sensitive grammar, which is hardly something I associate specifically with functional languages; C++ is the usual language that people mock for having an all but Turing-complete grammar. I guess the other obvious candidate is Lisp, but that's a different beast altogether.


I agree with you that the context-sensitive grammar makes this sentence hard to parse. However, if you look at the clarifications of meaning, they break things into bits.

In some languages (e.g.: Java without lambdas), you can't write a function without giving it a name. You have to break things into bits and give them names.

In the functional languages you can just create a lambda and use it.

In a functional style, conditionals return values you can use directly. In languages like Java you end up having to assign to temporaries in the branches of the if.
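
To make the contrast concrete, here is a small sketch in Python rather than Java, since Python happens to allow both styles. The function names and the "negative"/"non-negative" labels are invented for illustration.

```python
def sign_with_temporary(n: int) -> str:
    # Statement style: each branch assigns to a named temporary.
    if n < 0:
        label = "negative"
    else:
        label = "non-negative"
    return label


def sign_as_expression(n: int) -> str:
    # Expression style: the conditional itself yields the value.
    return "negative" if n < 0 else "non-negative"


def squares_named(values: list[int]) -> list[int]:
    # The helper has to be given a name before it can be used.
    def square(x: int) -> int:
        return x * x

    return list(map(square, values))


def squares_lambda(values: list[int]) -> list[int]:
    # The helper is created inline and never named.
    return list(map(lambda x: x * x, values))


print(sign_as_expression(-3), squares_lambda([1, 2, 3]))
```

Both pairs compute the same thing; the named versions are longer but give a reader somewhere to rest, which is the point about temporaries.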


Notice my name?

Some clever bugger managed to get ghoti

and ghoughpteighbteau doesn't fit.

Fun fact: I do not know how to spell ghoughpteighbteau, but I can type it.


But isn't the "while John had had 'had'" segment a dangling modifier and hence, this sentence is wrong?

http://en.wikipedia.org/wiki/Dangling_modifier


Buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo.

http://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo...


I can't resist mentioning the Lion-Eating Poet in the Stone Den poem I came across recently. The poet plays with the many tones and many variants of the sound 'sh' in Chinese:

  « Shī Shì shí shī shǐ »
  Shíshì shīshì Shī Shì, shì shī, shì shí shí shī.
  Shì shíshí shì shì shì shī.
  Shí shí, shì shí shī shì shì.
  Shì shí, shì Shī Shì shì shì.
  Shì shì shì shí shī, shì shǐ shì, shǐ shì shí shī shìshì.
  Shì shí shì shí shī shī, shì shíshì.
  Shíshì shī, Shì shǐ shì shì shíshì.
  Shíshì shì, Shì shǐ shì shí shì shí shī.
  Shí shí, shǐ shí shì shí shī shī, shí shí shí shī shī.
  Shì shì shì shì.
Translation:

  « Lion-Eating Poet in the Stone Den »
  In a stone den was a poet called Shi, who was a lion addict, and had resolved to eat ten lions.
  He often went to the market to look for lions.
  At ten o'clock, ten lions had just arrived at the market.
  At that time, Shi had just arrived at the market.
  He saw those ten lions, and using his trusty arrows, caused the ten lions to die.
  He brought the corpses of the ten lions to the stone den.
  The stone den was damp. He asked his servants to wipe it.
  After the stone den was wiped, he tried to eat those ten lions.
  When he ate, he realized that these ten lions were in fact ten stone lion corpses.
  Try to explain this matter.
http://en.wikipedia.org/wiki/Lion-Eating_Poet_in_the_Stone_D...


Note that this poem isn't in any "real" Chinese language. Basically, it's written in Classical Chinese, but pronounced as if it were Mandarin.

Old Chinese (the spoken language on which the written Classical form was based) had a much more complex phonology, which was simplified in Mandarin, resulting in many Old Chinese words becoming homophones or near homophones. To reduce the ambiguity, Mandarin uses many more compound words than Classical Chinese.

For example, a poet is called shīrén in Mandarin (literally, "poet person") while this poem just uses shī.


Chinese... you're drunk go home.

(On a more serious note that's pretty cool.)


When written in Chinese, is there any more indication that it's not just the same word forty or so times than in the romanised version?


Check the wiki link; it has the characters instead of the pinyin. To answer your question: yes, the characters are very different.


The singularity is coming: the day Google Translate can correctly make that translation.


Didn't someone show that every single sequence of words buffalo* (buffalo, buffalo buffalo, buffalo buffalo buffalo, ...) is valid?


Here's how it works:

"buffalo" can function as a verb (bully) and a plural noun (bison).

So we can write "Bison bully bison." as "Buffalo buffalo buffalo."

I'm going to tag my groups of bison with numbers.

> Bison(1) bully bison(0).

> Buffalo(1) buffalo buffalo(0). (3 words)

Now, we can qualify bison(1), by stating that they are bullied by another group of bison, which I'll call bison(2). We can rewrite "who are bullied by bison(2)" as "bison(2) bully" (similar to rewriting "food that is liked by me" as "food I like") so we get:

> Bison(1), who are bullied by bison(2), bully bison(0).

> Buffalo(1) buffalo(2) buffalo buffalo buffalo(0). (5 words)

We can modify bison(2) in the same way we modified bison(1), by stating that they are bullied by yet another group of bison, called bison(3):

> Bison(1) bison(2) bison(3) bully bully bully bison(0).

> Buffalo(1) buffalo(2) buffalo(3) buffalo buffalo buffalo buffalo(0). (7 words)

And so on...

> Buffalo(1) buffalo(2) buffalo(3) buffalo(4) buffalo buffalo buffalo buffalo buffalo(0). (9 words)

> Buffalo(1) buffalo(2) buffalo(3) buffalo(4) buffalo(5) buffalo buffalo buffalo buffalo buffalo buffalo(0). (11 words)

So that gives us sentences of lengths 3, 5, 7, 9, 11, ... words. We can add a word to each sentence by just adding the qualifier "Buffalo" (meaning "from the city of Buffalo") before buffalo(0). This gives us sentences of lengths 4, 6, 8, ...:

> Bison(1) bully bison(0) from the city of Buffalo.

> Buffalo(1) buffalo Buffalo buffalo(0). (4 words)

> Buffalo(1) buffalo(2) buffalo buffalo Buffalo buffalo(0). (6 words)

> Buffalo(1) buffalo(2) buffalo(3) buffalo buffalo buffalo Buffalo buffalo(0). (8 words)

So now we just need the sentences of lengths 1 and 2, which we can do by using the imperative. I.e. we give the general instruction "Bully." or the more specific instruction "Bully bison.":

> Buffalo. (1 word)

> Buffalo buffalo. (2 words)
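
The whole construction fits in a few lines of Python. This is just a sketch of the recipe above, not a parser: it emits one sentence of each length, and the function name `buffalo_sentence` is made up here.

```python
def buffalo_sentence(n: int) -> str:
    """Build an n-word buffalo sentence following the recipe above."""
    if n < 1:
        raise ValueError("need at least one word")
    if n <= 2:
        # Imperatives: "Bully." and "Bully bison."
        words = ["buffalo"] * n
    else:
        # Odd n = 2k + 1 and even n = 2k + 2 both give k = (n - 1) // 2.
        k = (n - 1) // 2
        # k nested subject nouns followed by k verbs.
        words = ["buffalo"] * k + ["buffalo"] * k
        if n % 2 == 0:
            # Even lengths: qualify the object with the city name.
            words.append("Buffalo")
        # The object, buffalo(0).
        words.append("buffalo")
    words[0] = words[0].capitalize()
    return " ".join(words) + "."


for length in range(1, 7):
    print(length, buffalo_sentence(length))
```

Capitalization is the only thing that distinguishes the city from the animal in the output, so the even-length sentences are the ones with a capital B in the second-to-last position.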

The issue remains: can we call a sentence "valid" if it is entirely incomprehensible?


The sentence is nonsensical, because it refers to the same group three times - Bison from Buffalo. If bison from Buffalo bully other bison from Buffalo, it's tautological to again repeat the point that they bully bison from Buffalo.

You'd need more specific demographics to make the sentence valid, methinks, and when that happens, you're adding in a new word.


You're making the classic nerd error of assuming that natural languages should be as precisely defined as mathematics or computer languages. You're going to tell me that there's no such thing as deceleration next!

(To be fair, "Buffalo buffalo buffalo buffalo buffalo." isn't exactly a "natural" sentence.)

And in any case, the sentence is still syntactically valid, which is all I'm really concerned about here.


I'm not operating from logic, but from a 'natural' perspective. I've no problems with the syntax ("French citizens German citizens bully, bully Belgian citizens") though it's quite an awkward sentence in that format and any editor would require a rewrite, but it's just not something that someone would say if all the demographics were the same ("French citizens French citizens bully, bully French citizens") - ie, it's nonsense for that reason.


While the sentence is technically ambiguous, we say ambiguous things all the time and still manage to understand one another. This is the domain of the study of Pragmatics.

In natural communication, we START by assuming that the speaker had a meaningful intention, and then attempt to infer that meaning from context and assumptions about shared knowledge. I can easily think of contexts in which your "French citizens" sentence makes sense.


I don't have the source for this, but I'm almost positive you are correct. I believe I remember an inductive proof that demonstrated this.


http://www.ci.buffalo.mn.us/Admin/Citycode/1004.htm

> Subd. 14. C. Notwithstanding the language contained in Subdivision 6 of this section, all “potentially dangerous animals,” as defined in this Ordinance, which are outside of the owner’s residence, must be kept on a suitable leash, or in an enclosure which restricts the animal’s ability to egress from the owner’s property.

Buffalo buffalo are not buffaloed by fences, but it might buffalo a Buffalo buffalo to see another Buffalo buffalo on a leash.


Based on a simple grammar I wrote in Python there are 21 valid interpretations of this sentence: https://github.com/taliesinb/chartreuse



