XFS File-System with Linux 5.10 Punts Year 2038 Problem to the Year 2486

tytso · on Oct 18, 2020

The reason why ext4 and xfs both use nanosecond resolution is because in the kernel the high precision time keeping structure is the timespec structure (which originally was defined by POSIX). That uses tv_sec and tv_nsec. Certainly in 2008, when ext4 was declared "stable", the hardware of the time was nowhere near having the necessary resolution to give us nanosecond accuracy. However, that's not really the point. We want to be able to store an arbitrary timespec value, encode it in the file system timestamp, and then decode it back to a bit-identical timespec value. So that's why it makes sense to use at least a nanosecond granularity.

Why not use a finer granularity? Because space in the on-disk inode structure is precious. We need 30 bits to encode nanoseconds. That leaves an extra two bits that can be added to the 32-bit "time in seconds since the Unix epoch". For full backwards compatibility, where a "negative" tv_sec corresponds to times before 1970, that gets you to the 25th century. If we really cared, we could add an extra 500 years by stealing a bit somewhere from the inode (maybe an unused flag bit, perhaps --- but since there are 4 timestamps in an inode, you would need to steal 4 bits for each doubling of time range). However, there is no guarantee that ext4 or xfs will be used 400-500 years from now; and if it is being used, it seems likely that there will be plenty of time to do another format bump; XFS has had 4 incompatible format bumps in the last 27 years. ext2/ext3/ext4 has been around for 28 years, and depending on how you count, there have been 2-4 major version bumps (we use finer-grained feature bits, so it's a bit hard to count). In the next 500 years, we'll probably have a few more. :-)
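The bit budget described above can be sketched in Python. This is only an illustration of the idea (a 32-bit seconds field plus a 32-bit "extra" field carrying 30 bits of nanoseconds and 2 epoch bits); the names `encode`, `decode` and `to_utc` are made up for the example, and the layout is an assumption rather than a copy of the real ext4 or XFS on-disk code.

```python
from datetime import datetime, timedelta

EPOCH_BITS = 2                      # bits left over next to 30 bits of nanoseconds
MIN_SEC = -(1 << 31)                # signed 32-bit minimum, kept for pre-1970 times
MAX_SEC = (1 << 31) - 1 + (((1 << EPOCH_BITS) - 1) << 32)


def _signed32(x):
    """Interpret the low 32 bits the way a legacy signed reader would."""
    return x - (1 << 32) if x & 0x80000000 else x


def encode(tv_sec, tv_nsec):
    """Pack a timespec into (low 32 bits of seconds, 32-bit extra field)."""
    if not MIN_SEC <= tv_sec <= MAX_SEC:
        raise OverflowError("tv_sec outside the 34-bit range")
    if not 0 <= tv_nsec < 10**9:
        raise ValueError("tv_nsec must be in [0, 1e9)")
    low = tv_sec & 0xFFFFFFFF
    epoch = (tv_sec - _signed32(low)) >> 32   # how many 2**32 steps were added
    return low, (tv_nsec << EPOCH_BITS) | epoch


def decode(low, extra):
    """Recover the exact (tv_sec, tv_nsec) pair from the packed fields."""
    epoch = extra & ((1 << EPOCH_BITS) - 1)
    return _signed32(low) + (epoch << 32), extra >> EPOCH_BITS


def to_utc(tv_sec):
    """Convert seconds since the Unix epoch to a naive UTC datetime."""
    return datetime(1970, 1, 1) + timedelta(seconds=tv_sec)
```

Under these assumptions the representable range runs from December 1901 to the year 2446, and any in-range timespec round-trips bit for bit.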

throwaway_pdp09 · on Oct 18, 2020

I'm afraid I don't get this at all. Use of data should define the data...

> The reason why ext4 and xfs both use nanosecond resolution is because in the kernel the high precision time keeping structure is the timespec structure

...so resolution here is defined by what's provided, not what's (decided to be) useful. Is ns resolution useful is the important question.

> Why not use a finer granularity? Because space in the on-disk inode structure is precious

That's not a good reason AFAICS. What would it gain your users if you did? 1 ns = ~4 machine cycles. Timestamping to that res, well, what's the value to any application? I'm missing something.

emteycz · on Oct 18, 2020

You can sometimes see timestamps from other machines too, they can also be parallel

throwaway_pdp09 · on Oct 18, 2020

I'm being dense, I don't understand what you're saying. Could you give a bit more detail please?

emteycz · on Oct 19, 2020

What my machine can or can not do is irrelevant, I might connect a drive from a machine where machine cycles were faster or parallel. So it's entirely possible to see timestamps less than X nanoseconds apart even if my machine can't do more than one cycle each X nanoseconds.

throwaway_pdp09 · on Oct 19, 2020

How would that make any possible difference to you?

Bear in mind 1ns = the time a ray of light would travel 30cm in a vacuum.

emteycz · on Oct 19, 2020

I imagine it could make a difference in court in some cases.

throwaway_pdp09 · on Oct 19, 2020

With respect, no way. FIPS mandates a timestamp resolution of 10 microseconds. 10,000 times smaller than that is literally meaningless.

emteycz · on Oct 20, 2020

Thanks for correcting me on that! I still think it could be useful to people, though. Do you think it's that wasteful? The machines of today are powerful and have plenty of disk space.

throwaway_pdp09 · on Oct 20, 2020

That's a good question and it's philosophical. Perhaps it may/may not be wasteful, but it's meaningless therefore useless. If they said "we'll store it to the nanosecond because the kernel does" I'm OK with that if - very big if - they make it clear that it does not have accuracy in the last few digits. They need to say clearly what max accuracy you can expect. To then talk about storing it with finer accuracy after that... just what?

Also be careful about thinking machines have big everything. Caches aren't huge, storing an extra byte in a billion places can add up at scale. Nothing comes free, don't scrimp where you don't have to but neither assume anything's free.

HTH. IMO only.

freedomben · on Oct 17, 2020

It's tempting to think that 2038 is a long way off, so who cares, but I still regularly encounter systems that have been in prod for 20+ years. 2038 is only 18 years away! If you're building systems today, it's worth thinking about how these kinds of issues may affect things. You may still be around then anyhow, so your future self may thank you.

onion2k · on Oct 17, 2020

It's important to remember that 2038 problems won't start happening in 2038. Any system where users can perform actions to generate arbitrary dates in the future that get converted to epoch timestamps in code needs to work with 2038 now.
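A minimal sketch of that point, assuming a hypothetical system that stores expiry dates as signed 32-bit epoch seconds (the helper names and the issue date are invented for the example):

```python
from datetime import datetime, timedelta, timezone

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def to_epoch_seconds(dt):
    """Seconds since 1970-01-01 UTC for a timezone-aware datetime."""
    return int(dt.timestamp())


def fits_in_time32(dt):
    """True if dt survives a trip through a signed 32-bit time_t."""
    return INT32_MIN <= to_epoch_seconds(dt) <= INT32_MAX


# Dates a user could plausibly enter today (values chosen for illustration).
issued = datetime(2020, 10, 17, tzinfo=timezone.utc)
short_expiry = issued + timedelta(days=365 * 5)    # lands in 2025
long_expiry = issued + timedelta(days=365 * 20)    # lands in 2040

# The last instant a signed 32-bit counter can represent.
last_ok = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=INT32_MAX)
```

Here `fits_in_time32(long_expiry)` is false even though the code runs in 2020, which is the failure mode being described.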

sargun · on Oct 17, 2020

Yep. I broke a system at work this way. I needed a piece of cryptographic information generated by an internal system for testing. The system requires all pieces are signed (with an accompanying expiration date). I decided to ask for it to be signed for 100 years.

...This resulted in the service generating an invalid x509 certificate [but fortunately the validation library said that the cert was invalid, just didn't tell me why] -- making me lose a day to debugging this mistake.

ncmncm · on Oct 18, 2020

The real mistake was wanting a certificate life longer than the expected time to crack it.

danielheath · on Oct 18, 2020

For test systems? Unless you are testing key rotation, who cares?

ncmncm · on Oct 18, 2020

Point taken. The certificate generator program should have reported the error. Everything connected with x.509 barely works.

Somebody should standardize an x.509.good_parts subset that leaves out all the crap nobody needs and doesn't work anyway in deployed systems.

beambot · on Oct 18, 2020

For systems that are infrequently (or sometimes, never) updated, you might want to sign firmware images for as long as possible to ensure the devices can (a) support signed updates from the OEM; and (b) can work indefinitely even if the OEM disappears.

ThePadawan · on Oct 18, 2020

I used to work at an online shop that among other things, sold RAM.

Usually, any product sold comes with a 2- or 5-year warranty if you're lucky.

Some RAM manufacturers were so proud of their manufacturing quality, they instead offered 50-year warranties.

Now, the reasonable thing to do as a seller would have been to just limit that to something like 5 years, and then give the customer their money back if they had any issues after that.

However, someone actually programmed the system to use 50 years as the warranty duration, way in the 2040s.

I didn't look further into the technical issues behind that (probably not many, since the required precision wasn't seconds, just days), but that remains a great example of when not building a feature costs less than building it.

kevin_thibedeau · on Oct 18, 2020

The NTP epoch rolls over in 2036. My guess is that a lot of embedded devices are going to fail without warning.
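For reference, the rollover date follows from the 32-bit seconds field counting from 1900. The sketch below computes it and shows one possible way a device could disambiguate eras using a known reference date; `resolve_ntp_seconds` and its pivot approach are assumptions for illustration, not something taken from any particular NTP implementation.

```python
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)
ERA_SECONDS = 2**32                 # a 32-bit seconds field wraps after this many


def ntp_era_start(era):
    """First instant of the given NTP era (era 0 starts in 1900)."""
    return NTP_EPOCH + timedelta(seconds=era * ERA_SECONDS)


def resolve_ntp_seconds(raw, pivot):
    """Pick the era that places a raw 32-bit value closest to `pivot`.

    `pivot` is any trusted approximate date, e.g. a firmware build date.
    """
    pivot_s = int((pivot - NTP_EPOCH).total_seconds())
    era = round((pivot_s - raw) / ERA_SECONDS)
    return NTP_EPOCH + timedelta(seconds=era * ERA_SECONDS + raw)
```

A device with no such reference date has no way to tell era 0 from era 1, which is where silent failures would come from.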

axiolite · on Oct 18, 2020

I started hitting issues quite some time ago. Such as generating long-term self-signed certificates for internal services.

  openssl x509 -req -days 7000 -in site.csr -signkey site.key -out site.crt

Lots of software on 32-bit Linux systems will choke on it when the expiration date crosses 2038, albeit not major web browsers, as they saw the issue early on and addressed it.

TedDoesntTalk · on Oct 18, 2020

I routinely have nightmares about this for many of the systems I code for now.

ncmncm · on Oct 18, 2020

Start coding wrap checking that replaces a negative number with the corresponding correct time.

In another 78 years they can make it a little smarter.

sedatk · on Oct 17, 2020

It's closer than 2000.

xiconfjs · on Oct 18, 2020

crazy, right?

stabbles · on Oct 17, 2020

I wouldn't be surprised if CentOS 8 was still around in 2038 running a 4.x linux kernel

cesarb · on Oct 18, 2020

The end-of-life for CentOS 8, according to https://www.centos.org/download/, is going to be 2029-05-31; however, RHEL tends to add a few extra years of extended support beyond that. According to Wikipedia, the longest extended support so far was around five years, which would get us to 2034. And people often use operating systems many years beyond their end-of-life.

Keeping in mind that XFS is a filesystem format, it's not hard at all to imagine a filesystem created in CentOS 8 and/or RHEL 8 still being in use when 2038 arrives, even if the operating system was already upgraded to the next major version.

chrisseaton · on Oct 17, 2020

> It's tempting to think that 2038 is a long way off, so who cares

It's not - it'll be here before you know it.

est31 · on Oct 18, 2020

Yeah this stuff still happens. Just recently in 2019, the GPS clock rolled over. The cellphone I had at the time (it was a 2011 build) couldn't deal with it and displayed a wrong GPS date. Thankfully it was not used to prime the internal clock and was limited to the GPS diagnosis app. But still interesting. In the new protocol they only added 3 more bits, creating 157 year long epochs. I'd argue that this epoch is more dangerous than the currently deployed one because with the currently deployed one it's more likely that manufacturers build ability to cope with a rollover into their devices. It's also shorter than the lifetime of the United States...
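The arithmetic behind those numbers, as a rough sketch: it assumes the legacy week counter is 10 bits, the extended one is 13 bits, and weeks are counted from 6 January 1980. Dates are in GPS time and ignore the leap-second offset from UTC; the function names are made up.

```python
from datetime import date, timedelta

GPS_EPOCH = date(1980, 1, 6)        # week 0 of GPS time


def rollover_date(n, week_bits=10):
    """Date on which a `week_bits`-wide week counter wraps for the n-th time."""
    return GPS_EPOCH + timedelta(weeks=n * 2**week_bits)


def epoch_length_years(week_bits):
    """Approximate length in years of one full cycle of the week counter."""
    return 2**week_bits * 7 / 365.2425
```

With 10 bits a cycle lasts about 19.6 years, and three extra bits multiply that by eight to roughly 157 years.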

burtonator · on Oct 18, 2020

That's not the issue. The issue is that the devs who wrote this will be retired by then ;)

snvzz · on Oct 18, 2020

2037 will be a great year to switch jobs.

lowbloodsugar · on Oct 17, 2020

It's tempting to think 2486 is a long way off. You may still be around then anyhow, so your future self may thank you.

kortex · on Oct 18, 2020

So what's the next step? Elsewhere in this thread, picoseconds were mentioned and that seems plausibly useful, so that's 10 bits. And it would be nice to avoid "year xyz" problems, so maybe we could have a coding region of say 2 bits to encode 4 overlapping epochs to allow rolling updates. That seems to imply at least a 10-byte or 12-byte time.

I don't see any way in which that could easily fit with POSIX time types, unless you keep the existing 32-bit time_t and tv_nsecs fields and tack on two 16-bit fields for extending precision in either direction. But ISO C allows time_t to be a floating point number, which opens a few doors.

What about UTF-8 style variable length times? Would that be too messy?

Edit: looks like most 64-bit OSes are already or are switching to 64-bit time_t. So that solves half the problem, but no picoseconds just yet. I guess that's what int64 or float64/80 is for.
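A quick check of the bit counts mentioned above. The field widths passed to `bytes_needed` are just the ones floated in this thread (34-bit seconds, picosecond fraction, 2 epoch-selector bits), not an existing format.

```python
def bits_for(n_values):
    """Minimum number of bits needed to distinguish n_values values."""
    return (n_values - 1).bit_length()


def bytes_needed(*field_bits):
    """Whole bytes needed to hold the given bit fields packed together."""
    total = sum(field_bits)
    return (total + 7) // 8


nanosecond_bits = bits_for(10**9)     # sub-second part at 1 ns resolution
picosecond_bits = bits_for(10**12)    # sub-second part at 1 ps resolution
extra_bits = picosecond_bits - nanosecond_bits
```

Going from nanoseconds to picoseconds costs 10 more bits, and 34 + 40 + 2 bits rounds up to 10 bytes, in line with the estimate in the comment.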

a1369209993 · on Oct 18, 2020

FWIW, a duration of:

  15Gyr c * (c ln(2))^2 / G / (1MW/c^2)

(the maximum lifetime of a Kardashev-0 (1MW) civilization with a power supply as large as will fit into the observable universe without collapsing into a black hole) is about 2^212 seconds, so a 256-bit time value with quarter-nanosecond (exactly 2^-32 seconds) precision would be sufficient for any plausible application involving unmodified biological humans under known physics.

WJW · on Oct 18, 2020

Wouldn't the observable universe have expanded quite a bit by then?

a1369209993 · on Oct 18, 2020

Yes, but it won't have gained mass because of the cosmological event horizon resulting from the expansion of the universe. (That is, anything that's not in the observable universe already is moving away with an effective speed due to Hubble expansion greater than the speed of light, and thus can't[0] be retrieved for use as fuel.)

0: without violating the known laws of physics

bratao · on Oct 18, 2020

I'm impressed by the number of new features that have been coming out of XFS. The benchmarks (https://www.phoronix.com/scan.php?page=article&item=linux-58...) also confirm that it is (one of) the best mature FSes around.

bmn__ · on Oct 18, 2020

Also scores the least amount of defects in this crash consistency/filesystem correctness comparison: https://danluu.com/file-consistency/

ehsankia · on Oct 18, 2020

Does benchmark performance really prove maturity?

bloak · on Oct 17, 2020

There's a better explanation here, though I still don't understand "34-bit unsigned second counter right shifted two bits" and "(((2^34-1) + (2^31-1)) & ~3)":

https://lwn.net/Articles/829314/

StringyBob · on Oct 17, 2020

A bit mysterious, but it seems that files will now have nanosecond accuracy since 1901 (in the 64-bit unsigned int), while the quota timers are still 32-bit and have been downgraded from 1-second to 4-second accuracy to cover the full bigtime forward time range required (for the 1970 Unix epoch).

Seems to be described in this patch comment: https://patchwork.kernel.org/project/xfs/patch/157784114490....

dschuler · on Oct 17, 2020

It’s not totally clear to me either, but maybe it’s a 32-bit value treated as a 34-bit value with a precision of 4 seconds (so left-shifted two bits elsewhere), since the whole term has the least significant two bits cleared with ‘& ~3’. No idea what the (2^31-1) term is doing though.
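One reading of the descriptions in this subthread, as a hedged sketch: inode timestamps become an unsigned 64-bit nanosecond counter whose zero is the old signed 32-bit minimum (December 1901), and quota timers stay 32 bits wide by storing seconds since 1970 shifted right by two. The constant and function names below are invented, and the code does not try to reproduce the exact expression quoted from the article.

```python
from datetime import datetime, timedelta

NSEC_PER_SEC = 10**9
BIGTIME_EPOCH_OFFSET = 2**31        # counter value 0 maps to tv_sec = -2**31
QUOTA_SHIFT = 2                     # 32 stored bits cover a 34-bit seconds range


def unix_to_bigtime(tv_sec, tv_nsec):
    """Nanoseconds since December 1901 as one unsigned counter."""
    return (tv_sec + BIGTIME_EPOCH_OFFSET) * NSEC_PER_SEC + tv_nsec


def bigtime_to_unix(counter_ns):
    """Inverse of unix_to_bigtime: returns (tv_sec, tv_nsec)."""
    sec, nsec = divmod(counter_ns, NSEC_PER_SEC)
    return sec - BIGTIME_EPOCH_OFFSET, nsec


def quota_encode(unix_sec):
    """Drop the two low bits so the value fits the 32-bit quota field."""
    return unix_sec >> QUOTA_SHIFT


def quota_decode(stored):
    """Recover seconds since 1970 at 4-second granularity."""
    return stored << QUOTA_SHIFT


def to_utc(tv_sec):
    """Convert seconds since the Unix epoch to a naive UTC datetime."""
    return datetime(1970, 1, 1) + timedelta(seconds=tv_sec)
```

Under these assumptions the largest 64-bit counter value decodes to a date in 2486, which is where the headline year comes from, and a quota timer of 1003 seconds comes back as 1000.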

jart · on Oct 17, 2020

This is what happens when your data structures declare year zero like Pol Pot. Unfortunately if we wanted to be scientific we'd probably need something like 3000 bit timestamps to do Planck time until heat death of the universe. XFS could have lasted the better part of the stelliferous era if they had chosen struct timespec, but even that has the problematic nanosecond granularity limitation. It's not the stone age and seconds have a very clear definition these days in terms of the number of hyperfine transitions of caesium, so I would have hoped new data structures would at least choose that.

dmurray · on Oct 17, 2020

For most applications, 500 years and nanosecond granularity is not "problematic". The handful that need greater range or precision (or unlikely, both) can roll their own data structures or use a specialised one rather than needing to cater for that in the default.

You can always use floating point representations. It will give you great range and great precision, though not both at the same time (you can't refer to a specific picosecond on a specific day 100,000 years from now).
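The range-versus-precision trade-off can be made concrete by looking at the gap between adjacent float64 values at different magnitudes. A small sketch, assuming timestamps are stored as float64 seconds since 1970 (the helper name is made up):

```python
import math

SECONDS_PER_YEAR = 365.2425 * 86400


def float64_spacing(x):
    """Gap between x and the next larger float64 (for normal, positive x)."""
    _, exp = math.frexp(x)          # x = m * 2**exp with 0.5 <= m < 1
    return 2.0 ** (exp - 53)        # float64 carries 53 significant bits


now = 1.6e9                               # roughly October 2020
far = 100_000 * SECONDS_PER_YEAR          # 100,000 years after the epoch

spacing_now = float64_spacing(now)        # a few hundred nanoseconds
spacing_far = float64_spacing(far)        # about half a millisecond
```

So a float64 already cannot hold nanoseconds for present-day dates, and a picosecond added to a date 100,000 years out simply disappears in rounding.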

Dylan16807 · on Oct 18, 2020

Nanosecond timestamps are fine for their purpose. Just measuring something from an observation point a few inches different will shift your numbers by a nanosecond. If you need ultra-precise records from some kind of equipment then it needs to use a specialized local clock anyway, and it's probably better if it's hard to convert those records back and forth with normal timestamps.

> This is what happens when your data structures declare year zero like pol pot.

Picking an arbitrary zero point is the only way to make timestamps work. If we started counting with the big bang then all our clocks would be plus or minus millions of years.

zelly · on Oct 18, 2020

3000 bits for every inode. Academic excellence.

mhh__ · on Oct 17, 2020

> planck time

We can't even get close to a planck time tick so that's not really necessary

withinboredom · on Oct 17, 2020

> We can't even get close to a planck time

IIRC, Windows can't get a higher resolution than time % 10ms, should we not allow getting the time at millisecond resolution? Just because it's not necessary, doesn't mean it isn't useful. In this case, I can't really see it being useful either... would love to hear a non-niche use-case for it.

mhh__ · on Oct 17, 2020

I heard a Keysight engineer saying they have NDA-ed tech to achieve picosecond synchronization over the network, so there is a case for precise ticks but the planck time is about 20 orders of magnitude shorter than cutting edge research that is a one shot experiment in a laboratory.

jart · on Oct 17, 2020

Windows NT measures time at hectonanosecond (100ns) granularity and declares its epoch as modernity (~1600) so it does better than UNIX but unfortunately still limited by arbitrary traditional thinking.
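For comparison with Unix timestamps, here is a sketch of converting between the two representations. It assumes the FILETIME convention of 100 ns ticks counted from 1 January 1601 UTC; the function names are made up for the example.

```python
from datetime import datetime, timedelta

TICKS_PER_SEC = 10**7                     # one tick is 100 ns
EPOCH_DIFF_SEC = 11_644_473_600           # seconds from 1601-01-01 to 1970-01-01


def filetime_to_unix(ticks):
    """Convert 100 ns ticks since 1601 to (tv_sec, tv_nsec) since 1970."""
    sec, rem = divmod(ticks, TICKS_PER_SEC)
    return sec - EPOCH_DIFF_SEC, rem * 100


def unix_to_filetime(tv_sec, tv_nsec):
    """Convert a timespec to ticks; the last two decimal digits of ns are lost."""
    return (tv_sec + EPOCH_DIFF_SEC) * TICKS_PER_SEC + tv_nsec // 100
```

The round trip is not bit-identical for arbitrary timespec values, which is the property the filesystem developers said they wanted to preserve.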

dundarious · on Oct 17, 2020

I think the point is that the entire human race is not able to get close to a planck time, by any reasonable definition of "get". See this recent discussion: https://news.ycombinator.com/item?id=24804624

trhway · on Oct 17, 2020

one way for a generational ship to fail.

Red_Leaves_Flyy · on Oct 17, 2020

While the failure mode is different, The Orville did a great episode ("If the Stars Should Appear", s01e04) on what could happen in such an event.

https://m.imdb.com/title/tt6483046/

n00ri · on Oct 17, 2020

interesting

ncmncm · on Oct 18, 2020

Just interpreting time as unsigned would take them to 2106. Probably there would be other filesystems to use before then.

There are exactly zero XFS filesystem files created before 1970, so there is no need to represent those times in a filesystem.

I seriously wonder what makes this basic fact so hard for so many people to process. If you have a clue, please do explain.

quotemstr · on Oct 18, 2020

> There are exactly zero XFS filesystem files created before 1970

But you can set the timestamp of a file to a time before the epoch.

  $ touch --date='Jun 1 1952' foo
  $ ls -l foo
  -rw------- 1 me me 0 Jun  1  1952 foo

Silently changing the interpretation of these pre-epoch timestamps would break users and so isn't an option.

ncmncm · on Oct 18, 2020

Obviously you can, but why should anyone care that you did? If you only care about the bit pattern (which must be the case, because the date is a lie), what does it matter how anybody interprets it?

quotemstr · on Oct 18, 2020

The bit pattern as returned from stat would be different: that's what makes it an ABI break. If I make a file with an mtime of 1952 and then I update the kernel and that file has an mtime of 2077 or something, that's the kernel violating the contract it has with userspace, and Linux does not break kernel ABI.

It doesn't matter whether you think that such timestamps are "lie[s]": the contract is the contract. You have no idea how someone might be using the system, and you don't get to break uses that are legal within the contract but contrary to your sense of good taste.
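To make the reinterpretation concrete, this sketch takes the 1952 mtime from the `touch` example, stores it as a signed 32-bit value, and reads the same bits back as unsigned. The helper name is invented and the midnight-UTC timestamp is an assumption about what `touch` would have stored.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def as_unsigned32(tv_sec):
    """Reinterpret the bits of a signed 32-bit value as unsigned."""
    return struct.unpack("<I", struct.pack("<i", tv_sec))[0]


# The mtime set by the touch example, taken here as midnight UTC.
mtime = datetime(1952, 6, 1, tzinfo=timezone.utc)
signed_sec = int((mtime - EPOCH).total_seconds())       # negative: before 1970

# What a reader would report if the same field were treated as unsigned.
reinterpreted = EPOCH + timedelta(seconds=as_unsigned32(signed_sec))
```

The same 32 bits that meant June 1952 would be reported as a date in 2088, so an existing file's `stat` output changes without the file being touched.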

Dylan16807 · on Oct 18, 2020

> Linux does not break kernel ABI

Yes it does. The rule isn't absolute. Impact matters, not your ability to find a single bit pattern that differs.

Someone · on Oct 18, 2020

The date need not be a lie. One might have moved file systems a few times, migrating files each time, and preserving time stamps.

Alternatively, if you find an early tape in a yard sale, you’ll want to keep creation dates of the files when you read in the data.

Those are ‘somewhat’ of an edge case, though.

And yes, it seems at least some file systems from before 1970 had time stamps. https://en.wikipedia.org/wiki/Comparison_of_file_systems#Met... says DECTape had, and that’s from 1963.

rvr_ · on Oct 18, 2020

I really wish people (and redhat) just stop using XFS altogether. I have seen it break multiple times with two different scenarios: 1) Backup system with external media (usb3) on XFS. The USB link goes down, the kernel panics inside some XFS code. Hard reset needed. 2) Production system with <1M files being actively read by apache http. Hot updating this folder via rsync randomly panics the system.

After a couple of times you just migrate to something sane, like ext4, and everything works flawlessly.

ink_13 · on Oct 18, 2020

> migrate to something sane, like ext4

I have seen this argument used the other way. XFS never runs out of inodes, for example.

nix23 · on Oct 18, 2020

You have much bigger problems and none of it has anything to do with XFS.

otterley · on Oct 18, 2020

If you can reproduce this, have you reached out to the maintainers? They’d love to fix bugs like these.

zelly · on Oct 18, 2020

They should use ZFS or really fix btrfs. These barebones filesystems have really run their course.

snvzz · on Oct 18, 2020

Don't forget about NILFS2. Designed for low-latency. log-like structure. Data and metadata checksums. Been in the kernel for several years.

danschumann · on Oct 17, 2020

Okay, if that's a punt, the ball has left the stadium, left the parking lot, narrowly avoids hitting skyscrapers downtown, on its way to landing safely in a park n ride outside of town, where it bounces a few times before coming to a stop, but not before it taps into a car, setting off an alarm, which happens to be a fan's car who bussed in, trying to save on traffic, but will be woefully disappointed to find his car dead, only to have his spirits lifted when he finds the game ball, only slightly charred from breaking the sound barrier, and a jump from his crush who went to the game with him, and then she helped him with his dead battery, heyo!