The reason why ext4 and xfs both use nanosecond resolution is because in the kernel the high precision time keeping structure is the timespec structure (which originally was defined by POSIX). That uses tv_sec and tv_nsec. Certainly in 2008, when ext4 was declared "stable", the hardware of the time was nowhere near having the necessary resolution to give us nanosecond accuracy. However, that's not really the point. We want to be able to store an arbitrary timespec value, encode it in the file system timestamp, and then decode it back to a bit-identical timespec value. So that's why it makes sense to use at least a nanosecond granularity.
Why not use a finer granularity? Because space in the on-disk inode structure is precious. We need 30 bits to encode nanoseconds. That leaves an extra two bits that can be added to the 32-bit "time in seconds since the Unix epoch". For full backwards compatibility, where a "negative" tv_sec corresponds to times before 1970, that gets you to the 25th century. If we really cared, we could add an extra 500 years by stealing a bit somewhere from the inode (maybe an unused flag bit, perhaps --- but since there are 4 timestamps in an inode, you would need to steal 4 bits for each doubling of time range). However, there is no guarantee that ext4 or xfs will be used 400-500 years from now; and if it is being used, it seems likely that there will be plenty of time to do another format bump; XFS has had 4 incompatible format bumps in the last 27 years. ext2/ext3/ext4 has been around for 28 years, and depending on how you count, there have been 2-4 major version bumps (we use finer-grained feature bits, so it's a bit hard to count). In the next 500 years, we'll probably have a few more. :-)
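To make the bit budget above concrete, here is a minimal Python sketch of one way to pack a timespec into a 32-bit seconds word plus a 32-bit "extra" word (30 bits of nanoseconds, 2 bits extending the seconds). The names `encode`, `decode`, `EPOCH_BITS` and the exact bit positions are assumptions for illustration, not the actual ext4 on-disk code.

```python
from datetime import datetime, timedelta

NSEC_BITS = 30                       # 10**9 < 2**30, so 30 bits hold any tv_nsec
EPOCH_BITS = 2                       # the two bits left over in the 32-bit extra word
EPOCH_MASK = (1 << EPOCH_BITS) - 1


def _signed32(word):
    """Interpret a 32-bit word as a two's-complement signed integer."""
    return word - (1 << 32) if word & 0x80000000 else word


def encode(tv_sec, tv_nsec):
    """Pack (tv_sec, tv_nsec) into (seconds word, extra word), both 32-bit."""
    if not 0 <= tv_nsec < 1_000_000_000:
        raise ValueError("tv_nsec out of range")
    sec_lo = tv_sec & 0xFFFFFFFF
    # Number of 2**32-second steps above the legacy signed 32-bit range.
    epoch = (tv_sec - _signed32(sec_lo)) >> 32
    if not 0 <= epoch <= EPOCH_MASK:
        raise ValueError("tv_sec out of range")
    return sec_lo, (tv_nsec << EPOCH_BITS) | epoch


def decode(sec_lo, extra):
    """Inverse of encode: recover the original (tv_sec, tv_nsec)."""
    tv_sec = _signed32(sec_lo) + ((extra & EPOCH_MASK) << 32)
    return tv_sec, extra >> EPOCH_BITS


MIN_SEC = -(1 << 31)                              # same lower bound as a signed 32-bit time
MAX_SEC = (1 << 31) - 1 + (EPOCH_MASK << 32)      # two extra bits: three more 2**32-second steps


def year_of(seconds):
    """Calendar year (UTC) of a Unix timestamp, without platform time_t limits."""
    return (datetime(1970, 1, 1) + timedelta(seconds=seconds)).year
```

Under these assumptions the representable range runs from 1901 to 2446, which matches the "25th century" figure, and any in-range timespec round-trips bit for bit.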
I'm afraid I don't get this at all. Use of data should define the data...
> The reason why ext4 and xfs both use nanosecond resolution is because in the kernel the high precision time keeping structure is the timespec structure
...so resolution here is defined by what's provided, not what's (decided to be) useful. Whether ns resolution is useful is the important question.
> Why not use a finer granularity? Because space in the on-disk inode structure is precious
That's not a good reason AFAICS. What would it gain your users if you did? 1 ns = ~4 machine cycles. Timestamping to that res, well, what's the value to any application? I'm missing something.
What my machine can or can not do is irrelevant, I might connect a drive from a machine where machine cycles were faster or parallel. So it's entirely possible to see timestamps less than X nanoseconds apart even if my machine can't do more than one cycle each X nanoseconds.
Thanks for correcting me on that! I still think it could be useful to people, though. Do you think it's that wasteful? The machines of today are powerful and have plenty of disk space.
That's a good question and it's philosophical. Perhaps it may/may not be wasteful, but it's meaningless therefore useless. If they said "we'll store it to the nanosecond because the kernel does" I'm OK with that if - very big if - they make it clear that it does not have accuracy in the last few digits. They need to say clearly what max accuracy you can expect. To then talk about storing it with finer accuracy after that... just what?
Also be careful about thinking machines have big everything. Caches aren't huge, and storing an extra byte in a billion places can add up at scale. Nothing comes free; don't scrimp where you don't have to, but don't assume anything's free either.
It's tempting to think that 2038 is a long way off, so who cares, but I still regularly encounter systems that have been in prod for 20+ years. 2038 is only 18 years away! If you're building systems today, it's worth thinking about how these kinds of issues may affect things. You may still be around then anyhow, so your future self may thank you.
It's important to remember that 2038 problems won't start happening in 2038. Any system where users can perform actions to generate arbitrary dates in the future that get converted to epoch timestamps in code needs to work with 2038 now.
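A minimal Python sketch of the kind of check this implies: before converting a user-supplied future date to an epoch timestamp, test whether it still fits in a signed 32-bit count of seconds. The helper name `fits_in_time32` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Last instant a signed 32-bit seconds counter can represent.
LAST_TIME32 = EPOCH + timedelta(seconds=INT32_MAX)


def fits_in_time32(when):
    """True if an aware datetime survives conversion to a signed 32-bit time."""
    secs = (when - EPOCH) // timedelta(seconds=1)   # whole seconds since the epoch
    return INT32_MIN <= secs <= INT32_MAX


today = datetime(2020, 10, 20, tzinfo=timezone.utc)
expiry_in_100_years = today + timedelta(days=365 * 100)
```

Here `fits_in_time32(today)` holds, while `fits_in_time32(expiry_in_100_years)` does not, even though the code is running well before 2038.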
Yep. I broke a system at work this way. I needed a piece of cryptographic information generated by an internal system for testing. The system requires that all pieces be signed (with an accompanying expiration date). I decided to ask for it to be signed for 100 years.
...This resulted in the service generating an invalid x509 certificate [but fortunately the validation library said that the cert was invalid, just didn't tell me why] -- making me lose a day to debugging this mistake.
For systems that are infrequently (or sometimes, never) updated, you might want to sign firmware images for as long as possible to ensure the devices can (a) support signed updates from the OEM; and (b) can work indefinitely even if the OEM disappears.
I used to work at an online shop that among other things, sold RAM.
Usually, any product sold comes with a 2- or 5-year warranty if you're lucky.
Some RAM manufacturers were so proud of their manufacturing quality, they instead offered 50-year warranties.
Now, the reasonable thing to do as a seller would have been to just limit that to something like 5 years, and then give the customer their money back if they had any issues after that.
However, someone actually programmed the system to use 50 years as the warranty duration, way in the 2040s.
I didn't look further into the technical issues behind that (probably not many, since the required precision wasn't seconds, just days), but that remains a great example of when not building a feature costs less than building it.
Lots of software on 32-bit Linux systems will choke on it when the expiration date crosses 2038, though not the major web browsers, which saw the issue early on and addressed it.
The end-of-life for CentOS 8, according to https://www.centos.org/download/, is going to be 2029-05-31; however, RHEL tends to add a few extra years of extended support beyond that. According to Wikipedia, the longest extended support so far was around five years, which would get us to 2034. And people often use operating systems many years beyond their end-of-life.
Keeping in mind that XFS is a filesystem format, it's not hard at all to imagine a filesystem created in CentOS 8 and/or RHEL 8 still being in use when 2038 arrives, even if the operating system was already upgraded to the next major version.
Yeah, this stuff still happens. Just recently, in 2019, the GPS clock rolled over. The cellphone I had at the time (it was a 2011 build) couldn't deal with it and displayed a wrong GPS date; thankfully it was not used to prime the internal clock and was limited to the GPS diagnosis app. But still interesting. In the new protocol they only added 3 more bits, creating 157-year-long epochs. I'd argue that this epoch is more dangerous than the currently deployed one, because with the currently deployed one it's more likely that manufacturers build the ability to cope with a rollover into their devices. It's also shorter than the lifetime of the United States...
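The rollover arithmetic can be sketched in a few lines of Python, assuming a week counter that starts at the GPS epoch of 1980-01-06 and is 10 bits wide in the legacy message and 13 bits wide in the newer one. The function names are made up for illustration.

```python
from datetime import date, timedelta

GPS_EPOCH = date(1980, 1, 6)   # week 0 of GPS time


def rollover_period_years(week_bits):
    """Approximate length of one week-counter epoch, in years."""
    return (2 ** week_bits) * 7 / 365.25


def naive_date_from_week(true_week, week_bits=10):
    """Date shown by a receiver that trusts the truncated week field as-is."""
    return GPS_EPOCH + timedelta(weeks=true_week % 2 ** week_bits)


second_rollover = GPS_EPOCH + timedelta(weeks=2 * 2 ** 10)
```

With 10 bits the counter wraps roughly every 19.6 years; with 13 bits, roughly every 157 years. A receiver with no rollover handling would read week 2048 as week 0 and report a date back in 1980.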
So what's the next step? Elsewhere in this thread, picoseconds were mentioned and that seems plausibly useful, so that's 10 bits. And it would be nice to avoid "year xyz" problems, so maybe we could have a coding region of say 2 bits to encode 4 overlapping epochs to allow rolling updates. That seems to imply at least a 10-byte or 12-byte time.
I don't see any way in which that could easily fit with POSIX time types, unless you keep the existing 32-bit time_t and tv_nsec fields and tack on two 16-bit fields for extending precision in either direction. But ISO C allows time_t to be a floating point number, which opens a few doors.
What about UTF-8 style variable length times? Would that be too messy?
Edit: looks like most 64-bit OSes already use, or are switching to, a 64-bit time_t. So that solves half the problem, but no picoseconds just yet. I guess that's what int64 or float64/80 is for.
(the maximum lifetime of a Kardashev-0 (1MW) civilization with a power supply as large as will fit into the observable universe without collapsing into a black hole) is about 2^212 seconds, so a 256-bit time value with quarter-nanosecond (exactly 2^-32 seconds) precision would be sufficient for any plausible application involving unmodified biological humans under known physics.
Yes, but it won't have gained mass because of the cosmological event horizon resulting from the expansion of the universe. (That is, anything that's not in the observable universe already is moving away with an effective speed due to Hubble expansion greater than the speed of light, and thus can't[0] be retrieved for use as fuel.)
There's a better explanation here, though I still don't understand "34-bit unsigned second counter right shifted two bits" and "(((2^34-1) + (2^31-1)) & ~3)":
A bit mysterious, but it seems files will now have nanosecond accuracy since 1901 (in the 64-bit unsigned int), while the quota timers are still 32-bit and have been downgraded from 1-second to 4-second accuracy to cover the full bigtime forward time range required (for the 1970 Unix epoch).
It’s not totally clear to me either, but maybe it’s a 32-bit value treated as a 34-bit value with a precision of 4 seconds (so left-shifted two bits elsewhere), since the whole term has the least significant two bits cleared with ‘& ~3’. No idea what the (2^31-1) term is doing though.
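The "shift by two bits" idea can be sketched generically in Python. This is only an illustration of storing a seconds counter at 4-second granularity in a 32-bit field, assuming a shift of 2 as in the quoted text; `to_disk` and `from_disk` are hypothetical names, not XFS functions, and the sketch does not explain the (2^31-1) term.

```python
SHIFT = 2   # drop the two low bits: 4-second granularity (assumed from the quoted text)


def to_disk(seconds):
    """Store a seconds value in a 32-bit field by discarding its low two bits."""
    stored = seconds >> SHIFT
    if not 0 <= stored < 2 ** 32:
        raise ValueError("does not fit in a 32-bit field")
    return stored


def from_disk(stored):
    """Recover the seconds value, rounded down to a multiple of 4."""
    return stored << SHIFT


# The expression quoted above, evaluated literally.
quoted_limit = ((2 ** 34 - 1) + (2 ** 31 - 1)) & ~3
```

A 32-bit field used this way spans 2^34 seconds (about 544 years) instead of 2^32 (about 136 years), at the cost of rounding every value down to a multiple of 4. The `& ~3` in the quoted expression does the same rounding: the expression evaluates to 19327352828, which is divisible by 4.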
This is what happens when your data structures declare year zero like Pol Pot. Unfortunately if we wanted to be scientific we'd probably need something like 3000 bit timestamps to do Planck time until heat death of the universe. XFS could have lasted the better part of the stelliferous era if they had chosen struct timespec, but even that has the problematic nanosecond granularity limitation. It's not the stone age and seconds have a very clear definition these days in terms of the number of hyperfine transitions of caesium, so I would have hoped new data structures would at least choose that.
For most applications, 500 years and nanosecond granularity is not "problematic". The handful that need greater range or precision (or unlikely, both) can roll their own data structures or use a specialised one rather than needing to cater for that in the default.
You can always use floating point representations. It will give you great range and great precision, though not both at the same time (you can't refer to a specific picosecond on a specific day 100,000 years from now).
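The range-versus-precision trade-off is easy to measure for IEEE 754 doubles. A small Python sketch (needs Python 3.9+ for `math.ulp`; the helper name `resolution_at` and the average year length are choices made here):

```python
import math

SECONDS_PER_YEAR = 31_556_952   # average Gregorian year, 365.2425 days


def resolution_at(seconds):
    """Gap between adjacent float64 values near `seconds`, in seconds."""
    return math.ulp(float(seconds))


now_res = resolution_at(1.6e9)                          # a timestamp around 2020
far_res = resolution_at(100_000 * SECONDS_PER_YEAR)     # 100,000 years after the epoch
```

Near a 2020-era timestamp a double resolves 2^-22 s (about 0.24 microseconds); 100,000 years out it resolves only 2^-11 s (about half a millisecond), nowhere near a picosecond.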
Nanosecond timestamps are fine for their purpose. Just measuring something from an observation point a few inches different will shift your numbers by a nanosecond. If you need ultra-precise records from some kind of equipment then it needs to use a specialized local clock anyway, and it's probably better if it's hard to convert those records back and forth with normal timestamps.
> This is what happens when your data structures declare year zero like Pol Pot.
Picking an arbitrary zero point is the only way to make timestamps work. If we started counting with the big bang then all our clocks would be plus or minus millions of years.
IIRC, Windows can't get a higher resolution than time % 10ms, should we not allow getting the time at millisecond resolution? Just because it's not necessary, doesn't mean it isn't useful. In this case, I can't really see it being useful either... would love to hear a non-niche use-case for it.
I heard a Keysight engineer saying they have NDA-ed tech to achieve picosecond synchronization over the network, so there is a case for precise ticks. But the Planck time is about 20 orders of magnitude shorter than what cutting-edge research can reach, and that is in one-shot laboratory experiments.
Windows NT measures time at hectonanosecond (100ns) granularity and declares its epoch as modernity (~1600), so it does better than UNIX, but it is unfortunately still limited by arbitrary traditional thinking.
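For comparison with timespec, here is a hedged Python sketch of converting between 100 ns ticks counted from 1601-01-01 (the Windows FILETIME convention) and Unix seconds plus nanoseconds. The function names are invented for this example.

```python
TICKS_PER_SECOND = 10_000_000          # one tick = 100 ns
EPOCH_DIFF_SECONDS = 11_644_473_600    # seconds from 1601-01-01 to 1970-01-01


def filetime_to_unix(ticks):
    """Convert 100 ns ticks since 1601 to (tv_sec, tv_nsec) since 1970."""
    secs, rem = divmod(ticks, TICKS_PER_SECOND)
    return secs - EPOCH_DIFF_SECONDS, rem * 100


def unix_to_filetime(tv_sec, tv_nsec):
    """Convert (tv_sec, tv_nsec) to ticks; anything below 100 ns is dropped."""
    return (tv_sec + EPOCH_DIFF_SECONDS) * TICKS_PER_SECOND + tv_nsec // 100
```

Going from a nanosecond timespec to ticks and back loses the last two decimal digits, so the 100 ns format cannot give the bit-identical timespec round trip discussed at the top of the thread.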
I think the point is that the entire human race is not able to get close to a Planck time, by any reasonable definition of "get". See this recent discussion: https://news.ycombinator.com/item?id=24804624
Obviously you can, but why should anyone care that you did? If you only care about the bit pattern (which must be the case, because the date is a lie), what does it matter how anybody interprets it?
The bit pattern as returned from stat would be different: that's what makes it an ABI break. If I make a file with an mtime of 1952 and then I update the kernel and that file has an mtime of 2077 or something, that's the kernel violating the contract it has with userspace, and Linux does not break kernel ABI.
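A minimal Python sketch of how the same 32 bits can turn a pre-1970 mtime into a far-future one if a negative tv_sec is reinterpreted as unsigned. This is one illustrative reinterpretation, not a description of any actual kernel change, and the year it produces is specific to this scheme.

```python
import struct
from datetime import datetime, timedelta


def as_unsigned32(tv_sec):
    """Reinterpret the bits of a signed 32-bit value as an unsigned one."""
    return struct.unpack("<I", struct.pack("<i", tv_sec))[0]


def year_of(seconds):
    """Calendar year (UTC) of a Unix timestamp, without platform time_t limits."""
    return (datetime(1970, 1, 1) + timedelta(seconds=seconds)).year


mtime_1952 = -568_080_000                 # 1952-01-01 00:00:00 UTC as a signed time
reinterpreted = as_unsigned32(mtime_1952)
```

The stored bits are unchanged, but `year_of(mtime_1952)` is 1952 while `year_of(reinterpreted)` is 2088, so the value handed back to userspace by stat would differ.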
It doesn't matter whether you think that such timestamps are "lie[s]": the contract is the contract. You have no idea how someone might be using the system, and you don't get to break uses that are legal within the contract but contrary to your sense of good taste.
I really wish people (and redhat) just stop using XFS altogether. I have seen it break multiple times with two different scenarios:
1) Backup system with external media (usb3) on XFS. The USB link goes down, the kernel panics inside some XFS code. Hard reset needed.
2) Production system with <1M files being actively read by apache http. Hot updating this folder via rsync randomly panics the system.
After a couple of times you just migrate to something sane, like ext4, and everything works flawlessly.
Okay, if that's a punt, the ball has left the stadium, left the parking lot, narrowly avoided hitting skyscrapers downtown, on its way to landing safely in a park n ride outside of town, where it bounces a few times before coming to a stop, but not before it taps into a car, setting off an alarm, which happens to be the car of a fan who bussed in, trying to save on traffic, but who will be woefully disappointed to find his car dead, only to have his spirits lifted when he finds the game ball, only slightly charred from breaking the sound barrier, and gets a jump from his crush who went to the game with him, and then she helped him with his dead battery, heyo!