I watched the SNIA presentation from SDC2020 on EFS and it described each extent in the file system as a state machine replicated via multi-paxos.
It seems possible to implement this feature via a time-based leader lease on the extent where a read request goes to a read-through cache. The cache will store some metadata (e.g. a hash or a verison number) of the block it is trying to validate from cache and send that to the leader along with the read request.
If the leader has the same version number for a block as was requested by the cache, the block itself does not need to be transferred and the other replicas don't need to be contacted. If you place the cache in the same AZ as the EC2 instance reading the file system, 600ms sounds viable.
The caches are local to each AZ so you get the low latency in each AZ, the other details are different. Unfortunately I can't share additional details at this moment, but we are looking to do a technical update on EFS at some point soon, maybe at a similar venue!
It seems possible to implement this feature via a time-based leader lease on the extent where a read request goes to a read-through cache. The cache will store some metadata (e.g. a hash or a verison number) of the block it is trying to validate from cache and send that to the leader along with the read request.
If the leader has the same version number for a block as was requested by the cache, the block itself does not need to be transferred and the other replicas don't need to be contacted. If you place the cache in the same AZ as the EC2 instance reading the file system, 600ms sounds viable.
Am I on the right track? :)