A question about composite hash keys that someone might be able to answer (or relate to other known implementations):
The composite key has two attributes, a “hash attribute” and a “range attribute.” You can do a range query over records that share the same hash attribute.
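For concreteness, here's a minimal sketch of what such a range query looks like through boto3; the table name `Events` and the key names `user_id` (hash attribute) and `timestamp` (range attribute) are hypothetical, chosen purely for illustration.

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical table whose composite key is
# (user_id = hash attribute, timestamp = range attribute).
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Events")

# A range query: the hash attribute is fixed to one value, and the range
# attribute is constrained to an interval. The range condition only applies
# within that single hash attribute value.
response = table.query(
    KeyConditionExpression=(
        Key("user_id").eq("user-123")
        & Key("timestamp").between(1_000_000, 2_000_000)
    )
)
for item in response["Items"]:
    print(item)
```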
It would obviously be untenable if they spread records with the same hash attribute across many servers. You'd have a scatter-gather problem: a range query would have to hit every one of those servers and repeatedly pop the minimum from each until the range is exhausted, which significantly taxes network I/O (sketched below).
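To make the scatter-gather cost concrete, here's a toy k-way merge over per-server sorted streams. This is purely an illustration of the worry, not Amazon's actual implementation; the stream contents and function names are made up.

```python
import heapq

# Hypothetical sorted result streams, one per server holding records for the
# same hash attribute. Each stream yields (range_key, record) pairs already
# sorted by range_key.
server_streams = [
    [(1, "a"), (4, "d"), (7, "g")],   # server 1
    [(2, "b"), (5, "e")],             # server 2
    [(3, "c"), (6, "f"), (8, "h")],   # server 3
]

def range_query(streams, lo, hi):
    """Scatter-gather: repeatedly pop the global minimum across all
    per-server streams until the requested range is exhausted.
    Every server must be contacted, even for a small result set."""
    out = []
    for key, record in heapq.merge(*streams):  # k-way merge by range key
        if key < lo:
            continue
        if key > hi:
            break
        out.append((key, record))
    return out

print(range_query(server_streams, lo=2, hi=6))
# [(2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f')]
```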
This implies that they try to keep the number of servers hosting records for the same hash attribute to a minimum. Consequently, if you store too many records with the same hash attribute, wouldn't you overburden that small subset of servers and see performance degradation?
Azure has similar functionality in its Table service, which requires a partition key, and Microsoft explicitly documents a throughput limit for records sharing the same partition key. I haven't seen similar language from Amazon.
Whether you scatter-gather or cluster records onto a small set of servers, performance will eventually degrade one way or the other. Does anyone have insight into Amazon's implementation?