Pattern - Shared Filestore
A shared filestore gives every running node access to the same communal data. Work can then be distributed across machines, rather than depending on the availability of the particular hosts that hold the data.
Features
- A network-visible filestore is provided; the pattern does not prescribe how
- All running nodes have read (and usually write) access to this store
- Work can be performed by reading data from and writing data to the store
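As a rough illustration of a node doing work through the store, here is a minimal Python sketch. It assumes the filestore is mounted as an ordinary directory on every node. The `input`/`output` layout, the `process_file` name and the upper-casing "work" are all hypothetical and only stand in for a real job. The write-then-rename step relies on the filestore treating a rename within one directory as atomic, which is common but not guaranteed for every remote filestore.

```python
import os
import tempfile
from pathlib import Path


def process_file(store: Path, name: str) -> Path:
    """Read store/input/<name> and write the result to store/output/<name>.

    `store` is the (assumed) mount point of the shared filestore.
    """
    src = store / "input" / name
    out_dir = store / "output"
    out_dir.mkdir(parents=True, exist_ok=True)

    text = src.read_text(encoding="utf-8")
    result = text.upper()  # placeholder for the real work

    # Write to a temporary file in the destination directory, then rename it
    # into place. Where rename is atomic, other nodes never see a partial file.
    fd, tmp_name = tempfile.mkstemp(dir=out_dir, prefix=name + ".", suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(result)

    dest = out_dir / name
    os.replace(tmp_name, dest)
    return dest
```

Any node that can see the mount can run `process_file(Path("/mnt/shared"), "a.txt")` (the path is an example only), and any other node can pick up the result from `output` afterwards.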
Advantages
- The failure of a worker node does not make data inaccessible.
- There's a single filestore to manage
- The filestore may also be visible to remote users, providing a way to upload and download data.
Disadvantages
- The filestore becomes a Single Point of Failure; unless it is a high availability service, failure of the filestore will take the entire system down.
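- Locks may need to be distributed, since nodes on separate machines cannot coordinate access to shared files through in-process or single-host locks.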
- Security is more complex, as access to the filestore needs to be managed.
- The filestore faces scalability challenges simply in coping with the volume of reads and writes from many nodes.
- While transparent access to data, wherever it is stored, seems an admirable goal, throughput on data-intensive applications is best when the computation takes place close to the data.
- Remote filestores are not local filesystems, and code that assumes they are may misbehave. For example, file timestamps may behave differently and filename case sensitivity may differ; there may be other quirks.
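One way to avoid guessing about such quirks is to probe the mounted store at startup. The following Python sketch checks a single property, whether filenames that differ only by case refer to different files. The function name `is_case_sensitive` is hypothetical, and the probe assumes the node has write access to the directory it tests.

```python
import os
import tempfile
from pathlib import Path


def is_case_sensitive(directory: Path) -> bool:
    """Return True if names differing only by case are distinct in `directory`."""
    # Create a probe file whose name contains both upper- and lower-case letters.
    fd, probe = tempfile.mkstemp(dir=directory, prefix="CaseProbe_")
    os.close(fd)
    try:
        probe_path = Path(probe)
        swapped = probe_path.with_name(probe_path.name.swapcase())
        # On a case-insensitive store the case-swapped name resolves to the probe.
        return not swapped.exists()
    finally:
        os.remove(probe)
```

A similar probe could compare the modification time the store reports for a freshly written file against the local clock, to detect clock skew or coarse timestamp granularity.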