First release of Vitastor S3

2025-03-16

The moment has come: the Vitastor S3 implementation based on Zenko CloudServer is finally released.

Key differences from the prototype:

  • Volume defragmentation is implemented;
  • Volume metadata may now be stored in the same MongoDB as object metadata, not just in VitastorKV;
  • Tests for the Vitastor S3 backend added;
  • S3 is now packaged in a convenient Docker build.

Highlights

  • Zenko CloudServer is implemented in node.js.
  • Object metadata is stored in MongoDB.
  • Vitastor uses a slightly modified version of Zenko CloudServer, with an optimised build and unneeded dependencies stripped out.
  • Object data is stored in Vitastor block volumes, but the volume metadata is stored in the same MongoDB, not in Vitastor etcd.
  • Objects are written to volumes sequentially one after another. The space is allocated with rounding to the sector size (4 KB), so each object takes at least 4 KB.
  • An important property of this storage scheme is that small objects aren’t split into chunks in Vitastor EC N+K pools, so downloading one doesn’t require reads from all N disks.
  • Deleted objects are only marked as deleted; the space is actually freed later by an asynchronous “defragmentation” process. Defragmentation runs automatically in the background when a volume accumulates the configured share of garbage (20% by default): it copies the live objects to new volume(s) and then removes the old volume. Defragmentation can be tuned in locationConfig.json.
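As a small illustration of the allocation rule above (a sketch, not the actual Vitastor code; the names are illustrative), the space an object consumes is its size rounded up to a whole number of 4 KB sectors:

```javascript
// Sketch of the allocation rule: objects are appended to a volume
// sequentially, and each allocation is rounded up to the 4 KB sector size,
// so even a 1-byte object consumes a full sector.
const SECTOR = 4096;

function allocatedSize(objectBytes) {
  // Round up to a whole number of sectors; empty objects still take one.
  return Math.max(1, Math.ceil(objectBytes / SECTOR)) * SECTOR;
}

console.log(allocatedSize(1));     // 4096
console.log(allocatedSize(4096));  // 4096
console.log(allocatedSize(10000)); // 12288
```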

Installation

Follow the documentation: https://vitastor.io/en/docs/installation/s3.html

Plans for future development

  • User account storage in the DB instead of a static file. Original Zenko uses a separate closed-source “Scality Vault” service for this, which is why a static file is used for now.
  • More detailed documentation.
  • Support for other (faster) key-value DBMSes for object metadata storage.
  • Other performance optimisations, for example around the hash function: MD5, used for Amazon compatibility, is relatively slow.
  • Object Lifecycle support. There is a Lifecycle implementation for Zenko called Backbeat, but it isn’t adapted for Vitastor yet.
  • Quota support. Original Zenko uses a separate “SCUBA” service for quotas, but it’s also proprietary and not available publicly.

Initial benchmarks

The tests below were conducted on a very small test cluster: 4 hosts with 1x Samsung PM9A3 each, one Zenko instance with 8 node.js worker processes, and a MongoDB replica set with 3 replicas installed on system SSDs.

hsbench from localhost was used for the benchmark.

16 threads, 4 KB objects:

./hsbench -a accessKey1 -s verySecretKey1 -u http://localhost:8000 -z 4k -t 16
... Dur(s): 60.0, Mode: PUT, Ops: 40721, MB/s: 2.65, IO/s: 678, Lat(ms): [ min: 8.9, avg: 23.6, 99%: 46.1, max: 627.1 ], Slowdowns: 0
... Dur(s): 60.3, Mode: LIST, Ops: 3939, MB/s: 0.00, IO/s: 65, Lat(ms): [ min: 92.6, avg: 244.2, 99%: 608.9, max: 919.8 ], Slowdowns: 0
... Dur(s): 60.0, Mode: GET, Ops: 163326, MB/s: 10.63, IO/s: 2722, Lat(ms): [ min: 2.4, avg: 5.8, 99%: 16.7, max: 31.4 ], Slowdowns: 0
... Dur(s): 37.6, Mode: DEL, Ops: 40721, MB/s: 4.23, IO/s: 1084, Lat(ms): [ min: 7.3, avg: 14.8, 99%: 26.9, max: 57.5 ], Slowdowns: 0

16 threads, 4 MB objects:

... Dur(s): 60.1, Mode: PUT, Ops: 14879, MB/s: 990.77, IO/s: 248, Lat(ms): [ min: 22.2, avg: 64.5, 99%: 139.2, max: 641.0 ], Slowdowns: 0
... Dur(s): 60.4, Mode: LIST, Ops: 3943, MB/s: 0.00, IO/s: 65, Lat(ms): [ min: 104.2, avg: 244.1, 99%: 564.5, max: 966.1 ], Slowdowns: 0
... Dur(s): 60.2, Mode: GET, Ops: 31415, MB/s: 2087.69, IO/s: 522, Lat(ms): [ min: 5.9, avg: 28.4, 99%: 230.9, max: 682.7 ], Slowdowns: 0
... Dur(s): 14.0, Mode: DEL, Ops: 14879, MB/s: 4264.17, IO/s: 1066, Lat(ms): [ min: 7.5, avg: 15.0, 99%: 27.0, max: 49.1 ], Slowdowns: 0

1 node.js worker process, 4 hsbench threads, 4 MB objects:

... Dur(s): 60.0, Mode: PUT, Ops: 3699, MB/s: 246.45, IO/s: 62, Lat(ms): [ min: 35.3, avg: 64.9, 99%: 85.0, max: 192.0 ], Slowdowns: 0
... Dur(s): 60.3, Mode: LIST, Ops: 856, MB/s: 0.00, IO/s: 14, Lat(ms): [ min: 126.0, avg: 281.1, 99%: 430.6, max: 484.0 ], Slowdowns: 0
... Dur(s): 60.0, Mode: GET, Ops: 6399, MB/s: 426.43, IO/s: 107, Lat(ms): [ min: 5.8, avg: 35.7, 99%: 259.2, max: 289.3 ], Slowdowns: 0
... Dur(s): 10.9, Mode: DEL, Ops: 3699, MB/s: 1355.97, IO/s: 339, Lat(ms): [ min: 8.1, avg: 11.8, 99%: 18.2, max: 25.6 ], Slowdowns: 0

Conclusion:

  • node.js performance is totally OK. Linear write performance is ~250 MB/s per process, which is on par with Minio (written in Go) running with GOMAXPROCS=1 (1 thread) on the same host, and is mostly bounded by MD5 and SHA1 hash calculation.
  • Small-object performance seems to be bounded by MongoDB: it isn’t fast enough, so another key-value DB backend is needed. And that’s what we’ll explore in the next releases! :)

Author & License