Vitastor S3 prototype based on Zenko CloudServer
2024-08-11
The prototype of the Vitastor S3 backend for Zenko CloudServer is now running 😊
It is, of course, far from a release - at the very least it lacks defragmentation. But you can already try to run it and write/read something to/from it via S3. It's even usable with GeeseFS. 😊
So, here are the instructions!
1. Install the Zenko server itself and dependencies
- Clone https://git.yourcmc.ru/vitalif/zenko-cloudserver-vitastor.
- Run `npm install` (or `npm install --omit dev`) to install dependencies. Zenko-specific dependencies are either cut out or moved to my git, and I also did a slight overall cleanup of dependency versions.
- Clone Vitastor itself: `git clone https://git.yourcmc.ru/vitalif/vitastor`
- Go to the `node-binding` folder and run `npm install` there to build the binding. You'll need the latest Vitastor headers (`vitastor-client-dev`) installed for this to work, as well as the node.js native module builder (`node-gyp`).
- Symlink the built module into zenko: `ln -s /path/to/vitastor/node-binding /path/to/zenko/node_modules/vitastor`
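Put together, step 1 boils down to something like the following shell sketch (the `/opt` prefix and paths are placeholder assumptions, adjust them to your layout):

```
# clone the patched Zenko CloudServer and install its dependencies
cd /opt
git clone https://git.yourcmc.ru/vitalif/zenko-cloudserver-vitastor
cd zenko-cloudserver-vitastor
npm install --omit dev        # or plain `npm install`

# clone Vitastor and build the node.js binding
# (requires the vitastor-client-dev headers and node-gyp)
cd /opt
git clone https://git.yourcmc.ru/vitalif/vitastor
cd vitastor/node-binding
npm install

# make the binding visible to zenko under the name `vitastor`
ln -s /opt/vitastor/node-binding /opt/zenko-cloudserver-vitastor/node_modules/vitastor
```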
2. Install and configure MongoDB
I think you can handle it yourself. :-)
You can follow the MongoDB manual: https://www.mongodb.com/docs/manual/installation/
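For reference, on a Debian/Ubuntu system with the official MongoDB repository already configured, the installation is roughly the following - just a sketch, see the manual above for the repository setup and other platforms:

```
# assumes the official MongoDB apt repository is already configured
sudo apt-get install -y mongodb-org
sudo systemctl enable --now mongod
```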
3. Set up the Vitastor backend
- Create a separate pool for S3 data in Vitastor.
- Create an image in another pool for storing metadata (“s3-volume-meta” to match the value in locationConfig.json):
vitastor-cli create -s 10G s3-volume-meta
- Copy config.json.vitastor to config.json, edit the public address as needed.
- Copy authdata.json.example to authdata.json - it’s where you specify access & secret keys - edit them if necessary. Scality uses a separate closed-source authentication service, so we can only use a file-based stub for now.
- Copy locationConfig.json.vitastor to locationConfig.json - it’s where you specify Vitastor connection data. Put the correct pool_id and metadata_image there.
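Taken together, step 3 looks roughly like this (the S3 data pool itself is created the usual Vitastor way, for example with `vitastor-cli create-pool` in recent versions or via the pool configuration - that part is an assumption about your setup):

```
# metadata image in another (e.g. replicated) pool; the name matches locationConfig.json
vitastor-cli create -s 10G s3-volume-meta

# copy the example configs shipped in the repository and edit them
cp config.json.vitastor config.json                  # public address, port
cp authdata.json.example authdata.json               # S3 access & secret keys
cp locationConfig.json.vitastor locationConfig.json  # pool_id of the data pool, metadata_image
```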
4. Run it
Use the command `node index.js`.
And that’s basically it. You get a working S3 server on port 8000.
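A quick smoke test with the AWS CLI could look like this; the access and secret keys below are placeholders and must match what you put into authdata.json:

```
# start the server (in the foreground here; use your favourite service manager otherwise)
node index.js &

# credentials must match an entry in authdata.json (placeholders here)
export AWS_ACCESS_KEY_ID=accessKey1
export AWS_SECRET_ACCESS_KEY=verySecretKey1

aws --endpoint-url http://127.0.0.1:8000 s3 mb s3://test
echo hello > hello.txt
aws --endpoint-url http://127.0.0.1:8000 s3 cp hello.txt s3://test/
aws --endpoint-url http://127.0.0.1:8000 s3 ls s3://test/
```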
Bonus - compacting node_modules
With node.js applications, the typical situation is “the girls were swimming in the lake and found node_modules” ©.
When installing Zenko CloudServer, for example, you get 23885 files in node_modules with a total size of 475 MB. Your sense of beauty suffers - what should you do?
The answer to this question is webpack. Yes, it can be used not only for client-side browser code, but also on the server!
How does it work? You need a `webpack.config.js` - it's already included in the zenko-vitastor repository. So go to the directory with your zenko cloudserver installation and run:
npm exec webpack --mode=development
It will produce a single 25 MB `dist/zenko-vitastor.js` file (or even 15 MB if you run it with `--mode=production`), without any dependencies except native modules. You can find these modules with `find node_modules -name '*.node'` - node.js binary modules have the `.node` extension.
In the case of zenko there are 7 such modules; they have to be copied from the original node_modules (a consolidated sketch follows the list):
- vitastor itself
- bufferutil
- diskusage
- fcntl
- ioctl
- leveldown
- utf-8-validate
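To sum the bonus up, the whole bundling workflow could look like the sketch below. Where exactly the `.node` modules should live relative to the bundle is my assumption (a minimal `node_modules` next to `dist/zenko-vitastor.js`), not something the repository prescribes:

```
# build the single-file bundle
npm exec webpack --mode=production

# list the native modules that must stay outside the bundle
find node_modules -name '*.node'

# copy the 7 modules next to the bundle (this layout is an example, adjust as needed)
mkdir -p dist/node_modules
for m in vitastor bufferutil diskusage fcntl ioctl leveldown utf-8-validate; do
    cp -rL "node_modules/$m" dist/node_modules/   # -L follows the vitastor symlink
done

# the bundle plus these modules is enough to run the server
node dist/zenko-vitastor.js
```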
Bonus number two - Zenko dependencies
In addition to “normal” npm dependencies, Zenko has some “private” ones.
The first portion of them was actually optional, so I conditionally disabled them in the code and removed them from package.json:
- vaultclient - a client for a closed-source password/key storage service, “Scality Vault” (not to be confused with Hashicorp Vault). Note: another library, utapi, is tied to it, so utapi also had to be untied from vaultclient.
- bucketclient - a client for a closed-source service for storing S3 object metadata “Scality bucketd”.
- hdclient - a client for a closed-source S3 object data storage service, “Scality Hyperdrive”.
- sproxydclient - a client for what is apparently another closed-source S3 object data storage service, “Scality sproxyd”.
The second portion consisted of isolated repositories that were simply omitted:
- backbeat and breakbeat - Lifecycle services, open source, but at first glance the implementation is a bit strange (it relies on reading the mongo oplog through kafka connect), so it seems we don't want it.
- s3utils - mostly consists of things related to replication and closed-source Scality services (the same sproxyd, etc.).
The third portion contains necessary or mostly necessary libraries:
- arsenal - really it’s a part of zenko cloudserver, just extracted into a separate library.
- eslint-config-scality - just an eslint config.
- node-fcntl - this is not actually fcntl at all, it’s posix_fadvise for node.js.
- httpagent - Scality's wrapper over the standard HTTP client.
- scubaclient - a client for the bucket quota/counter service SCUBA (AQUALUNG?.. Scality Consumption Utilization and Billing API). Its stub is used to run tests, so it can be left in place. And who knows, maybe we'll want a separate bucket counter service in the future…
- utapi - utilization API, a library for collecting metrics - maybe it’s slightly overcomplicated and maybe a simple prometheus exporter could suffice, but Scality decided to do it this way.
- werelogs - logging library/wrapper.
All of these libraries could be cut out too, and arsenal could be merged into the main cloudserver repository. But everything is left more or less as is to ease subsequent merges with upstream versions, which the authors apparently keep writing in real time.
Also there’s Orbit, a web interface for Zenko, but it goes beyond the scope of this article.