Vitastor S3 prototype based on Zenko CloudServer
2024-08-11
The prototype of the Vitastor S3 backend for Zenko CloudServer is now running 😊
It is, of course, far from a release - at the very least it lacks defragmentation. But you can already try to run it and write/read something to/from it via S3. It's even usable with GeeseFS. 😊
So, here are the instructions!
1. Install the Zenko server itself and dependencies
- Clone https://git.yourcmc.ru/vitalif/zenko-cloudserver-vitastor.
- Run `npm install` (or `npm install --omit dev`) to install dependencies. Zenko-specific dependencies are either cut out or moved to my git, and I also did a slight overall cleanup of dependency versions.
- Clone Vitastor itself: `git clone https://git.yourcmc.ru/vitalif/vitastor`
- Go to the `node-binding` folder and run `npm install` there to build the binding. You'll need the latest Vitastor headers (`vitastor-client-dev`) installed for this to work, as well as the node.js native module builder (`node-gyp`).
- Symlink the built module into zenko: `ln -s /path/to/vitastor/node-binding /path/to/zenko/node_modules/vitastor`
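Put together, step 1 boils down to something like the following shell sketch (the `/opt` prefix and paths are placeholder assumptions, adjust them to your layout):

```
# clone the patched Zenko CloudServer and install its dependencies
cd /opt
git clone https://git.yourcmc.ru/vitalif/zenko-cloudserver-vitastor
cd zenko-cloudserver-vitastor
npm install --omit dev        # or plain `npm install`

# clone Vitastor and build the node.js binding
# (requires the vitastor-client-dev headers and node-gyp)
cd /opt
git clone https://git.yourcmc.ru/vitalif/vitastor
cd vitastor/node-binding
npm install

# make the binding visible to zenko under the name `vitastor`
ln -s /opt/vitastor/node-binding /opt/zenko-cloudserver-vitastor/node_modules/vitastor
```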
2. Install and configure MongoDB
I think you can handle it yourself. :-)
You can follow the MongoDB manual: https://www.mongodb.com/docs/manual/installation/
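For reference, on a Debian/Ubuntu system with the official MongoDB repository already configured, the installation is roughly the following - just a sketch, see the manual above for the repository setup and other platforms:

```
# assumes the official MongoDB apt repository is already configured
sudo apt-get install -y mongodb-org
sudo systemctl enable --now mongod
```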
3. Set up the Vitastor backend
- Create a separate pool for S3 data in Vitastor.
- Create an image in another pool for storing metadata (“s3-volume-meta” to match the value in locationConfig.json):
vitastor-cli create -s 10G s3-volume-meta
- Copy config.json.vitastor to config.json, edit the public address as needed.
- Copy authdata.json.example to authdata.json - it’s where you specify access & secret keys - edit them if necessary. Scality uses a separate closed-source authentication service, so we can only use a file-based stub for now.
- Copy locationConfig.json.vitastor to locationConfig.json - it’s where you specify Vitastor connection data. Put the correct pool_id and metadata_image there.
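Taken together, step 3 looks roughly like this (the S3 data pool itself is created the usual Vitastor way, for example with `vitastor-cli create-pool` in recent versions or via the pool configuration - that part is an assumption about your setup):

```
# metadata image in another (e.g. replicated) pool; the name matches locationConfig.json
vitastor-cli create -s 10G s3-volume-meta

# copy the example configs shipped in the repository and edit them
cp config.json.vitastor config.json                  # public address, port
cp authdata.json.example authdata.json               # S3 access & secret keys
cp locationConfig.json.vitastor locationConfig.json  # pool_id of the data pool, metadata_image
```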
4. Run it
Use the command `node index.js`.
And that’s basically it. You get a working S3 server on port 8000.
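A quick smoke test with the AWS CLI could look like this; the access and secret keys below are placeholders and must match what you put into authdata.json:

```
# start the server (in the foreground here; use your favourite service manager otherwise)
node index.js &

# credentials must match an entry in authdata.json (placeholders here)
export AWS_ACCESS_KEY_ID=accessKey1
export AWS_SECRET_ACCESS_KEY=verySecretKey1

aws --endpoint-url http://127.0.0.1:8000 s3 mb s3://test
echo hello > hello.txt
aws --endpoint-url http://127.0.0.1:8000 s3 cp hello.txt s3://test/
aws --endpoint-url http://127.0.0.1:8000 s3 ls s3://test/
```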
Bonus - compacting node_modules
With node.js applications, the typical situation is “the girls were swimming in the lake and found node_modules” ©.
When installing Zenko CloudServer, for example, you get 23885 files in node_modules with a total size of 475 MB. Your sense of beauty suffers - what should you do?
The answer to this question is webpack. Yes, it can be used not only for client-side browser code, but also on the server!
How does it work? You need a `webpack.config.js` - it's already included in the zenko-vitastor repository. So go to the directory with your zenko cloudserver installation and run:
npm exec webpack --mode=development
It will produce a single 25 MB `dist/zenko-vitastor.js` file (or even 15 MB if you run it with `--mode=production`), without any dependencies except native modules. You can find these modules with `find node_modules -name '*.node'` - node.js binary modules have the `.node` extension.
In the case of zenko there are 7 such modules; they have to be copied from the original node_modules (a consolidated sketch follows the list):
- vitastor itself
- bufferutil
- diskusage
- fcntl
- ioctl
- leveldown
- utf-8-validate
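To sum the bonus up, the whole bundling workflow could look like the sketch below. Where exactly the `.node` modules should live relative to the bundle is my assumption (a minimal `node_modules` next to `dist/zenko-vitastor.js`), not something the repository prescribes:

```
# build the single-file bundle
npm exec webpack --mode=production

# list the native modules that must stay outside the bundle
find node_modules -name '*.node'

# copy the 7 modules next to the bundle (this layout is an example, adjust as needed)
mkdir -p dist/node_modules
for m in vitastor bufferutil diskusage fcntl ioctl leveldown utf-8-validate; do
    cp -rL "node_modules/$m" dist/node_modules/   # -L follows the vitastor symlink
done

# the bundle plus these modules is enough to run the server
node dist/zenko-vitastor.js
```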
Bonus number two - Zenko dependencies
In addition to “normal” npm dependencies, Zenko has some “private” ones.
The first portion of them was actually optional, so I conditionally disabled them in the code and removed them from package.json:
- vaultclient - a client for a closed-source password/key storage service, “Scality Vault” (not to be confused with Hashicorp Vault). Note: another library, utapi, is tied to it, so utapi also had to be untied from vaultclient.
- bucketclient - a client for a closed-source service for storing S3 object metadata “Scality bucketd”.
- hdclient - a client for a closed-source S3 object data storage service, “Scality Hyperdrive”.
- sproxydclient - a client for what is apparently another closed-source S3 object data storage service, “Scality sproxyd”.
The second portion consisted of isolated repositories that were simply omitted:
- backbeat and breakbeat - Lifecycle services, open source, but at first glance the implementation is a bit strange (it relies on reading the mongo oplog through kafka connect), so it seems we don't want it.
- s3utils - mostly consists of things related to replication and closed-source Scality services (the same sproxyd, etc.).
The third portion contains necessary or mostly necessary libraries:
- arsenal - really it’s a part of zenko cloudserver, just extracted into a separate library.
- eslint-config-scality - just an eslint config.
- node-fcntl - this is not actually fcntl at all, it’s posix_fadvise for node.js.
- httpagent - Scality's wrapper over the standard HTTP client.
- scubaclient - a client for the bucket quota/counter service SCUBA (AQUALUNG?.. Scality Consumption Utilization and Billing API). Its stub is used to run tests, so it can be left in place. And who knows, maybe we'll want a separate bucket counter service in the future…
- utapi - utilization API, a library for collecting metrics - maybe it’s slightly overcomplicated and maybe a simple prometheus exporter could suffice, but Scality decided to do it this way.
- werelogs - logging library/wrapper.
All of these libraries could be cut out too, and arsenal could be merged into the main cloudserver repository. But everything is left more or less as is to ease subsequent merges with upstream versions, which the authors apparently keep writing in real time.
Also there’s Orbit, a web interface for Zenko, but it goes beyond the scope of this article.