Vitastor 3.0.12 released

2026-05-17

Important fixes (excluding the new store)

  • Fixed a possible use-after-free in the OSD during error handling of initial commit/rollback of objects in EC pools.
  • Fixed a possible free of an invalid pointer in the OSD during read errors from snapshot/clone chains in EC pools.
  • Fixed possibly incorrect handling of commit/rollback operations in EC pools during pool PG count changes.
  • Fixed the inverted fsync enable parameter in the ublk driver (fsync was not enabled on pools without immediate_commit).
  • Added the raw-ls command for debugging purposes to find object versions in the cluster using listing operations.
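
The ublk fsync fix above is an inverted-condition bug. A minimal sketch of that class of error, assuming `immediate_commit` can be modelled as a plain bool; the helper names are hypothetical and not taken from the Vitastor source:

```cpp
#include <cassert>

// Hypothetical helpers, not from the Vitastor source: decide whether a block
// driver must forward flush requests to the cluster as explicit fsyncs.
// Assumption: immediate_commit == true means that writes are durable as soon
// as they are acknowledged, so no separate fsync is required.

// Inverted logic: fsync ends up disabled exactly where it is required.
inline bool fsync_needed_inverted(bool immediate_commit)
{
    return immediate_commit;
}

// Intended logic: fsync is needed only when writes are not committed immediately.
inline bool fsync_needed(bool immediate_commit)
{
    return !immediate_commit;
}

// Small self-check of both variants.
inline void fsync_flag_demo()
{
    assert(fsync_needed(false));           // no immediate_commit: fsync required
    assert(!fsync_needed(true));           // immediate_commit: fsync unnecessary
    assert(!fsync_needed_inverted(false)); // the bug: fsync silently skipped
}
```

Calling `fsync_flag_demo()` exercises both variants; the last assertion shows how the inverted form skips fsync on a pool without immediate_commit.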

New store fixes

  • Improved startup speed by using LSN-based sorting only for objects with a large number of intermediate versions.
  • Added the skip_double_claim option as a temporary workaround for the rare OSD startup error with the “double claimed block” message, observed by several users. This option does not affect data integrity.
  • Fixed incorrect rechecking of small writes during startup, which in theory could lead to duplicate small write object entries on the OSD.
  • Fixed fsync operation for disks with a writeback cache (without capacitors):
    • Fixed incorrect semantics of consecutive fsyncs (next fsync was not blocked by the previous one).
    • Added fsync when copying small writes from the buffer to the data device (somehow forgotten during initial development).
    • Added fsync after the initial garbage collection during OSD startup.
    • Fixed an incorrect cast of the LSN from uint64 to uint32, which broke fsync once the LSN reached 2^32.
  • Added missing verification of the metadata header checksum during startup.
  • Fixed incorrect updating of object checksums in perfect_csum_update=true mode.
  • Fixed a possible OSD crash with “assertion failed” when processing a malformed EC STABILIZE operation.
  • Fixed the accounting of active compactor coroutines.
  • Removed broken and untested new->old store conversion support.
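
The LSN cast fix in the fsync group above concerns a common truncation pitfall. A minimal sketch, assuming a hypothetical check of whether a write at a given LSN is already covered by the last completed fsync; the function names are illustrative and not taken from the Vitastor source:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: is the write at `lsn` covered by an fsync that
// completed at `synced_lsn`? Names are illustrative only.

// Buggy variant: narrowing to 32 bits discards the high half of each LSN.
inline bool covered_buggy(uint64_t lsn, uint64_t synced_lsn)
{
    return static_cast<uint32_t>(lsn) <= static_cast<uint32_t>(synced_lsn);
}

// Fixed variant: compare at full 64-bit width.
inline bool covered_fixed(uint64_t lsn, uint64_t synced_lsn)
{
    return lsn <= synced_lsn;
}

// Small self-check around the 2^32 boundary.
inline void lsn_truncation_demo()
{
    const uint64_t wrap = uint64_t(1) << 32;
    // A write just past 2^32 is NOT covered by an fsync just before 2^32 ...
    assert(!covered_fixed(wrap + 5, wrap - 1));
    // ... but the truncated comparison sees 5 <= 4294967295 and says it is.
    assert(covered_buggy(wrap + 5, wrap - 1));
    // The reverse case: a covered write is reported as not yet synced.
    assert(covered_fixed(wrap - 1, wrap + 5));
    assert(!covered_buggy(wrap - 1, wrap + 5));
}
```

Below 2^32 both variants agree, which is why a bug of this kind only shows up after a large number of writes.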

Minor issues fixed

  • Incorrect accounting of OSD local operation statistics in replicated pools.
  • Missing non-zero exit codes on vitastor-disk resize command errors.
  • Missing reset of the list of inconsistent objects during PG restarts.
  • Theoretically possible hangs of various OSD operations when working with completely corrupted objects (without a single available copy), and possibly in some other very rare situations.
  • Incorrect fsyncs when deleting objects from pools without immediate_commit (on disks with a writeback cache), which previously could leave garbage when deleting misplaced objects.
  • Possible crash/memory corruption of the NFS server during a targeted attack on NFS-RDMA.
  • Possibly incorrect handling of ENOSPC/EIO write errors in replicated pools, leading to an inability to retry the write later.
  • Possible crash instead of a clean error exit when starting an OSD with the old storage engine on a disk with corrupted journal data.
  • Shallow copying of PG configuration in the monitor (not known to have caused any actual bugs).
  • Incorrect checking of allocated blocks in the QEMU driver in an unused code branch (without the BDRV_WANT_ZERO flag).
  • Possible memory leak on read errors of corrupted objects.
  • Possible incorrect PG states when corrupted objects are detected.
  • Possible failure to mark all “bad” copies of an object during scrubs without checksums and with a large number of replicas (> 4).
  • Incorrect checksum calculation in the old storage engine when bitmap_granularity < 4096 (a practically unused configuration).
  • Theoretically possible OSD crash in rare cases during a scrub and simultaneous object recovery.
  • Theoretically possible OSD crash when handling PING operation errors.
  • Slightly suboptimal logic for reusing the RDMA send buffer.
  • Possible memory leak when canceling an already running scrub via no_scrub.
  • Possible memory corruption when a client (e.g., QEMU code) passes invalid buffers and the writeback cache is enabled.
  • Potentially incorrect search for corrupted parts of EC objects (inability to find a “good” combination) during a scrub with checksums disabled.
  • Possible additional memory usage on the OSD side when handling failed reads from snapshots (not a leak, however: the memory was freed upon client disconnection).
  • Potential sudden write slowdown at certain PG epoch values due to incorrect epoch update logic in etcd.