WAL Recovery
This page shows how to recover state using the Vix.cpp sync WAL (Write-Ahead Log).
Goal:
- Persist state transitions before any external effect
- Rebuild state deterministically after crash or restart
- Resume safely from an offset (checkpoint)
This guide is written for both:
- Beginners who want a copy-paste recovery flow
- Experts who want deterministic replay and offset discipline
What is WAL recovery
A WAL is an append-only log of state transitions. Recovery means:
- Open the WAL file
- Replay records in strict append order
- Rebuild in-memory state (Outbox, indexes, counters, etc.)
- Continue processing from a known offset
A key invariant:
- Every transition that matters is appended before side effects.
If the process crashes, you can replay the WAL and reconstruct the last correct state.
Headers
#include <vix/sync/wal/Wal.hpp>
#include <vix/sync/wal/WalRecord.hpp>
Namespace:
vix::sync::wal
WAL record types
WAL records describe durable transitions. Typical record types:
- PutOperation: operation was created and persisted
- MarkDone: operation completed successfully
- MarkFailed: operation failed (retryable or permanent, depending on stored fields and policy)
A common recovery strategy:
- Start with empty state
- Apply records in order
- Last record for a given operation id wins
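The strategy above can be sketched without the Vix.cpp API. The names `Kind`, `Rec`, and `recover` below are hypothetical stand-ins chosen for illustration, not the library's types. The sketch only shows the "apply in order, last record wins" rule on an in-memory log.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration; not the Vix.cpp types.
enum class Kind { PutOperation, MarkDone, MarkFailed };

struct Rec
{
    std::string id; // operation id
    Kind kind;      // transition described by this record
};

// Start with empty state and apply records in append order.
// The last record seen for an id decides its final state.
std::unordered_map<std::string, Kind> recover(const std::vector<Rec>& log)
{
    std::unordered_map<std::string, Kind> state;
    for (const Rec& r : log)
    {
        assert(!r.id.empty()); // every record must name an operation
        state[r.id] = r.kind;  // later records overwrite earlier ones
    }
    return state;
}
```

Calling `recover` on a log that contains PutOperation, MarkFailed, then MarkDone for the same id leaves that id in the MarkDone state, because only the final record for each id survives.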
1) Minimal recovery example (rebuild a map of operations)
This is the simplest possible recovery: we rebuild a map keyed by operation id.
What you learn:
- How to replay from offset 0
- How to apply records deterministically
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vix/sync/wal/Wal.hpp>
#include <vix/sync/wal/WalRecord.hpp>
using vix::sync::wal::RecordType;
using vix::sync::wal::Wal;
using vix::sync::wal::WalRecord;
struct RecoveredOp
{
    std::string id;
    RecordType last_type{RecordType::PutOperation};
    std::string last_error;
    std::int64_t next_retry_at_ms{0};
};
int main()
{
    std::unordered_map<std::string, RecoveredOp> ops;
    Wal wal(Wal::Config{ "./.vix/wal.log", false });
    // Replay from offset 0; records are applied in append order.
    wal.replay(0, [&](const WalRecord& rec)
    {
        // The last record seen for an id overwrites the earlier state.
        auto& o = ops[rec.id];
        o.id = rec.id;
        o.last_type = rec.type;
        if (rec.type == RecordType::MarkFailed)
        {
            o.last_error = rec.error;
            o.next_retry_at_ms = rec.next_retry_at_ms;
        }
        else
        {
            // Any other record type clears stale failure details.
            o.last_error.clear();
            o.next_retry_at_ms = 0;
        }
    });
    std::cout << "Recovered ops: " << ops.size() << "\n";
    return 0;
}
When to use this pattern:
- When you need a minimal reconstruction step before loading higher-level components
- When you want to validate WAL correctness quickly
2) WAL offsets and checkpoints
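Replaying a large WAL from offset 0 on every restart is slow, so store a checkpoint offset.
Concept:
- last_applied_offset is the WAL byte offset up to which every record has been fully applied
- On restart you replay from last_applied_offset, not from 0
- This only works if the state built from records before the checkpoint is itself durable (for example, a persisted Outbox); otherwise replay from 0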
Typical places to store the checkpoint:
- A small file: .vix/wal.offset
- A local database table (if you already use SQLite)
- A config store
Minimal checkpoint file example:
#include <cstdint>
#include <fstream>
#include <string>
static std::int64_t load_checkpoint(const std::string& path)
{
    std::ifstream in(path);
    std::int64_t off = 0;
    if (in.good())
        in >> off;
    return off;
}
static void save_checkpoint(const std::string& path, std::int64_t off)
{
    std::ofstream out(path, std::ios::trunc);
    out << off;
}
Important rule:
- Only save the checkpoint after you have fully applied all records up to that offset.
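A checkpoint file that is truncated and rewritten in place can be left empty or half-written if the process dies mid-write. One common way to reduce that risk, shown here as a sketch rather than as the Vix.cpp approach, is to write a temporary file and rename it over the real one. The function names and the `.tmp` suffix are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Write the offset to a temporary file, then rename it over the target.
// A crash during the write leaves the previous checkpoint file untouched.
// Note: this sketch does not fsync, so it does not by itself guarantee
// durability across power loss.
bool save_checkpoint_atomic(const std::string& path, std::int64_t off)
{
    assert(off >= 0);
    const std::string tmp = path + ".tmp"; // hypothetical temp-file name
    {
        std::ofstream out(tmp, std::ios::trunc);
        out << off;
        out.flush();
        if (!out)
            return false; // keep the old checkpoint if the write failed
    }
    std::error_code ec;
    std::filesystem::rename(tmp, path, ec); // replaces an existing file
    return !ec;
}

// Returns 0 when the checkpoint is missing or unreadable,
// which means "replay from the start".
std::int64_t load_checkpoint_or_zero(const std::string& path)
{
    std::ifstream in(path);
    std::int64_t off = 0;
    if (!(in >> off))
        return 0;
    return off;
}
```

Falling back to 0 on a missing or unreadable file trades startup time for safety: replaying too much is recoverable, while skipping records is not.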
3) Practical recovery: rebuild Outbox from WAL
In many offline-first designs:
- The WAL is the source of truth
- The Outbox is reconstructed from WAL at startup
High-level approach:
- Replay WAL records
- For PutOperation: insert or update operation data
- For MarkDone: mark operation done
- For MarkFailed: mark operation failed and set next retry time
Pseudo-flow:
wal.replay(from, [&](const WalRecord& rec)
{
    switch (rec.type)
    {
    case RecordType::PutOperation:
        // decode payload into Operation, then store it
        break;
    case RecordType::MarkDone:
        // mark operation done
        break;
    case RecordType::MarkFailed:
        // mark failed, store error and next retry time
        break;
    }
});
Note:
- For PutOperation, the WAL payload is typically a serialized Operation.
- During recovery, decode and rebuild the Outbox state you need.
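The pseudo-flow above can be filled in with a small in-memory model. Everything in this sketch (`Kind`, `Rec`, `OutboxEntry`, `Outbox`, `apply`, `rebuild`) is a hypothetical stand-in. The real WalRecord and Outbox types may carry different fields, and the payload here is kept as an opaque string instead of being decoded.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real WalRecord and Outbox may differ.
enum class Kind { PutOperation, MarkDone, MarkFailed };
enum class Status { Pending, Done, Failed };

struct Rec
{
    Kind kind;
    std::string id;
    std::string payload;              // serialized operation (PutOperation only)
    std::string error;                // MarkFailed only
    std::int64_t next_retry_at_ms{0}; // MarkFailed only
};

struct OutboxEntry
{
    std::string payload;
    Status status{Status::Pending};
    std::string error;
    std::int64_t next_retry_at_ms{0};
};

using Outbox = std::map<std::string, OutboxEntry>;

// Apply one record to the outbox.
void apply(Outbox& outbox, const Rec& rec)
{
    assert(!rec.id.empty());
    switch (rec.kind)
    {
    case Kind::PutOperation:
        // Insert or replace; a fresh Put starts as Pending.
        outbox[rec.id] = OutboxEntry{rec.payload};
        break;
    case Kind::MarkDone:
    {
        OutboxEntry& e = outbox[rec.id];
        e.status = Status::Done;
        e.error.clear();
        e.next_retry_at_ms = 0;
        break;
    }
    case Kind::MarkFailed:
    {
        OutboxEntry& e = outbox[rec.id];
        e.status = Status::Failed;
        e.error = rec.error;
        e.next_retry_at_ms = rec.next_retry_at_ms;
        break;
    }
    }
}

// Rebuild the outbox by applying every record in append order.
Outbox rebuild(const std::vector<Rec>& log)
{
    Outbox outbox;
    for (const Rec& rec : log)
        apply(outbox, rec);
    return outbox;
}
```

Keeping `apply` as a free function of (state, record) makes it easy to unit-test each record type in isolation, independent of how records are read from disk.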
4) How to test WAL recovery (beginner-friendly)
A) Create a test directory
mkdir -p .vix_test
B) Run a small program that appends records
Write a tiny program that:
- Creates Wal at .vix_test/wal.log
- Appends a PutOperation
- Appends MarkFailed
- Exits
Then run the recovery program and verify that it reports 1 recovered op, last_type MarkFailed, and the error message.
C) Re-run recovery multiple times
Recovery must be deterministic. Running the recovery program twice should give the same result.
5) Crash simulation test (append then crash)
A classic test:
- Append PutOperation
- Start sending (simulate by printing)
- Crash before MarkDone
- Restart and recover
- After recovery, the operation must appear as not done, and eligible for retry if the policy allows it
Minimal example:
// Process:
// - PutOperation appended
// - program exits before MarkDone
// On restart, recovery sees the operation as pending and can retry.
This test proves:
- Locally accepted work is not lost
- Recovery can safely resume
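To try the crash scenario without depending on the library, the flow can be imitated with a toy line-based log. This is only a sketch: the `toy_append` and `toy_recover` helpers and the "TYPE id" line format are invented for this example and are not the Vix.cpp WAL format or API.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <string>

// Toy line-based log for illustration only: "<TYPE> <id>" per line.
// It is NOT the Vix.cpp WAL format or API.
void toy_append(const std::string& path, const std::string& type, const std::string& id)
{
    std::ofstream out(path, std::ios::app);
    out << type << ' ' << id << '\n';
    out.flush();
    assert(out.good());
}

// Replays the toy log in order; returns id -> last record type seen.
std::map<std::string, std::string> toy_recover(const std::string& path)
{
    std::map<std::string, std::string> last;
    std::ifstream in(path);
    std::string type, id;
    while (in >> type >> id)
        last[id] = type;
    return last;
}

// First "process": accepts the operation, then dies before MarkDone.
void run_until_crash(const std::string& path)
{
    toy_append(path, "PutOperation", "op-1");
    // ... sending would start here ...
    // Simulated crash: return without appending MarkDone.
}

// An operation is still pending if it is known and its last record
// is not MarkDone.
bool is_pending(const std::map<std::string, std::string>& state, const std::string& id)
{
    auto it = state.find(id);
    return it != state.end() && it->second != "MarkDone";
}
```

A test can call `run_until_crash`, then `toy_recover` twice, and check that both results are equal and that `is_pending` reports true for "op-1". Appending MarkDone afterwards should flip it to false.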
6) Common mistakes
Saving checkpoint too early
If you store last_applied_offset before applying the record, a crash between the two steps makes you skip that record on restart.
Rule:
- Apply record first, then persist checkpoint.
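The ordering rule can be made explicit in a small loop. This sketch assumes each record exposes the byte offset just past its end (`end_offset`), and that `apply` and `persist` are callbacks supplied by the caller. All of these names are hypothetical and only illustrate the ordering.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical record view: `end_offset` is the byte offset just past the record.
struct Rec
{
    std::int64_t end_offset;
    int value;
};

// Applies records one by one. The checkpoint advances only after
// `apply` has returned true for that record.
// Returns the last checkpoint that was persisted (or the initial one).
std::int64_t replay_with_checkpoint(
    const std::vector<Rec>& log,
    std::int64_t checkpoint,
    const std::function<bool(const Rec&)>& apply,
    const std::function<void(std::int64_t)>& persist)
{
    assert(checkpoint >= 0);
    for (const Rec& rec : log)
    {
        if (rec.end_offset <= checkpoint)
            continue; // already applied before the last restart
        if (!apply(rec))
            break;    // stop: the checkpoint still points before this record
        checkpoint = rec.end_offset;
        persist(checkpoint); // only now is it safe to record progress
    }
    return checkpoint;
}
```

If `apply` fails on the third of three records, the persisted checkpoint stays at the end of the second record, so the next run starts again at the third record and nothing is skipped.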
Non-deterministic application
Avoid reading current time during replay decisions. Replay should be pure. If you need time, store it in the record.
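One way to keep replay pure, sketched below with hypothetical types, is to copy the retry time from the record during replay and to consult the clock only afterwards, as an explicit argument. `FailedRec`, `OpState`, `apply_failed`, and `is_due` are illustrative names, not part of the Vix.cpp API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical MarkFailed record: the retry time was decided and stored
// when the record was written.
struct FailedRec
{
    std::int64_t next_retry_at_ms;
};

// Hypothetical recovered state for one operation.
struct OpState
{
    std::int64_t next_retry_at_ms{0};
};

// Pure: the result depends only on the record, never on the current time,
// so every replay of the same log produces the same state.
void apply_failed(OpState& state, const FailedRec& rec)
{
    assert(rec.next_retry_at_ms >= 0);
    state.next_retry_at_ms = rec.next_retry_at_ms;
}

// The clock enters only after replay, passed in by the caller.
bool is_due(const OpState& state, std::int64_t now_ms)
{
    return now_ms >= state.next_retry_at_ms;
}
```

Passing `now_ms` as a parameter also makes retry scheduling testable with fixed timestamps.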
Mixing WAL and non-WAL writes
If you mutate external state without appending to the WAL first, recovery becomes ambiguous.
7) Recommended structure in real systems
A robust boot sequence:
- Load checkpoint offset
- Replay WAL from checkpoint
- Rebuild Outbox and indexes
- Requeue in-flight operations older than the timeout
- Start SyncEngine or schedule ticks
That sequence aligns with offline-first invariants:
- durable local state
- deterministic recovery
- safe convergence after failures
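The requeue step in the boot sequence above can be sketched as follows. The `Status::InFlight` value, the `started_at_ms` field, and the `requeue_stale_inflight` function are assumptions made for this example. How the real Outbox tracks in-flight work may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical operation state after WAL replay.
enum class Status { Pending, InFlight, Done, Failed };

struct Op
{
    Status status{Status::Pending};
    std::int64_t started_at_ms{0}; // when sending started (hypothetical field)
};

// Moves operations that have been in flight for at least `timeout_ms`
// back to Pending so they can be retried. Returns how many were requeued.
// `now_ms` is supplied by the caller after replay has finished,
// so the replay itself stays independent of the clock.
int requeue_stale_inflight(std::map<std::string, Op>& ops,
                           std::int64_t now_ms,
                           std::int64_t timeout_ms)
{
    assert(timeout_ms >= 0);
    int requeued = 0;
    for (auto& entry : ops)
    {
        Op& op = entry.second;
        if (op.status == Status::InFlight && now_ms - op.started_at_ms >= timeout_ms)
        {
            op.status = Status::Pending;
            ++requeued;
        }
    }
    return requeued;
}
```

With `now_ms = 10000` and `timeout_ms = 5000`, an operation that went in flight at 1000 is requeued, while one that started at 9000 is left alone.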