Proxmox VE ZFS Replication: Setup, Failover, and Monitoring
How to configure ZFS replication jobs in Proxmox VE for disaster recovery, including scheduling, pvesr commands, failover, and monitoring replication lag.
What Is ZFS Replication in Proxmox?
Proxmox VE includes built-in storage replication that copies VM and container disks between cluster nodes using ZFS snapshots. After an initial full sync, only the blocks changed since the previous replication snapshot are sent, which makes later runs fast and bandwidth-efficient. This provides a warm standby of your guests on another node: if the primary node fails, you can quickly start the guest on the replica node, losing only the changes made since the last successful sync. Replication is not a backup. It is an asynchronous, scheduled synchronization mechanism for disaster recovery, and accidental deletions or corruption on the source are copied to the replica on the next run.
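To make the incremental mechanism concrete, here is a minimal illustration that prints (but does not run) the kind of ZFS commands involved in one sync cycle. It is a simplified model, not what pvesr executes verbatim: pvesr drives its own export/import path over SSH and manages snapshot names itself. The function name `plan_sync`, the snapshot names, and the dataset path are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: prints (does not run) the ZFS commands behind one
# snapshot-based sync. pvesr uses its own export/import path, so these are
# NOT the exact commands Proxmox executes. plan_sync is a hypothetical name.
plan_sync() {
  local dataset=$1 prev_snap=$2 new_snap=$3 target=$4
  # 1. Freeze the current state of the disk in a new snapshot.
  echo "zfs snapshot ${dataset}@${new_snap}"
  if [ -z "$prev_snap" ]; then
    # 2a. First run: no common snapshot exists yet, so the full dataset is sent.
    echo "zfs send ${dataset}@${new_snap} | ssh ${target} zfs recv ${dataset}"
  else
    # 2b. Later runs: send only the blocks changed since the previous snapshot.
    echo "zfs send -i @${prev_snap} ${dataset}@${new_snap} | ssh ${target} zfs recv ${dataset}"
  fi
}
# Example: plan_sync rpool/data/vm-100-disk-0 snap1 snap2 pve2
```

The only difference between the first and later runs is the `-i` (incremental) source snapshot, which is why only the initial sync has to move the whole disk.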
Prerequisites
ZFS replication requires:
- A Proxmox VE cluster with at least two nodes
- ZFS storage on both source and target nodes, available under the same storage ID (and therefore the same pool name) on each node
- VMs or containers stored on ZFS storage
- Network connectivity between nodes (replication runs over SSH and uses the migration network if one is configured, otherwise the cluster network)
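A quick way to review the ZFS storages your cluster defines is to parse /etc/pve/storage.cfg, which is shared by all cluster nodes. The sketch below assumes the usual section layout (a `zfspool: <id>` header followed by an indented `pool` line); `list_zfs_storages` is a hypothetical helper name. Once you have the list, confirm that each pool exists on both the source and the target node, for example with `zfs list <pool>` on each.

```shell
#!/usr/bin/env bash
# Sketch: list ZFS pool storages defined in storage.cfg as "<storage-id> <pool>".
# Assumes the standard section format (list_zfs_storages is a hypothetical name):
#   zfspool: <storage-id>
#       pool <pool/dataset>
list_zfs_storages() {
  awk '
    /^[a-z]+:/ { in_zfs = ($1 == "zfspool:"); id = $2; next }  # new section header
    in_zfs && $1 == "pool" { print id, $2 }                      # pool line of a zfspool section
  '
}
# Usage on a cluster node: list_zfs_storages < /etc/pve/storage.cfg
```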
Create a Replication Job
You can create replication jobs via the web UI or the command line. The command-line tool is pvesr:
# Create a replication job for VM 100 to node pve2, every 15 minutes:
pvesr create-local-job 100-0 pve2 --schedule "*/15"
# The job ID format is VMID-JOBNUMBER (e.g., 100-0 is the first job for VM 100)
# Schedule uses the Proxmox calendar event format (similar to systemd calendar events):
# "*/15" = every 15 minutes (also the default if --schedule is omitted)
# "*/2:00" = every 2 hours, on the hour
# "02:00" = daily at 2 AM
From the web UI, select the VM, go to the Replication tab, and click Add. Choose the target node and schedule interval.
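When you need jobs for several guests, it can be safer to generate the commands first and review them before running anything. This is a minimal sketch: `plan_replication_jobs` is a hypothetical helper, and always using job number 0 is an assumption that only holds for guests without an existing job.

```shell
#!/usr/bin/env bash
# Sketch: print one "pvesr create-local-job" command per VMID for review.
# plan_replication_jobs is hypothetical; it always uses job number 0 (VMID-0).
plan_replication_jobs() {
  local target=$1 schedule=$2
  shift 2
  local vmid
  for vmid in "$@"; do
    # VMIDs are numeric; skip anything else instead of emitting a bad job ID.
    if [[ ! $vmid =~ ^[0-9]+$ ]]; then
      echo "skipping invalid VMID: $vmid" >&2
      continue
    fi
    printf 'pvesr create-local-job %s-0 %s --schedule "%s"\n' "$vmid" "$target" "$schedule"
  done
}
# Review the plan, then execute it:
#   plan_replication_jobs pve2 '*/15' 100 101 102 | bash
```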
Manage Replication Jobs with pvesr
The pvesr command provides full control over replication jobs:
# List all replication jobs:
pvesr list
# Show detailed status of all jobs:
pvesr status
# View a specific job:
pvesr read 100-0
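# Queue a job to run as soon as possible instead of waiting for its schedule:
pvesr schedule-now 100-0
# Or run it immediately in the foreground with verbose output:
pvesr run --id 100-0 --verbose 1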
# Update the schedule (every 5 minutes):
pvesr update 100-0 --schedule "*/5"
# Disable a job without deleting it:
pvesr update 100-0 --disable 1
# Re-enable:
pvesr update 100-0 --disable 0
# Delete a replication job:
pvesr delete 100-0
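When you manage many jobs, it can help to feed the job IDs from `pvesr list` into other subcommands. This sketch assumes the JobID is the first column of `pvesr list` output, after a single header line; `job_ids` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the job IDs from "pvesr list" output read on stdin.
# Assumes one header line and JobID in the first column (job_ids is hypothetical).
job_ids() {
  awk 'NR > 1 && NF > 0 { print $1 }'
}
# Example: queue every job to run as soon as possible
#   pvesr list | job_ids | xargs -r -n1 pvesr schedule-now
```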
Failover with Replication
When a node fails, the replicated VM data already exists on the target node. If you use Proxmox High Availability (HA), the HA manager can recover the VM automatically by restarting it on the replica node. Without HA, failover is manual:
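# If the source node (pve1 here) is down, move the VM's config to the target node.
# This requires a quorate cluster; for containers, use lxc/ instead of qemu-server/.
mv /etc/pve/nodes/pve1/qemu-server/100.conf /etc/pve/nodes/pve2/qemu-server/
# The target node already has the replicated disks, so start the VM there:
qm start 100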
# If the source node is still reachable but you want to migrate:
qm migrate 100 pve2 --online
# Migrating a replicated guest switches the job direction automatically.
# After a manual failover, check "pvesr list" and update or recreate the job
# so the original node becomes the replica target once it is back online.
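The worst-case data loss window is roughly the replication interval plus the time a sync takes to complete, and it grows if syncs start failing. With a 15-minute schedule, you could lose about 15 minutes of writes, slightly more if syncs are slow. For critical workloads, shorten the interval, keeping in mind that every run creates a snapshot and transfers data, which adds I/O and network load on both nodes.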
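The arithmetic behind that window can be sketched as follows. If a node fails just before a run finishes, the replica still holds the snapshot from the previous run, so the loss is about one interval plus the duration of the run in progress. `worst_case_rpo_minutes` is a hypothetical helper, and reading the Duration column of `pvesr status` as seconds is an assumption.

```shell
#!/usr/bin/env bash
# Rough worst-case RPO: interval (minutes) + duration of the in-flight sync (seconds).
# worst_case_rpo_minutes is hypothetical; the duration unit (seconds) is an assumption.
worst_case_rpo_minutes() {
  awk -v interval="$1" -v duration="$2" 'BEGIN { printf "%.1f\n", interval + duration / 60 }'
}
# Example: 15-minute schedule, syncs taking 90 s -> about 16.5 minutes of exposure
#   worst_case_rpo_minutes 15 90
```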
Limitations
Be aware of these replication limitations:
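- ZFS only: Replication works only with local ZFS (zfspool) storage. LVM, LVM-thin, directory, and Ceph storage are not supported; shared storage such as Ceph does not need replication, because every node already sees the same data.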
- Cluster required: Both nodes must be in the same Proxmox cluster.
- Same VMID: The replica keeps the same VMID and disk volume names on the target node. Because VMIDs are unique across the cluster, replicas of different guests never collide.
- Bandwidth: Initial sync transfers the full dataset. For large VMs, this can take significant time and bandwidth. Schedule initial syncs during off-peak hours.
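- No cross-storage: Disks are replicated to the same ZFS storage (same storage ID, and therefore the same pool name) on the target node. You cannot replicate to a differently named pool or a different storage type.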
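Before the first run, a back-of-envelope estimate of the initial sync time helps you pick an off-peak window. This sketch divides size by sustained throughput using decimal units and ignores compression and protocol overhead, so treat the result as a rough guide; `initial_sync_hours` is a hypothetical helper. To keep replication from saturating the link, pvesr jobs accept a `--rate` limit in MB/s, which applies to every run, not only the first.

```shell
#!/usr/bin/env bash
# Rough initial-sync estimate: size (GB) / throughput (MB/s), in hours.
# Decimal units, no compression or overhead modeled; initial_sync_hours is hypothetical.
initial_sync_hours() {
  local size_gb=$1 rate_mb_s=$2
  awk -v s="$size_gb" -v r="$rate_mb_s" 'BEGIN { printf "%.1f\n", s * 1000 / r / 3600 }'
}
# A 360 GB disk over a link sustaining 100 MB/s needs about an hour:
#   initial_sync_hours 360 100
# Cap replication bandwidth at 50 MB/s for a job:
#   pvesr create-local-job 100-0 pve2 --schedule "*/15" --rate 50
```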
Monitor Replication Lag
Monitoring replication lag is critical. If replication falls behind, your failover RPO increases:
# Check status and last sync time:
pvesr status
# Look at the LastSync, NextSync, FailCount, and State columns
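# Check replication logs (PVE 7 and later run replication from pvescheduler):
journalctl -u pvescheduler.service --since "1 hour ago"
# On PVE 6.x and earlier, use pvesr.service instead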
# List jobs whose State is not OK (the header line is printed as well):
pvesr status | grep -vw OK
# List the replication snapshots on this node (named __replicate_<jobid>_<timestamp>__):
zfs list -t snapshot | grep __replicate
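Set up alerts for replication failures. Proxmox sends a notification (by default an email to the root user's address) when a replication job fails, and you can review each job's state and log in the web UI under Datacenter > Replication or in the Replication tab of a node or guest.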
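For a lag check you can run from cron, one option is to parse `pvesr status` and flag jobs that are too old or have failed. This is a sketch under stated assumptions: the column order (JobID Enabled Target LastSync NextSync Duration FailCount State), the `YYYY-MM-DD_HH:MM:SS` timestamp format, and `-` for a job that never synced should all be verified against your pvesr version. It requires GNU `date`, and `stale_jobs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: read "pvesr status" on stdin and print the ID of every job that
# never synced, last synced more than MAX_AGE seconds ago, or has FailCount > 0.
# Assumed columns: JobID Enabled Target LastSync NextSync Duration FailCount State
# Assumed LastSync format: 2024-05-01_12:00:00 ("-" if never synced). Needs GNU date.
stale_jobs() {
  local max_age=$1 now=${2:-$(date +%s)}
  local id enabled target last next duration fails state ts
  # Skip the header line, then split each row into its columns.
  tail -n +2 | while read -r id enabled target last next duration fails state; do
    [ -z "$id" ] && continue
    if [ "$last" = "-" ]; then
      echo "$id"
      continue
    fi
    # Replace the first "_" with a space so GNU date can parse the timestamp.
    ts=$(date -d "${last/_/ }" +%s) || { echo "$id"; continue; }
    if [ $((now - ts)) -gt "$max_age" ] || [ "${fails:-0}" -gt 0 ]; then
      echo "$id"
    fi
  done
}
# Example cron usage (alert if anything is older than 30 minutes):
#   pvesr status | stale_jobs 1800
```

Printing only job IDs keeps the output empty when everything is healthy, which suits cron setups that mail any non-empty output.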
Keeping tabs on replication status is especially important when you are away from your desk. ProxmoxR lets you check task logs and node status from your phone, so you can spot replication failures immediately without needing to be at your workstation.
Summary
ZFS replication in Proxmox VE is a straightforward disaster recovery tool that keeps periodically synchronized copies of your VMs across cluster nodes. Configure replication jobs with pvesr, choose a schedule that balances RPO against performance, and monitor replication lag to ensure your standby copies stay current. Combined with Proxmox HA, replication provides automatic recovery with data loss bounded by roughly one replication interval for your critical workloads.