Step-by-step instructions for deploying ROMbsr in enterprise environments with high availability, monitoring, and operational excellence
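A typical production ROMbsr deployment consists of multiple specialized nodes working together: build nodes, which compile ROMs and capture signing requests, and sign nodes, which process those requests with a YubiHSM2.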
# Create dedicated user
sudo useradd -r -s /bin/bash -m rombsr
sudo usermod -a -G docker rombsr # If using containers
# Create directory structure
sudo mkdir -p /opt/rombsr
sudo mkdir -p /etc/rombsr
sudo mkdir -p /var/lib/rombsr
sudo mkdir -p /var/log/rombsr
# Set permissions
sudo chown -R rombsr:rombsr /opt/rombsr
sudo chown -R rombsr:rombsr /etc/rombsr
sudo chown -R rombsr:rombsr /var/lib/rombsr
sudo chown -R rombsr:rombsr /var/log/rombsr
# Clone repository
cd /opt
sudo -u rombsr git clone https://github.com/your-org/rombsr.git
cd rombsr
# Build PKCS#11 libraries
sudo -u rombsr make all
# Install Python dependencies
sudo -u rombsr python3 -m venv /var/lib/rombsr/venv
sudo -u rombsr /var/lib/rombsr/venv/bin/pip install -r lib/python/requirements.txt
# Set production paths (these exports apply only to the current shell session)
export ROMBSR_ROOT=/opt/rombsr
export ROMBSR_CONFIG=/etc/rombsr
export ROMBSR_DATA=/var/lib/rombsr
# /etc/rombsr/common.conf
SIGNING_QUEUE_TYPE=git
SIGNING_GIT_URL=git@gitops.internal:android/signing-queue.git
RELEASE_BASE_URL=https://releases.example.com
LOG_LEVEL=INFO
LOG_FORMAT=json
# /etc/rombsr/build.conf
# Request capture mode - intercepts signing operations
SIGNING_MODE=mock
BUILD_JOBS=32
CCACHE_SIZE=100G
BUILD_TMPFS_SIZE=30G
REPO_SYNC_JOBS=8
GIT_USER_NAME="ROMbsr Build Bot"
GIT_USER_EMAIL="rombsr-build@example.com"
# /etc/rombsr/sign.conf
# Real HSM mode - processes captured requests
SIGNING_MODE=hsm
HSM_CONNECTOR_URL=http://localhost:12345
HSM_AUTH_KEY_ID=3
SIGNING_BATCH_SIZE=10
SIGNING_RATE_LIMIT=100
GIT_USER_NAME="ROMbsr Sign Bot"
GIT_USER_EMAIL="rombsr-sign@example.com"
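Because the .conf files above are plain shell-style KEY=VALUE assignments, a node's effective settings can be inspected from bash by sourcing them in order. The sketch below assumes, hypothetically, that the role file (build.conf or sign.conf) is read after common.conf and overrides it; `load_rombsr_conf` and `require_keys` are illustrative helpers, not part of ROMbsr.

```shell
#!/usr/bin/env bash
# Hypothetical helper: load common.conf, then <role>.conf, into the current
# shell. Assumes the files hold only KEY=VALUE lines and comments, and that
# later files override earlier ones (an assumption, not documented behavior).
load_rombsr_conf() {
  local conf_dir="$1" role="$2" f
  set -a  # export every variable assigned while sourcing
  for f in "$conf_dir/common.conf" "$conf_dir/$role.conf"; do
    if [ ! -r "$f" ]; then
      set +a
      echo "missing config: $f" >&2
      return 1
    fi
    . "$f"
  done
  set +a
}

# Hypothetical check: fail if any named setting is unset or empty.
require_keys() {
  local key missing=0
  for key in "$@"; do
    if [ -z "${!key:-}" ]; then   # bash indirect expansion: value of $key
      echo "required setting $key is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `load_rombsr_conf /etc/rombsr sign && require_keys SIGNING_MODE HSM_CONNECTOR_URL` would fail loudly on a sign node whose configuration is incomplete.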
# Install YubiHSM connector
wget https://developers.yubico.com/YubiHSM2/Releases/yubihsm-connector-latest.tar.gz
tar xzf yubihsm-connector-latest.tar.gz
cd yubihsm-connector
sudo make install
# Configure connector
cat > /etc/yubihsm-connector.yaml <
# Insert YubiHSM2 into USB port
# Start connector service
sudo systemctl enable --now yubihsm-connector
# Provision HSM (one-time)
cd /opt/rombsr
sudo -u rombsr bin/rombsr hsm-provision
# Import Android signing keys
sudo -u rombsr bin/rombsr hsm-import /secure/android-keys/
# Verify setup
sudo -u rombsr bin/rombsr hsm-status
sudo -u rombsr bin/rombsr hsm-test
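Provisioning fails in confusing ways if the connector is not reachable, so a quick pre-check is worthwhile. yubihsm-connector serves a plain-text status page at /connector/status with `key=value` lines such as `status=OK`; verify the exact fields against your connector version. The helper below is a hypothetical sketch that parses that text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read yubihsm-connector status text on stdin and
# succeed only if it contains the line "status=OK". Prints the serial if OK.
connector_status_ok() {
  local line ok=1 serial=""
  # "|| [ -n "$line" ]" keeps a final line that lacks a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%$'\r'}"        # tolerate CRLF line endings
    case "$line" in
      status=OK) ok=0 ;;
      serial=*)  serial="${line#serial=}" ;;
    esac
  done
  if [ "$ok" -eq 0 ] && [ -n "$serial" ]; then
    echo "connector serial: $serial"
  fi
  return "$ok"
}
```

Usage: `curl -fsS http://localhost:12345/connector/status | connector_status_ok` (the URL matches HSM_CONNECTOR_URL in sign.conf).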
# Copy service files
sudo cp /opt/rombsr/contrib/systemd/*.service /etc/systemd/system/
sudo cp /opt/rombsr/contrib/systemd/*.timer /etc/systemd/system/
# Reload systemd
sudo systemctl daemon-reload
# Enable services (sign node)
sudo systemctl enable --now rombsr-sign.service
sudo systemctl enable --now rombsr-monitor.service
# Enable timers (build node)
sudo systemctl enable --now rombsr-nightly.timer
sudo systemctl enable --now rombsr-cleanup.timer
# Check status
sudo systemctl status 'rombsr-*'
SIGNING_MODE=mock (build nodes): enables request capture via libmock_pkcs11.so. The mock library intercepts signing operations during the build, saves the requests, and returns temporary signatures so the build can complete. This is NOT a test mode; it is how production builds record what needs to be signed.
SIGNING_MODE=hsm (sign nodes): uses the real HSM to process captured signing requests. The sign orchestrator reads requests from the GitOps queue, applies real signatures with the YubiHSM2, and writes the responses back, completing the signing process started by the build nodes.
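The two modes map one-to-one onto node roles, so a mismatch is always a configuration error. A minimal pre-start guard, sketched under the assumption that roles are named build and sign after the config files; `check_signing_mode` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical guard: verify SIGNING_MODE matches the node's role.
# Returns 0 on match, 1 on mismatch, 2 for an unknown role.
check_signing_mode() {
  local role="$1" mode="$2" expected
  case "$role" in
    build) expected=mock ;;  # build nodes capture requests via libmock_pkcs11.so
    sign)  expected=hsm ;;   # sign nodes apply real signatures with the YubiHSM2
    *) echo "unknown node role: $role" >&2; return 2 ;;
  esac
  if [ "$mode" != "$expected" ]; then
    echo "role '$role' requires SIGNING_MODE=$expected, got '${mode:-<unset>}'" >&2
    return 1
  fi
  return 0
}
```

For example, `check_signing_mode sign "$SIGNING_MODE"` after loading sign.conf.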
For production environments requiring high availability:
# Primary sign node
SIGN_NODE_ROLE=primary
SIGN_NODE_PRIORITY=100
# Secondary sign node
SIGN_NODE_ROLE=secondary
SIGN_NODE_PRIORITY=50
# Use distributed locking (Redis/etcd)
LOCK_BACKEND=redis
LOCK_REDIS_URL=redis://ha-redis:6379/0
LOCK_KEY_PREFIX=rombsr:sign:
# Failover configuration
FAILOVER_TIMEOUT=300
HEALTH_CHECK_INTERVAL=30
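Read together, the role and priority settings suggest a simple failover rule: if the primary fails its health checks, the healthy node with the highest SIGN_NODE_PRIORITY takes over, while the Redis lock keeps two nodes from processing the queue at once. The sketch below illustrates only the selection step, as an assumption about how priorities are used rather than ROMbsr's actual algorithm; it reads `node priority healthy` lines, for example assembled from your health checks.

```shell
#!/usr/bin/env bash
# Hypothetical failover rule: print the healthy node with the highest
# priority. Input lines: "<node> <priority> <healthy: 1|0>".
# Ties go to the first node listed; exits 1 if no node is healthy.
pick_active_sign_node() {
  awk '$3 == 1 && (!found || $2 + 0 > best) { best = $2 + 0; node = $1; found = 1 }
       END { if (found) print node; else exit 1 }'
}
```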
Queue-based distribution: a simple scheduler distributes builds across the available nodes, with each node pulling work from a central queue.
Load-based routing: monitor CPU and RAM usage and route each build to the least-loaded node; Prometheus metrics can supply the load data for this decision.
Device affinity: assign specific devices to specific build nodes to improve cache reuse and avoid redundant downloads.
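Device affinity can be as simple as hashing the device codename onto the list of build nodes, so the same device always lands on the same node and reuses its warm caches. A minimal sketch using POSIX `cksum`; the helper name is hypothetical. With plain modulo hashing, adding or removing a node remaps most devices; consistent hashing would limit that churn.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a device codename to one of the given build nodes.
# The same device and node list always yield the same node.
node_for_device() {
  local device="$1"; shift
  [ "$#" -gt 0 ] || { echo "no build nodes given" >&2; return 1; }
  local sum idx
  sum=$(printf '%s' "$device" | cksum | awk '{ print $1 }')  # CRC of the name
  idx=$(( sum % $# + 1 ))   # 1-based index into the remaining arguments
  echo "${!idx}"            # bash indirect expansion of positional parameter
}
```

Usage: `node_for_device shiba build1 build2 build3`.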
# prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'rombsr-build'
    static_configs:
      - targets: ['build1:9090', 'build2:9090', 'build3:9090']
  - job_name: 'rombsr-sign'
    static_configs:
      - targets: ['sign1:9090', 'sign2:9090']
# Key metrics to monitor
# - rombsr_build_duration_seconds
# - rombsr_signing_requests_pending
# - rombsr_hsm_errors_total
# - rombsr_disk_usage_percent
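For a spot check without going through the Prometheus server, a metric can be read straight from a node. Prometheus scrapes the targets above at its default path, /metrics, which serves the text exposition format (`name value` lines plus `#` comment lines). The hypothetical helper below extracts the value of an unlabelled metric; labelled series would need a fuller parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of an unlabelled metric from
# Prometheus text exposition on stdin. Exits 1 if the metric is absent.
metric_value() {
  awk -v m="$1" '$1 == m { print $2; found = 1; exit }
                 END { exit !found }'
}
```

Usage: `curl -s http://sign1:9090/metrics | metric_value rombsr_signing_requests_pending`.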
# alerting_rules.yml
groups:
  - name: rombsr
    rules:
      - alert: BuildFailed
        expr: increase(rombsr_build_failures_total[1h]) > 0
        annotations:
          summary: "Build failed for {{ $labels.rom }}/{{ $labels.device }}"
      - alert: SigningQueueBacklog
        expr: rombsr_signing_requests_pending > 50
        for: 10m
        annotations:
          summary: "Signing queue backlog: {{ $value }} requests pending"
      - alert: HSMError
        expr: increase(rombsr_hsm_errors_total[5m]) > 0
        labels:
          severity: critical
        annotations:
          summary: "HSM errors detected"
      - alert: DiskSpaceLow
        expr: rombsr_disk_usage_percent > 85
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
Import the provided Grafana dashboard to visualize these metrics.
Morning (9 AM):
□ Check overnight build status
$ rombsr status --since yesterday
□ Review any alerts from monitoring
$ journalctl -u 'rombsr-*' --since yesterday | grep ERROR
□ Verify signing queue health
$ rombsr sign --status
□ Check disk space on all nodes
$ ansible all -m shell -a "df -h /var/lib/rombsr"
End of Day (5 PM):
□ Trigger nightly builds if not automated
$ rombsr build grapheneos shiba
$ rombsr build calyxos shiba
□ Clean old builds (keep last 7 days)
$ rombsr clean --builds --older-than 7
□ Verify HSM connectivity for overnight signing
$ rombsr hsm-status
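The morning checks can also run unattended, for example from cron, so that failures surface before anyone logs in. A minimal sketch; `run_checks` is hypothetical, and each argument is a command string executed with `bash -c`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run each check command, print PASS/FAIL per check,
# and return 1 if any check failed (the failure count goes to stderr).
run_checks() {
  local cmd failed=0
  for cmd in "$@"; do
    if bash -c "$cmd" >/dev/null 2>&1; then
      echo "PASS  $cmd"
    else
      echo "FAIL  $cmd"
      failed=$((failed + 1))
    fi
  done
  if [ "$failed" -gt 0 ]; then
    echo "$failed check(s) failed" >&2
    return 1
  fi
  return 0
}
```

Usage, assuming the listed commands exit non-zero on failure: `run_checks 'rombsr sign --status' 'rombsr hsm-status'`.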
| Setting | Default | Optimized | Impact |
|---|---|---|---|
| BUILD_JOBS | auto (nproc) | nproc * 1.5 | +20% build speed |
| CCACHE_SIZE | 50G | 100-200G | +40% cache hits |
| USE_TMPFS | false | true (64GB+ RAM) | +30% I/O speed |
| REPO_SYNC_JOBS | 4 | 16 | +60% sync speed |
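The BUILD_JOBS and USE_TMPFS rows can be derived from the host itself. A sketch assuming Linux (`nproc`, `/proc/meminfo`); the helper names are hypothetical, and the thresholds are taken from the table.

```shell
#!/usr/bin/env bash
# Hypothetical helpers deriving the "Optimized" column from host resources.

# nproc * 1.5 in integer arithmetic (rounds down). Optional arg: core count.
optimized_build_jobs() {
  local cores="${1:-$(nproc)}"
  echo $(( cores * 3 / 2 ))
}

# Succeed if RAM is at least 64 GB. MemTotal (kB) sits slightly below the
# installed amount, so a decimal threshold of 64,000,000 kB is used.
# Optional arg: MemTotal in kB (defaults to /proc/meminfo).
tmpfs_recommended() {
  local mem_kb="${1:-$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)}"
  [ "$mem_kb" -ge 64000000 ]
}
```

For example, `echo "BUILD_JOBS=$(optimized_build_jobs)"` prints a value for build.conf, and `tmpfs_recommended && echo "USE_TMPFS=true"` gates the tmpfs setting.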
# Fast NVMe for builds
/var/lib/rombsr/builds - NVMe SSD (1TB+)
/var/lib/rombsr/cache - NVMe SSD (200GB)
# Standard SSD for sources
/var/lib/rombsr/sources - SATA SSD (500GB)
# tmpfs for maximum speed (requires 64GB+ RAM)
mount -t tmpfs -o size=30G tmpfs /var/lib/rombsr/builds/out
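# Optimize filesystem: default to data=writeback journaling (faster, weaker crash consistency)
tune2fs -o journal_data_writeback /dev/nvme0n1p1
# Alternatively, disable the journal entirely (risky! no journal replay after a crash; the filesystem must be unmounted)
tune2fs -O ^has_journal /dev/nvme0n1p1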
# Build failures: check the batch state file for errors
jq . /var/lib/rombsr/state/<batch-id>.json
# Clear ccache if corruption suspected
ccache -C
# Lift the virtual-memory limit (current shell only; cannot exceed the hard limit)
ulimit -v unlimited
# Check for disk space issues
df -h /var/lib/rombsr
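To compare a node against the 85% DiskSpaceLow alert threshold directly from the shell, GNU `df` can print just the usage column. A small sketch; the helper names are hypothetical, and `--output` requires GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the used-space percentage (integer) of the
# filesystem holding PATH (default /var/lib/rombsr).
disk_usage_percent() {
  df --output=pcent -- "${1:-/var/lib/rombsr}" | tail -n 1 | tr -dc '0-9'
  echo
}

# Succeed if usage is at or above THRESHOLD (default 85, as in the alert).
# Returns 2 if the usage could not be read.
disk_usage_high() {
  local pct
  pct=$(disk_usage_percent "$1")
  [ -n "$pct" ] || return 2
  [ "$pct" -ge "${2:-85}" ]
}
```

Usage: `disk_usage_high /var/lib/rombsr && rombsr clean --builds --older-than 7`.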
# HSM issues: check the USB connection
lsusb | grep Yubico
# Restart connector
sudo systemctl restart yubihsm-connector
# Check connector logs
tail -f /var/log/yubihsm-connector.log
# Test with yubihsm-shell
yubihsm-shell -C http://localhost:12345
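# Signing queue issues: check git connectivity (run git as the rombsr user, which owns the repository)
sudo -u rombsr git -C /var/lib/rombsr/signing-queue pull
# Verify the GPG signature on the latest commit
sudo -u rombsr git -C /var/lib/rombsr/signing-queue log --show-signature -1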
# Check sign orchestrator logs
journalctl -u rombsr-sign -f
# Manual queue processing
rombsr sign --once --verbose
# Enable debug logging
export ROMBSR_LOG_LEVEL=DEBUG
#!/bin/bash
# Daily backup script
set -euo pipefail
BACKUP_DIR=/backup
STAMP=$(date +%Y%m%d)   # one timestamp for the whole run
mkdir -p "$BACKUP_DIR"
# Configuration backup
tar czf "$BACKUP_DIR/rombsr-config-$STAMP.tar.gz" /etc/rombsr
# State backup
tar czf "$BACKUP_DIR/rombsr-state-$STAMP.tar.gz" /var/lib/rombsr/state
# HSM backup (encrypted)
rombsr hsm-export "$BACKUP_DIR/hsm-$STAMP"
# Signing queue backup (all refs)
git -C /var/lib/rombsr/signing-queue bundle create "$BACKUP_DIR/queue-$STAMP.bundle" --all
# Upload to offsite storage
aws s3 sync "$BACKUP_DIR/" s3://disaster-recovery/rombsr/
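A backup is only useful if it restores. The sketch below checks each tarball by listing it with `tar -tzf` and each queue bundle by cloning it into a scratch directory; `verify_backups` is hypothetical, and the `*.tar.gz`/`*.bundle` patterns follow the file names used in the script above. It does not check the HSM export, whose format is specific to ROMbsr.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify every archive and git bundle in a backup
# directory (default /backup). Returns 1 if any file fails, 2 on setup error.
verify_backups() {
  local dir="${1:-/backup}" f failed=0 scratch
  scratch=$(mktemp -d) || return 2
  shopt -s nullglob            # skip loops cleanly when nothing matches
  for f in "$dir"/*.tar.gz; do
    # Listing forces gzip and tar to read the whole archive
    if tar -tzf "$f" >/dev/null 2>&1; then
      echo "ok       $f"
    else
      echo "CORRUPT  $f"; failed=1
    fi
  done
  for f in "$dir"/*.bundle; do
    # A bare clone is a real restore test of the bundle
    if git clone -q --bare "$f" "$scratch/restore.git" >/dev/null 2>&1; then
      echo "ok       $f"
    else
      echo "CORRUPT  $f"; failed=1
    fi
    rm -rf "$scratch/restore.git"
  done
  shopt -u nullglob            # back to the bash default
  rm -rf "$scratch"
  return "$failed"
}
```

Running `verify_backups /backup` at the end of the backup script, before the S3 upload, would keep a corrupt file from silently replacing a good offsite copy.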