ROMbsr Architecture Deep Dive

Understanding the security-focused design principles and technical implementation of ROMbsr's zero-trust Android ROM orchestration system

Core Design Principles

ROMbsr's architecture is built on six fundamental security principles that guide every design decision:

1. Complete Key Isolation

Signing keys NEVER exist outside the HSM. No intermediate storage, no temporary files, no memory copies. The build infrastructure has no access path to the signing keys, by construction.

2. Mutual Distrust

Build nodes don't trust sign nodes. Sign nodes don't trust build nodes. Every operation is cryptographically verified. Compromise of one component cannot compromise others.

3. Immutable Audit Trail

Git provides cryptographically signed, tamper-evident logging. Every signing request and response is permanently recorded, so forensic analysis remains possible years later.

4. Fail-Safe Defaults

Request capture on build nodes prevents key exposure. Production signing requires explicit HSM configuration. No silent failures: every error is logged and raises an alert.

5. Operational Transparency

Every operation is observable through structured logs and metrics. No black boxes. Security through transparency, not obscurity.

6. Resumable by Design

State machine architecture ensures every operation is idempotent. Failures don't mean starting over. Resume exactly where you left off.

System Architecture Overview

┌──────────────────────────────────────────────────────────────────────────────┐
│                              BUILD ENVIRONMENT                               │
│                                                                              │
│  ┌────────────────┐      ┌─────────────────┐      ┌──────────────────┐       │
│  │                │      │                 │      │                  │       │
│  │  Source Sync   │─────▶│  Build Engine   │─────▶│  Request Capture │       │
│  │  (repo sync)   │      │  (AOSP build)   │      │  (libmock_pkcs11)│       │
│  │                │      │                 │      │                  │       │
│  └────────────────┘      └─────────────────┘      └────────┬─────────┘       │
│                                                            │                 │
│  ┌────────────────┐      ┌─────────────────┐      ┌────────▼─────────┐       │
│  │                │      │                 │      │                  │       │
│  │ State Manager  │◀────▶│ Orchestrator    │─────▶│ Request Generator│       │
│  │ (checkpoints)  │      │ (9-stage FSM)   │      │ (JSON + crypto)  │       │
│  │                │      │                 │      │                  │       │
│  └────────────────┘      └─────────────────┘      └────────┬─────────┘       │
│                                                            │                 │
└────────────────────────────────────────────────────────────┼─────────────────┘
                                                             │
                         GitOps Queue (Git Repository)       │
                    ┌────────────────────────────────────────▼─────────────────┐
                    │  requests/                                               │
                    │    └── <batch-id>/                                       │
                    │         ├── metadata.json      (build info)              │
                    │         ├── stage1-apk.json    (APK signing requests)    │
                    │         ├── stage2-avb.json    (AVB signing requests)    │
                    │         └── stage3-ota.json    (OTA signing requests)    │
                    └──────────────────────────────────────────────────────────┘
                                                             │
┌────────────────────────────────────────────────────────────┼─────────────────┐
│                             SIGNING ENVIRONMENT            │                 │
│                                                            ▼                 │
│  ┌────────────────┐      ┌──────────────────┐     ┌──────────────────┐       │
│  │                │      │                  │     │                  │       │
│  │ Queue Monitor  │─────▶│ Request Validator│────▶│  HSM Interface   │       │
│  │ (Git polling)  │      │ (crypto verify)  │     │  (PKCS#11 real)  │       │
│  │                │      │                  │     │                  │       │
│  └────────────────┘      └──────────────────┘     └────────┬─────────┘       │
│                                                            │                 │
│  ┌────────────────┐      ┌─────────────────┐      ┌────────▼─────────┐       │
│  │                │      │                 │◀─────│                  │       │
│  │Response Writer │◀─────│ Sign Processor  │      │   YubiHSM2       │       │
│  │ (Git commit)   │      │ (Python + HSM)  │      │   Hardware       │       │
│  │                │      │                 │      │                  │       │
│  └────────┬───────┘      └─────────────────┘      └──────────────────┘       │
│           │                                                                  │
└───────────┼──────────────────────────────────────────────────────────────────┘
            │
            ▼
    responses/<batch-id>/
      ├── stage1-apk-signed.json
      ├── stage2-avb-signed.json
      └── stage3-ota-signed.json
      
Security Note: The GitOps queue acts as a security boundary. Build nodes can only write to requests/, sign nodes can only write to responses/. Neither can modify the other's data, preventing privilege escalation.
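
The write boundary described in the note can be illustrated with a small path check. This is a minimal sketch, not ROMbsr's actual enforcement mechanism, which the text does not specify. The role names `build` and `sign` and the function `paths_allowed` are assumptions made for illustration. Only the `requests/` and `responses/` directory names come from the layout above.

```python
from pathlib import PurePosixPath

# Hypothetical role-to-directory mapping. The directory names come from the
# queue layout above; the role names are assumptions for this sketch.
WRITABLE_PREFIX = {"build": "requests", "sign": "responses"}


def paths_allowed(role: str, changed_paths: list[str]) -> bool:
    """Return True only if every changed path lies under the role's directory."""
    prefix = WRITABLE_PREFIX[role]
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # Reject absolute paths and any attempt to climb out with "..".
        if path.is_absolute() or ".." in path.parts:
            return False
        # The first path component must be the role's own directory.
        if not path.parts or path.parts[0] != prefix:
            return False
    return True
```

A check of this kind would typically run on the Git server side (for example in a pre-receive hook), so that a compromised node cannot bypass it locally.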

Build Orchestrator: 9-Stage State Machine

The build orchestrator implements a finite state machine (FSM) with atomic, resumable stages:

Stage 1: ENVIRONMENT_SETUP
  → Validate system requirements
  → Create directory structure
  → Initialize state file
  → Checkpoint: environment_ready

Stage 2: SOURCE_SYNC
  → Configure repo tool
  → Sync AOSP sources (~100GB)
  → Apply ROM-specific patches
  → Checkpoint: source_synced

Stage 3: VENDOR_EXTRACTION
  → Download factory images
  → Extract proprietary blobs
  → Verify blob checksums
  → Checkpoint: vendor_ready

Stage 4: BUILD_CONFIG
  → Set environment variables
  → Configure build targets
  → Set up ccache
  → Checkpoint: build_configured

Stage 5: COMPILATION
  → Execute AOSP build with request capture
  → libmock_pkcs11.so intercepts signing calls
  → Saves requests, returns temp signatures
  → Checkpoint: build_complete

Stage 6: REQUEST_CAPTURE
  → Aggregate captured signing requests
  → Create request metadata
  → Calculate artifact hashes
  → Checkpoint: requests_generated

Stage 7: QUEUE_SUBMISSION
  → Commit to GitOps queue
  → Sign commit with GPG
  → Push to remote
  → Checkpoint: requests_submitted

Stage 8: WAIT_FOR_SIGNING
  → Poll for responses
  → Verify signatures
  → Download signed artifacts
  → Checkpoint: signing_complete

Stage 9: FINALIZATION
  → Assemble final ROM
  → Generate checksums
  → Create release metadata
  → Checkpoint: build_finalized

Resumability Example: If compilation fails after 4 hours at 95% complete, running bin/rombsr resume <batch-id> will skip stages 1-4 and retry compilation from the exact failure point.
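
The skip-and-resume behavior can be sketched as a loop over stages that consults recorded checkpoints. This is an illustrative model only: the stage and checkpoint names are taken from the list above, while `run_pipeline`, the `handlers` mapping, and the minimal state layout are assumptions. For brevity the sketch writes the state file directly; the write-rename pattern described under Atomic State Transitions would replace that call in practice.

```python
import json
from pathlib import Path
from typing import Callable

# (stage name, checkpoint name) pairs taken from the stage list above.
STAGES = [
    ("ENVIRONMENT_SETUP", "environment_ready"),
    ("SOURCE_SYNC", "source_synced"),
    ("VENDOR_EXTRACTION", "vendor_ready"),
    ("BUILD_CONFIG", "build_configured"),
    ("COMPILATION", "build_complete"),
    ("REQUEST_CAPTURE", "requests_generated"),
    ("QUEUE_SUBMISSION", "requests_submitted"),
    ("WAIT_FOR_SIGNING", "signing_complete"),
    ("FINALIZATION", "build_finalized"),
]


def run_pipeline(state_path: Path, handlers: dict[str, Callable[[], None]]) -> list[str]:
    """Run stages in order, skipping any whose checkpoint is already recorded."""
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"checkpoints": {}}
    executed = []
    for stage, checkpoint in STAGES:
        if state["checkpoints"].get(checkpoint, {}).get("completed"):
            continue  # finished in an earlier run, nothing to redo
        handlers[stage]()  # may raise; earlier checkpoints are already on disk
        state["checkpoints"][checkpoint] = {"completed": True}
        state_path.write_text(json.dumps(state, indent=2))
        executed.append(stage)
    return executed
```

Because a checkpoint is persisted only after its stage handler returns, a crash in the middle of a stage leaves that stage unmarked, and the next run starts there.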

Sign Orchestrator: Secure Processing Pipeline

Request Validation Pipeline

1. Git Pull & Verification
   - Pull new requests from queue
   - Verify GPG signatures
   - Check commit integrity

2. Request Structure Validation
   - Parse JSON requests
   - Validate schema compliance
   - Check required fields

3. Security Policy Enforcement
   - Verify requesting node authorization
   - Check signing quotas/limits
   - Validate artifact hashes

4. HSM Authentication
   - Load authentication key
   - Establish HSM session
   - Verify HSM health

5. Cryptographic Operations
   - Sign each artifact
   - Generate signature metadata
   - Create audit records

6. Response Generation
   - Package signatures
   - Create response JSON
   - Calculate response hashes

7. Queue Commit
   - Write to responses/
   - Sign with GPG
   - Push to remote
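
Steps 2 and 3 of the pipeline (structure validation and artifact hash checks) can be sketched as follows. The request schema is not given in the text, so every field name here (`batch_id`, `stage`, `requests`, `artifact`, `key_id`, `sha256`) is a hypothetical placeholder, as is the function `validate_request`.

```python
import hashlib
import json

# Hypothetical schema: the real request format is not specified in this document.
REQUIRED_FIELDS = {"batch_id", "stage", "requests"}
REQUIRED_ENTRY_FIELDS = {"artifact", "key_id", "sha256"}


def validate_request(raw: str, artifacts: dict[str, bytes]) -> list[str]:
    """Return a list of problems; an empty list means the request passed."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top-level value must be an object"]
    problems = [f"missing field: {name}" for name in sorted(REQUIRED_FIELDS - set(doc))]
    if problems:
        return problems
    for index, entry in enumerate(doc["requests"]):
        if not isinstance(entry, dict):
            problems.append(f"request {index}: not an object")
            continue
        missing = REQUIRED_ENTRY_FIELDS - set(entry)
        if missing:
            problems.append(f"request {index}: missing {sorted(missing)}")
            continue
        data = artifacts.get(entry["artifact"])
        if data is None:
            problems.append(f"request {index}: unknown artifact {entry['artifact']}")
        elif hashlib.sha256(data).hexdigest() != entry["sha256"]:
            problems.append(f"request {index}: hash mismatch")
    return problems
```

Collecting problems in a list, instead of stopping at the first one, lets the sign node record a complete rejection reason in the audit trail.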

Multi-Stage Signing Process

Stage 1: APK Signing

Signs all system and vendor APKs using platform, media, shared, and release keys. Each APK's certificate chain is verified before and after signing.

Stage 2: AVB Signing

Signs boot images and generates vbmeta structures for Android Verified Boot. Creates a chain of trust from bootloader to system.

Stage 3: OTA Signing

Signs OTA metadata and payload for secure updates. Ensures only authorized updates can be installed on devices.

State Management: Transactional Integrity

State File Structure

{
  "batch_id": "grapheneos_shiba_20250130_141523",
  "version": "1.0",
  "rom": "grapheneos",
  "device": "shiba",
  "status": "in_progress",
  "current_stage": "compilation",
  "started_at": "2025-01-30T14:15:23Z",
  "updated_at": "2025-01-30T18:45:12Z",
  "checkpoints": {
    "environment_ready": {
      "completed": true,
      "timestamp": "2025-01-30T14:15:30Z",
      "duration_seconds": 7
    },
    "source_synced": {
      "completed": true,
      "timestamp": "2025-01-30T15:30:45Z",
      "duration_seconds": 4515,
      "metadata": {
        "repo_size_gb": 127,
        "manifest_revision": "android-14.0.0_r20"
      }
    },
    "compilation": {
      "completed": false,
      "attempts": 2,
      "last_error": "ninja: error: deps log too old",
      "retry_after": "2025-01-30T19:00:00Z"
    }
  },
  "artifacts": {
    "target_files": "out/target/product/shiba/obj/PACKAGING/target_files.zip",
    "ota_package": "out/target/product/shiba/shiba-ota.zip"
  }
}

Atomic State Transitions

State updates use a write-rename pattern to ensure atomicity:

# State update process
1. Read current state into memory
2. Update in-memory state
3. Write to temporary file (.state.json.tmp)
4. Sync to disk (fsync)
5. Atomic rename to final location
6. Remove temporary file on error
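
The six steps above map closely onto standard library calls. The following is a minimal sketch, assuming a POSIX filesystem where the temporary file and the final file live in the same directory. The function name `write_state_atomically` is hypothetical.

```python
import json
import os
from pathlib import Path


def write_state_atomically(path: Path, state: dict) -> None:
    """Persist state so readers see either the old file or the new one."""
    # Step 3: temporary file next to the target, e.g. ".state.json.tmp".
    tmp_path = path.with_name(f".{path.name}.tmp")
    try:
        with open(tmp_path, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # step 4: force the bytes to disk
        os.replace(tmp_path, path)  # step 5: atomic rename over the old file
    except BaseException:
        # Step 6: never leave a half-written temporary file behind.
        tmp_path.unlink(missing_ok=True)
        raise
```

Keeping the temporary file in the same directory matters: a rename across filesystems is not atomic, so writing to a shared temp directory would weaken the guarantee.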

Request Capture: The Key Innovation

ROMbsr's request capture mechanism solves a fundamental problem: how to build Android ROMs without exposing signing keys to build infrastructure.

How Request Capture Works

1. AOSP build tools (apksigner, signapk) attempt to sign
   ↓
2. libmock_pkcs11.so intercepts PKCS#11 calls
   ↓
3. For each signing operation:
   - Captures the data to be signed
   - Records the key identifier needed
   - Saves request to batch file
   - Returns temporary signature
   ↓
4. Build continues without hanging
   ↓
5. All requests aggregated for HSM signing

Critical Insight: Traditional builds fail if they can't sign. ROMbsr's request capture returns temporary signatures that allow the build to complete, while capturing what needs to be signed later. This decouples building from signing entirely.
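
The capture idea can be modeled in a few lines. This is a conceptual stand-in only: the real module is a PKCS#11 shared library (libmock_pkcs11.so), and its internals are not shown in this document. The class `RequestCapture`, the choice to record a SHA-256 digest of the data, and the 256-byte all-zero placeholder are assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class RequestCapture:
    """Conceptual model of request capture; not the real PKCS#11 shim."""

    requests: list[dict] = field(default_factory=list)

    def sign(self, key_id: str, data: bytes, signature_length: int = 256) -> bytes:
        # Record what needs signing and with which key. This sketch keeps
        # only a digest of the data; that choice is an assumption.
        self.requests.append({
            "key_id": key_id,
            "sha256": hashlib.sha256(data).hexdigest(),
        })
        # Return a fixed-size placeholder so the caller can keep going.
        return b"\x00" * signature_length

    def to_batch_json(self) -> str:
        # Aggregate all captured requests into one batch document.
        return json.dumps({"requests": self.requests}, indent=2)
```

The caller receives bytes of the expected length and proceeds, while the list of recorded requests becomes the input for the signing environment.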

Request Capture vs Traditional Signing

Traditional AOSP Build

  • apksigner needs private key file
  • signapk reads keys from filesystem
  • Build fails without keys present
  • Keys exposed on build server

ROMbsr Request Capture

  • libmock_pkcs11.so intercepts calls
  • No private keys needed during build
  • Build completes with temp signatures
  • Real signing happens on secure node

PKCS#11 Abstraction Layer

ROMbsr's custom PKCS#11 implementation provides a unified interface with two backends: request capture on build nodes and the HSM module on sign nodes.

Application Layer (rombsr commands)
        │
        ▼
┌─────────────────────────────────────────┐
│      PKCS#11 Abstraction Layer          │
│                                         │
│  ┌─────────────────┐ ┌───────────────┐  │
│  │ Request Capture │ │  HSM Module   │  │
│  │  (build nodes)  │ │ (sign nodes)  │  │
│  └──────┬──────────┘ └───────┬───────┘  │
│         │                    │          │
│  ┌──────▼──────┐    ┌────────▼──────┐   │
│  │   Memory    │    │   YubiHSM2    │   │
│  │   Storage   │    │   Connector   │   │
│  └─────────────┘    └───────────────┘   │
└─────────────────────────────────────────┘
      
Build Safety: Request capture mode intercepts signing operations during build, saving requests while returning temporary signatures. This allows builds to complete without accessing production keys.

GitOps Queue: Immutable Audit Trail

Queue Structure

signing-queue/
├── .git/                    # Git repository
├── README.md                # Queue documentation
├── requests/
│   └── grapheneos_shiba_20250130_141523/
│       ├── metadata.json    # Build information
│       ├── stage1-apk-requests.json
│       ├── stage2-avb-requests.json
│       └── stage3-ota-requests.json
└── responses/
    └── grapheneos_shiba_20250130_141523/
        ├── metadata.json    # Signing information
        ├── stage1-apk-responses.json
        ├── stage2-avb-responses.json
        └── stage3-ota-responses.json

Cryptographic Verification

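Every queue operation is cryptographically secured: build nodes sign each request commit with GPG, sign nodes verify those signatures and the artifact hashes before signing, and each response is committed with its own GPG signature.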

Performance Optimizations

Parallel Processing

Multi-threaded repo sync, parallel compilation with BUILD_JOBS=auto, concurrent signing request generation. Optimized for modern multi-core systems.

Intelligent Caching

ccache for compilation (50GB+ recommended), source code caching between builds, vendor blob caching with checksums. Reduces build time by 40-60%.

Network Optimization

Local mirror support for AOSP sources, shallow clones where possible, resumable downloads with progress tracking. Handles unreliable connections gracefully.

Monitoring & Observability

Prometheus Metrics

# Build metrics
rombsr_build_duration_seconds{rom="grapheneos",device="shiba",stage="compilation"}
rombsr_build_status{batch_id="...",status="success"}
rombsr_disk_usage_bytes{path="/var/lib/rombsr/builds"}

# Signing metrics
rombsr_signing_requests_total{status="pending"}
rombsr_signing_duration_seconds{operation="apk_sign"}
rombsr_hsm_operations_total{type="sign",result="success"}

# Queue metrics
rombsr_queue_depth{type="requests"}
rombsr_queue_age_seconds{batch_id="..."}

# System metrics
rombsr_state_corruption_total
rombsr_checkpoint_recovery_total
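
The metric lines above follow the Prometheus text exposition format, where each sample is a name, an optional label set, and a value. The following sketch renders one such line without any client library. The function `format_metric` is hypothetical and says nothing about how ROMbsr actually exports its metrics.

```python
def format_metric(name: str, labels: dict[str, str], value: float) -> str:
    """Render one sample in the Prometheus text exposition format."""

    def escape(text: str) -> str:
        # Label values must escape backslashes, double quotes and newlines.
        return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    if not labels:
        return f"{name} {value}"
    rendered = ",".join(f'{key}="{escape(val)}"' for key, val in labels.items())
    return f"{name}{{{rendered}}} {value}"
```

For example, `format_metric("rombsr_queue_depth", {"type": "requests"}, 3)` yields `rombsr_queue_depth{type="requests"} 3`, where the value 3 is made up for illustration.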

Structured Logging

All components use structured JSON logging for easy parsing and analysis:

{
  "timestamp": "2025-01-30T18:45:12.234Z",
  "level": "INFO",
  "component": "build-orchestrator",
  "batch_id": "grapheneos_shiba_20250130_141523",
  "stage": "compilation",
  "message": "Build completed successfully",
  "duration_seconds": 14567,
  "artifacts": ["system.img", "boot.img", "vendor.img"]
}
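
A log entry of this shape can be produced with Python's standard logging module and a custom formatter. This is a sketch under assumptions: the class `JsonFormatter` and the convention of passing extra fields through a `fields` attribute are hypothetical, and the document does not say which logging library ROMbsr uses.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object."""

    def __init__(self, component: str) -> None:
        super().__init__()
        self.component = component

    def format(self, record: logging.LogRecord) -> str:
        stamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": stamp.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname,
            "component": self.component,
            "message": record.getMessage(),
        }
        # Extra fields (batch_id, stage, ...) arrive via extra={"fields": {...}}.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)
```

With this formatter attached to a handler, a call such as `logger.info("Build completed successfully", extra={"fields": {"stage": "compilation"}})` emits one JSON object per line, which log collectors can parse without custom patterns.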