B0 — Production Infrastructure

Status. Locked. This page is the single source of truth for the production stack and supersedes any conflicting statement on earlier pages. Where another page disagrees, B0 wins.

Posture. Web-first, mobile-supported. The Laravel Control Center is the primary operator surface. The iPhone app is a fully-featured remote that uses the same API. No business logic lives outside Laravel.


1. Locked production stack

Canonical diagrams (Option 1). For system-level diagrams (architecture, run lifecycle, Git/snapshots, RAG flows, security boundaries, export flows), link to 🧭ARCHITECTURE_DIAGRAMS (Canonical) instead of duplicating Mermaid here.

| Layer | Choice | Notes |
|---|---|---|
| Operating system | CentOS-compatible Linux (AlmaLinux 9 / Rocky Linux 9 recommended) | Long support window, RPM ecosystem, SELinux available. |
| Web server | LiteSpeed Web Server (LSWS) 6.x (Enterprise or OpenLiteSpeed) | Event-driven, HTTP/2 + HTTP/3, native LSAPI. |
| PHP runtime | lsphp 8.3 via LSAPI | Faster than PHP-FPM under LSWS; no separate process manager. |
| Backend | Laravel 11 | See B1 for app skeleton. |
| Database | MySQL 8.0 or MariaDB 11.4+ | Default. PostgreSQL+pgvector is an opt-in upgrade path, not the default. |
| Cache / queue / locks / pubsub | Redis 7 | Sessions, cache, queues, workspace locks, broadcast events. |
| Job runner | Laravel Horizon under Supervisor (or systemd unit) | Supervisors: agents-default, agents-long, rag-index, docs. |
| Realtime | Laravel Reverb for web WS + SSE for iPhone | Both fanned out from one ConsoleEventService::publish(). |
| SSL / TLS | Let's Encrypt (acme.sh or certbot) or server-provided cert | HTTP/3 enabled, HSTS preload. |
| Storage | Local disk for workspaces + snapshots; S3-compatible for long-term snapshot archives (optional) | S3 path keeps prod nodes stateless beyond active workspaces. |
| Deployment | Git-based with atomic release directories | See §8. |
| Process supervision | Supervisor preferred; systemd units acceptable | Reverb + Horizon + workspace janitor. |
| Backup | mysqldump to S3 daily + wal-style binlog snapshot | RPO ≤ 24 h, RTO ≤ 1 h. |
| Monitoring | LSWS access log + Laravel Telescope (non-prod) + Horizon dashboard + Prometheus node exporter | Optional Sentry for app exceptions. |
| Time | chrony synced to NTP | Required for SSE/token expiry correctness. |

Explicitly not in the production stack (rejected for v1): Nginx, Apache HTTP Server, PHP-FPM, PostgreSQL-as-default, Docker as the production runtime (Docker is dev-only). Any earlier spec page that implies otherwise defers to this one.


2. Server provisioning (CentOS / AlmaLinux / Rocky)

Run as root or with sudo. Steps assume a fresh AlmaLinux 9 box.

# 2.1 base hardening
dnf -y update
dnf -y install epel-release dnf-plugins-core firewalld policycoreutils-python-utils chrony git curl tar unzip jq
systemctl enable --now chronyd firewalld
firewall-cmd --permanent --add-service=http
firewall-cmd --permanent --add-service=https
firewall-cmd --permanent --add-port=443/udp        # HTTP/3 (QUIC)
firewall-cmd --reload

# 2.2 user accounts
adduser deploy
usermod -aG wheel deploy
mkdir -p /home/deploy/.ssh && chmod 700 /home/deploy/.ssh
# add deploy key to /home/deploy/.ssh/authorized_keys
chown -R deploy:deploy /home/deploy/.ssh

# 2.3 SELinux: keep enforcing; we'll grant the contexts we need below.
sestatus

# 2.4 LiteSpeed repo + LSWS Enterprise (or OpenLiteSpeed)
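# OpenLiteSpeed path (note: the repo package below is the el8 build; on an
# EL9 host, confirm the matching package name in the LiteSpeed repository docs):
rpm -Uvh https://rpms.litespeedtech.com/centos/litespeed-repo-1.2-1.el8.noarch.rpm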
dnf -y install openlitespeed
# OR LSWS Enterprise (license required):
# follow https://docs.litespeedtech.com/ then `dnf install lsws`

# 2.5 lsphp 8.3 + extensions
dnf -y install lsphp83 lsphp83-common lsphp83-mysqlnd lsphp83-pdo lsphp83-mbstring \
               lsphp83-opcache lsphp83-redis lsphp83-bcmath lsphp83-gd lsphp83-intl \
               lsphp83-soap lsphp83-xml lsphp83-zip lsphp83-process lsphp83-pecl-imagick \
               lsphp83-pecl-zip

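# 2.6 database
# On EL9, mysql-server 8.0 normally ships as a plain AppStream package; if your
# release has no "mysql" module stream, skip the two module commands.
dnf -y module reset mysql && dnf -y module enable mysql:8.0 && dnf -y install mysql-server
# or MariaDB (the distro default may be older than 11.4, so check the version
# or use the MariaDB repository; the unit is normally named mariadb):
# dnf -y install mariadb-server
systemctl enable --now mysqld
mysql_secure_installation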

# 2.7 redis
dnf -y install redis
systemctl enable --now redis

# 2.8 supervisor
dnf -y install supervisor
systemctl enable --now supervisord

# 2.9 git + node (Node only for Vite asset build in CI; not required at runtime)
dnf -y install git
curl -fsSL https://rpm.nodesource.com/setup_20.x | bash -
dnf -y install nodejs

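# 2.10 composer (also expose lsphp's CLI as plain `php`, because composer and
# the deploy script in §8 both invoke `php` from PATH)
ln -sf /usr/local/lsws/lsphp83/bin/php /usr/local/bin/php
curl -sS https://getcomposer.org/installer | /usr/local/lsws/lsphp83/bin/php -- --install-dir=/usr/local/bin --filename=composer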

3. LiteSpeed virtual host configuration

Layout on disk:

/var/www/agent-workspace/
  current/          → symlink to releases/<timestamp>
  releases/<timestamp>/
  shared/
    .env
    storage/        → mounted into each release
    snapshots/
    workspaces/     → actual project clones live here (one dir per project)
  logs/

3.1 OpenLiteSpeed vhost (web admin or /usr/local/lsws/conf/vhosts/agent-workspace/vhconf.conf)

docRoot                   /var/www/agent-workspace/current/public
enableGzip                1
enableBr                  1
adminEmails               ops@example.com

index {
  useServer               0
  indexFiles              index.php
}

errorlog $SERVER_ROOT/logs/agent-workspace.error.log {
  useServer               0
  logLevel                NOTICE
  rollingSize             50M
  keepDays                14
}

accesslog $SERVER_ROOT/logs/agent-workspace.access.log {
  useServer               0
  logFormat               "%h %l %u %t \"%r\" %>s %b %D"
  rollingSize             50M
  keepDays                14
}

scripthandler {
  add                     lsapi:lsphp83 php
}

extprocessor lsphp83 {
  type                    lsapi
  address                 uds://tmp/lshttpd/lsphp83.sock
  maxConns                35
  env                     PHP_LSAPI_CHILDREN=35
  env                     PHP_LSAPI_MAX_REQUESTS=10000
  initTimeout             60
  retryTimeout            0
  pcKeepAliveTimeout      60
  respBuffer              0
  autoStart               2
  path                    /usr/local/lsws/lsphp83/bin/lsphp
  backlog                 100
  instances               1
  priority                0
  memSoftLimit            2048M
  memHardLimit            2560M
  procSoftLimit           400
  procHardLimit           500
}

rewrite {
  enable                  1
  autoLoadHtaccess        1
  rules                   <<<END
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ index.php [L]
END
}

3.2 Listeners

  • 80 (HTTP) — redirects to 443.
  • 443 (HTTPS / HTTP/2) — binds the vhost above.
  • 443/UDP (HTTP/3 QUIC) — enable in LSWS listener settings.
  • Reverb upstream: LSWS reverse-proxies /app/{appId} and /apps/{appId}/events to 127.0.0.1:8080 (Reverb).

3.3 Reverb upstream context

context /app/ {
  type                    proxy
  handler                 reverbBackend
  websocket               1
  addDefaultCharset       off
}
context /apps/ {
  type                    proxy
  handler                 reverbBackend
  websocket               1
  addDefaultCharset       off
}
extprocessor reverbBackend {
  type                    proxy
  address                 127.0.0.1:8080
  maxConns                100
  initTimeout             30
  retryTimeout            0
  respBuffer              0
}

3.4 SELinux contexts

semanage fcontext -a -t httpd_sys_rw_content_t "/var/www/agent-workspace(/.*)?"
semanage fcontext -a -t httpd_sys_rw_content_t "/var/www/agent-workspace/shared/workspaces(/.*)?"
restorecon -R /var/www/agent-workspace
setsebool -P httpd_can_network_connect 1     # outbound to OpenAI/Anthropic
setsebool -P httpd_execmem 0

If you exec git or other binaries from PHP, prefer dedicated user contexts (run agent commands as the deploy user via a setuid-free dispatcher) rather than relaxing SELinux.


4. PHP & Laravel runtime tuning

/usr/local/lsws/lsphp83/etc/php.ini overrides (drop in 99-agent.ini):

memory_limit = 512M
max_execution_time = 120          ; web requests; agent jobs run via queue, not web
upload_max_filesize = 64M
post_max_size = 64M
opcache.enable = 1
opcache.enable_cli = 1
opcache.memory_consumption = 256
opcache.interned_strings_buffer = 32
opcache.max_accelerated_files = 30000
opcache.validate_timestamps = 0   ; flip to 1 in staging
realpath_cache_size = 4096K
realpath_cache_ttl = 600
date.timezone = UTC
expose_php = Off

Laravel-side:

  • php artisan config:cache && php artisan route:cache && php artisan view:cache && php artisan event:cache after every deploy.
  • php artisan optimize:clear only when troubleshooting.

5. MySQL / MariaDB tuning

/etc/my.cnf.d/agent-workspace.cnf:

[mysqld]
bind-address               = 127.0.0.1
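default-authentication-plugin = caching_sha2_password   # MySQL 8.0 option; omit on MariaDB
character-set-server       = utf8mb4
collation-server           = utf8mb4_0900_ai_ci         # MySQL 8.0 name; verify the equivalent on MariaDB
max_connections            = 200
max_allowed_packet         = 64M
innodb_buffer_pool_size    = 2G        # ~50-70% of RAM in prod (inline comments must use "#", not ";")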
innodb_log_file_size       = 512M
innodb_flush_log_at_trx_commit = 1
innodb_flush_method        = O_DIRECT
slow_query_log             = 1
slow_query_log_file        = /var/log/mysql/slow.log
long_query_time            = 1.0
log_error_verbosity        = 2

RAG on MySQL. The default storage for rag_chunks.embedding is a JSON column (see B1 §4.7), and cosine similarity is computed in the app layer. This approach works up to roughly 50k chunks per project. Past that, the operator either:

  1. enables MariaDB 11.7+ vector indexes (when GA in the host distro), or
  2. switches DB_CONNECTION to PostgreSQL with pgvector (the upgrade migration is already part of rag_chunks and is idempotent).

This is the only vector-store-related decision pending; the application code is driver-aware on day one.
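
For readers who want the arithmetic behind "cosine similarity is computed in the app layer", here is a minimal shell/awk sketch. It is illustrative only and is not the Laravel implementation; the function name cosine_similarity and the space-separated vector format are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Illustrative only: cosine similarity between two equal-length vectors,
# each passed as a space-separated string (hypothetical input format).
cosine_similarity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); m = split(b, y, " ")
    if (n != m || n == 0) { print "dimension mismatch" > "/dev/stderr"; exit 1 }
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]      # running dot product
      na  += x[i] * x[i]      # squared norm of a
      nb  += y[i] * y[i]      # squared norm of b
    }
    if (na == 0 || nb == 0) { print "zero vector" > "/dev/stderr"; exit 1 }
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}
```

Because every query compares against each stored chunk, the cost grows linearly with the chunk count, which is why the JSON-column path has a practical ceiling.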


6. Redis tuning

/etc/redis/redis.conf:

bind 127.0.0.1
protected-mode yes
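maxmemory 1gb
# redis.conf accepts only whole-line "#" comments. Cache and queue share this
# instance, so allkeys-lru can evict queue and lock keys under memory pressure;
# size maxmemory with headroom, or split instances if that risk matters.
maxmemory-policy allkeys-lru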
appendonly yes
appendfsync everysec

Laravel uses four logical Redis databases, one per connection:

cache:   db 0
session: db 1
queue:   db 2     (Horizon)
locks:   db 3     (workspace:lock:{project_id})

7. Process supervision (Supervisor)

/etc/supervisord.d/agent-workspace.ini:

[program:agent-horizon]
process_name=%(program_name)s
command=/usr/local/lsws/lsphp83/bin/php /var/www/agent-workspace/current/artisan horizon
autostart=true
autorestart=true
user=deploy
redirect_stderr=true
stdout_logfile=/var/www/agent-workspace/logs/horizon.log
stopwaitsecs=3600

[program:agent-reverb]
process_name=%(program_name)s
command=/usr/local/lsws/lsphp83/bin/php /var/www/agent-workspace/current/artisan reverb:start --host=127.0.0.1 --port=8080
autostart=true
autorestart=true
user=deploy
redirect_stderr=true
stdout_logfile=/var/www/agent-workspace/logs/reverb.log

[program:agent-schedule]
process_name=%(program_name)s
command=/usr/local/lsws/lsphp83/bin/php /var/www/agent-workspace/current/artisan schedule:work
autostart=true
autorestart=true
user=deploy
redirect_stderr=true
stdout_logfile=/var/www/agent-workspace/logs/schedule.log

Enable the service with systemctl enable --now supervisord, then load the programs with supervisorctl reread && supervisorctl update.


8. Git-based deployment (zero-downtime, atomic)

Deploy as the deploy user. Strategy: clone into a timestamped release dir, run the build, then atomically swap the current symlink.

#!/usr/bin/env bash
set -euo pipefail
APP=/var/www/agent-workspace
REL="$APP/releases/$(date +%Y%m%d%H%M%S)"

git clone --depth 1 --branch "${1:-main}" git@github.com:org/agent-workspace.git "$REL"
cd "$REL"

# wire shared paths
ln -sfn "$APP/shared/.env"        "$REL/.env"
rm -rf "$REL/storage"
ln -sfn "$APP/shared/storage"     "$REL/storage"
ln -sfn "$APP/shared/workspaces"  "$REL/storage/app/workspaces"
ln -sfn "$APP/shared/snapshots"   "$REL/storage/app/snapshots"

# build
composer install --no-dev --prefer-dist --optimize-autoloader --no-interaction
npm ci && npm run build
php artisan migrate --force
php artisan storage:link || true
php artisan config:cache && php artisan route:cache && php artisan view:cache && php artisan event:cache
php artisan filament:optimize

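# atomic swap: create the new symlink beside the old one, then rename over it
# (rename is atomic; a bare `ln -sfn` unlinks and recreates the link)
ln -sfn "$REL" "$APP/current.new"
mv -Tf "$APP/current.new" "$APP/current"

# restart workers (graceful)
php "$REL/artisan" horizon:terminate
supervisorctl restart agent-reverb

# make lsphp pick up the new release: with opcache.validate_timestamps=0 (§4)
# touching files has no effect, so reload LSWS gracefully instead. Confirm that
# a reload restarts lsphp in your LSWS mode; detached lsphp may need more.
sudo systemctl reload lsws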

# retention: keep last 5 releases
ls -1dt "$APP/releases"/* | tail -n +6 | xargs -r rm -rf

To roll back, repoint current at the previous release and restart the workers: ln -sfn $APP/releases/<previous> $APP/current && supervisorctl restart agent-reverb && php artisan horizon:terminate.
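
Finding <previous> by hand is easy to get wrong under pressure. The sketch below shows one way to resolve it, assuming the timestamped release names from the deploy script sort lexically. The function name previous_release is hypothetical and not part of the deploy tooling.

```shell
#!/usr/bin/env bash
# Sketch: print the release directory immediately older than the one that
# `current` points at. Relies on timestamped names sorting lexically.
previous_release() {
  local app="$1" current rel prev=""
  current="$(readlink -f "$app/current")" || return 1
  for rel in "$app"/releases/*/; do
    rel="${rel%/}"
    if [ "$(readlink -f "$rel")" = "$current" ]; then
      [ -n "$prev" ] && { printf '%s\n' "$prev"; return 0; }
      return 1   # current is already the oldest retained release
    fi
    prev="$rel"
  done
  return 1       # current does not point into releases/
}

# Usage (hypothetical wrapper around the rollback one-liner):
#   prev="$(previous_release /var/www/agent-workspace)" \
#     && ln -sfn "$prev" /var/www/agent-workspace/current
```

The function fails instead of guessing when there is no older release, so a wrapper script can stop before touching the symlink.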


9. SSL / TLS

dnf -y install epel-release
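curl https://get.acme.sh | sh -s email=ops@example.com
# acme.sh defaults to ZeroSSL; select Let's Encrypt to match §1
~/.acme.sh/acme.sh --set-default-ca --server letsencrypt
~/.acme.sh/acme.sh --issue -d agent.example.com -w /var/www/agent-workspace/current/public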
~/.acme.sh/acme.sh --install-cert -d agent.example.com \
  --key-file       /usr/local/lsws/conf/cert/agent.key  \
  --fullchain-file /usr/local/lsws/conf/cert/agent.crt  \
  --reloadcmd      "systemctl reload lsws"

In LSWS listener 443: point keyFile/certFile to the files above, enable HTTP/2 and HTTP/3 (QUIC), enable HSTS with max-age=31536000; includeSubDomains; preload.


10. Backups & disaster recovery

  • Nightly: mysqldump --single-transaction --routines --triggers --events agent_workspace | gzip > $S3/db/$(date +%F).sql.gz.
  • Hourly: binlog increment to S3.
  • Snapshots: workspace_snapshots.archive_path either local + rsync to S3 nightly, or S3 directly when AGENT_SNAPSHOT_DRIVER=s3.
  • Restore drill: quarterly. Bring up a sibling node, restore the latest dump, replay binlogs, run php artisan agent:reindex --all, run the smoke suite.
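
The nightly dump can be sketched as a small script. The version below assumes the AWS CLI is installed and configured, and streams the dump to the bucket without a local temp file. The bucket name and both function names are placeholders, not values defined elsewhere in this spec.

```shell
#!/usr/bin/env bash
# Sketch of the nightly dump. BUCKET is a placeholder, not a real bucket.
BUCKET="${BUCKET:-s3://example-backups}"

# Build the object key for a given date (defaults to today).
dump_key() {
  local day="${1:-$(date +%F)}"
  printf '%s/db/%s.sql.gz\n' "$BUCKET" "$day"
}

# Stream the dump straight to S3; `aws s3 cp -` reads the object from stdin.
# The subshell body keeps pipefail from leaking into the caller.
backup_db() (
  set -o pipefail
  mysqldump --single-transaction --routines --triggers --events agent_workspace \
    | gzip \
    | aws s3 cp - "$(dump_key)"
)
```

With pipefail set, a failing mysqldump makes backup_db return non-zero even if the upload of the partial stream succeeds, so a cron wrapper can alert on it.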

11. Observability

  • LSWS access log (common log format plus request duration, as configured in §3.1) → shipped to your log store.
  • Laravel logs via daily channel under storage/logs/.
  • Horizon dashboard at /horizon (admin-only).
  • Telescope allowed in staging only.
  • Reverb logs to logs/reverb.log; success metric: WS message lag p95 < 250 ms.
  • Node exporter + LSWS exporter for Prometheus (optional).

12. Security posture (infra-layer)

  • SELinux enforcing in prod.
  • firewalld allows only 80/tcp, 443/tcp, and 443/udp inbound for web traffic, plus the SSH port described below.
  • SSH on a non-default port with key-only auth; fail2ban enabled.
  • All app secrets live in server_params (B1 §4.14), not in environment variables, with the sole exception of bootstrap keys (APP_KEY, DB DSN, Redis URL).
  • Outbound to api.openai.com and api.anthropic.com only — enforce via firewalld ipset if compliance requires.
  • No third-party CDN dependencies in the operator UI — self-hosted Monaco, Tailwind, fonts.

13. iPhone client posture (infra view)

The iPhone app is treated as just another HTTPS client by this layer. It does not connect to the database, Redis, Git, OpenAI, Anthropic, or any internal service — only to LSWS on 443. It cannot read .env, server paths, or secrets. Pairing happens entirely through the Control Center (B1 §8 + spec page 18 §13).

Prohibitions enforced by the API surface:

  • No endpoint returns plaintext secrets (regex-scanned in CI).
  • No endpoint accepts a workspace path outside the project root.
  • No endpoint exposes a shell, Git binary, or filesystem write to arbitrary paths.
  • All mutating endpoints require a Sanctum token with the correct ability (see B1 §8.3).
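
The path rule above comes down to resolving the requested path and checking that it still sits under the project root. The real check belongs in the Laravel API; this shell sketch only illustrates the logic. The function name path_within_root is hypothetical, and `realpath -m` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if $2, resolved against project root $1, stays inside
# that root. `realpath -m` resolves ".." and symlinks without requiring the
# path to exist.
path_within_root() {
  local root candidate
  root="$(realpath -m -- "$1")" || return 1
  case "$2" in
    /*) candidate="$(realpath -m -- "$2")" ;;        # absolute input
    *)  candidate="$(realpath -m -- "$root/$2")" ;;  # relative to the root
  esac
  case "$candidate" in
    "$root"|"$root"/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Comparing resolved paths instead of raw strings is what defeats inputs such as a/../../x, which look harmless as a prefix match.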

14. Conflict resolution against earlier spec pages

| Earlier statement | Status | Resolution |
|---|---|---|
| "PostgreSQL 16 is the primary DB" (page 01) | Superseded | MySQL 8 / MariaDB 11.4+ is default; PG+pgvector is opt-in. |
| "Backend-first, iPhone-first" (page 01 callout) | Repolished | Web-first, mobile-supported. Build order still backend → web Control Center → iPhone. |
| "Web console (later, optional)" (page 01 §1) | Superseded | Web Control Center is part of v1; iPhone is alongside, not in front. |
| "Nginx / Apache" anywhere | Removed | LSWS is the only supported web server in prod. |
| "pgvector required" (page 03) | Softened | Driver-aware migration ships both paths; MySQL JSON is default. |
| "Mobile-only architecture" or "iPhone-first" wording | Removed | Parity rule from spec page 18 governs. |

Future edits to pages 01, 03, and 13 must reference B0 in their headers.


15. Acceptance criteria for B0 (infra ready)

  1. curl -I https://agent.example.com returns an HTTP/2 status line and an HSTS header.
  2. curl --http3 -I https://agent.example.com succeeds.
  3. supervisorctl status shows agent-horizon, agent-reverb, agent-schedule all RUNNING.
  4. mysql -e "SELECT VERSION();" returns 8.0+ or MariaDB 11.4+.
  5. redis-cli ping returns PONG; redis-cli -n 2 keys '*queues:*' lists Horizon queues after a first php artisan horizon run (the leading wildcard allows for Laravel's Redis key prefix).
  6. SELinux is enforcing (getenforce prints Enforcing); an LSWS request to /horizon succeeds without AVC denials in /var/log/audit/audit.log.
  7. The deploy script in §8 completes end-to-end on a clean release dir and the new release is live in <30 s.
  8. A QUIC-enabled iPhone reaches /api/me over HTTP/3 (verified via Charles or nscurl).
  9. Killing agent-reverb causes the Control Center to fall back to polling without losing console events; restarting it resumes WS push within 5 s.
  10. CI secret-leak scanner (B1 §13) reports 0 hits.
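
Some of these criteria are easy to script. The helpers below are a sketch covering the HTTP/2 + HSTS check and the Supervisor check only. Both function names are hypothetical; they read command output on stdin so they can be exercised without a live server.

```shell
#!/usr/bin/env bash
# Sketch: acceptance helpers that read command output on stdin.

# Response headers (from `curl -sI`) show an HTTP/2 status line and HSTS.
headers_ok() {
  local headers
  headers="$(tr -d '\r')"   # curl emits CRLF line endings
  printf '%s\n' "$headers" | head -n 1 | grep -q '^HTTP/2' || return 1
  printf '%s\n' "$headers" | grep -qi '^strict-transport-security:' || return 1
}

# Every expected program is RUNNING in `supervisorctl status` output.
workers_ok() {
  local status name
  status="$(cat)"
  for name in agent-horizon agent-reverb agent-schedule; do
    printf '%s\n' "$status" | grep -Eq "^${name}[[:space:]]+RUNNING" || return 1
  done
}

# Usage against a live host (hostname is this page's example):
#   curl -sI https://agent.example.com | headers_ok && echo "HTTP/2 + HSTS ok"
#   supervisorctl status | workers_ok && echo "workers ok"
```

The remaining criteria (HTTP/3 from a device, Reverb failover, deploy timing) need a live environment and are left as manual checks.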

16. Operational runbook (one-pager)

  • Deploy: sudo -iu deploy /var/www/agent-workspace/shared/bin/deploy.sh main
  • Rollback: swap current symlink to the prior releases/..., restart agent-reverb, php artisan horizon:terminate.
  • Restart web: systemctl restart lsws
  • Restart workers: supervisorctl restart agent-horizon agent-reverb
  • Tail logs: tail -f /var/www/agent-workspace/logs/*.log /usr/local/lsws/logs/error.log
  • DB backup now: /var/www/agent-workspace/shared/bin/backup.sh
  • Rotate secrets: update via Filament → /admin/server-params, then php artisan config:clear.
  • Reindex one project: php artisan agent:reindex --project={id}

17. Out of scope for B0

  • Kubernetes / multi-node clustering. v1 is single-node; horizontal scaling is a B7+ topic.
  • Bring-your-own-LLM gateways (LiteLLM, vLLM). v1 uses OpenAI + Anthropic + Claude Code directly.
  • Per-tenant isolation. v1 is single-tenant; tenancy is schema-ready but not enabled.

From now on every other spec page that talks about web server, OS, PHP runtime, DB engine, or process supervision defers to this page. Future build prompts (B2 onward) extend B0 with workload-specific deltas only.