This is a framework under active development — not a committed build. The schema below represents the agreed-on long-term architecture. Specific hardware choices within each node are still being evaluated. Competing options are documented. The Raspberry Pi testbed runs in parallel and feeds real data into this design process.
3 total racks · 100 Gbps fabric · petabyte storage target · 0 cloud dependencies · ∞ local sovereignty
01
Rack 1 — primary lab (core fabric)
The first rack is the primary compute and network fabric. All three racks share this network. 30–50% empty space reserved from day one — this is infrastructure, not a PC build.
| Slot | Role | Function | Options / notes |
|---|---|---|---|
| Node 0 — SWITCH | 100 GbE fabric | All nodes communicate through this. No direct point-to-point hacks. Unifying nervous system of the entire lab. Leave open ports for future expansion. | Option A: MikroTik CRS354 (48× GbE + 2× 100G QSFP28), ~$700. Option B: Arista 7060X / Juniper QFX series (enterprise, ~$3k+). DECIDE: managed vs unmanaged initially. |
| Node 1 — POWER | Surge · UPS · filter | Clean power for all nodes. UPS provides runtime during outages. Surge protection for all RF and SDR hardware — switching PSUs generate noise that contaminates SDR receive chains. | APC Smart-UPS 1500VA RT or equivalent, clean sine wave output. PDU with metered outlets per rack. RF note: SDR node on an isolated PDU branch if possible. |
| Node 2 — COMPUTE | Local compute · GPU/NPU farm | Heavy background processing: photogrammetry (ODM/COLMAP), Gaussian splatting (Nerfstudio/Postshot), LiDAR fusion (PDAL), digital twin rendering (UE5 + Cesium). Kubernetes breakaway to Rack 2 for burst workloads. | Option A — single workstation: Threadripper 7960X + RTX 4090 + 256 GB RAM (~$8–12k). Option B — mini-rack fabric: 3× Lenovo ThinkCentre Tiny (8–16c, 64 GB each) + separate 2U GPU server (~$5–8k total). NOT DECIDED — see Compute Paths tab. |
| Node 3 — NAS | Petabyte storage | All persistent data: raw drone imagery, point clouds, sensor time-series, processed outputs, long-term biome logs. ZFS initially, Ceph as the cluster grows. NVMe cache tier for active working sets. | Option A: TrueNAS Scale on a mini-PC + 4–8 HDDs in RAIDZ2. Option B: Synology DS1823xs+ / QNAP TES-1885U (enterprise NAS). Breakaway to Rack 3 as storage scales. 10GbE NFS/iSCSI to all compute nodes. PLAN: Ceph migration path from ZFS. |
| Node 4 — SDR HEAD | Full-spectrum analysis · recording | Dedicated RF node physically isolated from switching PSUs. RTL-SDR (Phase 0), HackRF (Phase 1), LimeSDR Mini 2 / USRP (Phase 2+). Spectrum monitoring, NOAA/APRS/ADS-B receive, Meshtastic gateway, Reticulum rnode interface. | Hardware: Raspberry Pi 5 or small x86 mini-PC (clean USB power). Separate thermal zone — RF hardware runs warm. Isolation note: SDR performance degrades near switching PSUs; physical separation matters. |
| Node 4-sub — RF AUX | TX/RX · amp · atten · antenna | All external RF hardware cabled to the SDR head: transceivers, amplifiers, attenuators, bandpass filters, antenna splitters. Modular — each frequency band or function is a discrete unit that can be swapped independently. | TX/RX: per active band (see Protocols page). LNA (low-noise amplifier) for receive sensitivity. Bandpass filters to reject out-of-band interference. SPEC: full RF chain per band on Protocols page. |
| Node 5 — WORKSTATION | Daily desktop · 6× 1440p · EUD sync | Primary operator interface. 6× 1440p displays across 6 virtual desktops. Synced with laptops and phone for mobile continuity. Drone GCS, live Grafana dashboards, QGIS, field planning, comms. | High-core desktop (AMD Ryzen 9 or Intel Core Ultra 9). GPU with multi-display output (RTX 4080 can drive 4+ displays). Syncs with: Lenovo Yoga (canvas/field), ThinkPad (rugged field), S25 Ultra (phone). EUD = End User Device — the phone and laptops are extensions of this desktop. |
| Node 6+ — SERVICES | Git · CI/CD · internal tools | Version control for configs, firmware, processing scripts. Self-hosted Gitea or GitLab. Internal documentation wiki. Monitoring stack (Prometheus/Grafana meta). Expandable — this slot covers future services as they are identified. | Gitea (lightweight) or GitLab CE (full-featured). Prometheus + Alertmanager for infrastructure monitoring. Containerised (Docker/Podman) on the compute node or dedicated. SPEC LATER — lower priority than data and RF nodes. |
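The RF AUX note that an LNA matters for receive sensitivity can be made concrete with the Friis cascade formula: the noise figure of the first stage dominates the whole chain, which is why the LNA belongs at the antenna, before any lossy coax run. A minimal sketch — the gains and noise figures below are illustrative placeholders, not measured values for any specific hardware:

```python
import math

def cascade_noise_figure(stages):
    """Friis formula: total noise figure of cascaded RF stages.

    stages: list of (gain_db, noise_figure_db) tuples, in signal order.
    Returns the total noise figure in dB.
    """
    total_f = 0.0
    cum_gain = 1.0
    for i, (gain_db, nf_db) in enumerate(stages):
        f = 10 ** (nf_db / 10)   # noise factor (linear)
        g = 10 ** (gain_db / 10)  # gain (linear)
        if i == 0:
            total_f = f
        else:
            # Later stages' noise is divided by all gain ahead of them.
            total_f += (f - 1) / cum_gain
        cum_gain *= g
    return 10 * math.log10(total_f)

# Illustrative chain: LNA (20 dB gain, 1 dB NF),
# 10 m coax (-3 dB loss, 3 dB NF), SDR front end (0 dB, 6 dB NF).
lna_first = cascade_noise_figure([(20, 1), (-3, 3), (0, 6)])
coax_first = cascade_noise_figure([(-3, 3), (20, 1), (0, 6)])
print(f"LNA at antenna:  {lna_first:.1f} dB total NF")
print(f"LNA after coax:  {coax_first:.1f} dB total NF")
```

With these placeholder numbers the chain with the LNA at the antenna lands near 1 dB total noise figure, while putting the coax first pushes it past 4 dB — the same hardware, several dB of receive sensitivity apart.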
02
Rack 2 — compute breakaway (Kubernetes burst)
Rack 2 is a Kubernetes worker cluster that Rack 1's orchestration node can schedule workloads onto. It does not need to be populated at Phase 0 — it is reserved space. When multi-drone simultaneous ODM processing exceeds what Rack 1's compute can handle, Rack 2 workers absorb the overflow without any architectural change.
Rack 2 — role
| Purpose | Kubernetes worker nodes for burst compute |
| Trigger | Multi-drone simultaneous processing saturates Rack 1 |
| Nodes | 2–6 mini PCs or 1U servers depending on workload |
| Network | 100GbE uplink to Rack 1 switch |
| Storage | Reads/writes via Rack 1 NAS over NVMe-oF or NFS |
| GPU option | Add GPU server here to offload Nerfstudio / COLMAP at scale |
| Phase 0 status | Empty — reserve rack space only |
Rack 2 — scaling trigger events
| Trigger 1 | 3+ drones operating simultaneously → ODM queue backs up |
| Trigger 2 | Nerfstudio / Gaussian splat generation takes >6 hrs on Rack 1 |
| Trigger 3 | Real-time NDVI classification required during flight (not post-processing) |
| Trigger 4 | LiDAR fusion + photogrammetry running simultaneously |
| Response | Add worker nodes to Rack 2. K8s schedules jobs automatically. No code changes. |
03
Rack 3 — storage breakaway (Ceph / distributed)
Rack 3 absorbs storage growth that exceeds Rack 1's NAS capacity. ZFS on Rack 1 migrates to Ceph distributed storage spanning Racks 1 and 3. This is the petabyte path. Not needed until Rack 1 NAS is >70% full.
Rack 3 — role
| Purpose | Distributed storage cluster (Ceph OSD nodes) |
| Trigger | Rack 1 NAS exceeds ~70% capacity |
| Technology | Ceph RADOS distributed object/block/file store |
| Drive density | High-density JBOD enclosures (24–60 drives per 4U) |
| Network | 100GbE uplink to Rack 1 switch (Ceph needs fat pipes) |
| Scale target | Petabyte-range raw capacity |
| Phase 0 status | Not deployed — reserve rack space |
Ceph vs ZFS — transition decision
| ZFS (Rack 1 NAS) | Simple · single node · fast setup · Phase 0–2 |
| Ceph (Rack 1+3) | Distributed · no SPOF · petabyte-capable · Phase 2+ |
| Migration path | ZFS data exported to Ceph OSD nodes via rsync/rclone migration. TrueNAS Scale supports Ceph natively (Bluefin+). |
| MinIO alternative | S3-compatible object store. Simpler than Ceph for unstructured data (imagery, point clouds). Less suited to block storage. |
| Decision point | Evaluate at Rack 1 NAS ~50% full. Don't over-engineer before data volume is real. |
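The "evaluate at ~50% full, migrate at ~70%" rule above can be automated: given current pool usage and an observed ingest rate, project when the trigger lands. A minimal sketch, assuming a simple linear fill rate — the numbers are illustrative placeholders, not measurements from the testbed:

```python
def days_until_threshold(capacity_tb, used_tb, ingest_tb_per_day, threshold=0.70):
    """Linear projection of when a storage pool crosses a fill threshold.

    Returns days remaining, or 0.0 if the pool is already past it.
    """
    headroom_tb = capacity_tb * threshold - used_tb
    if headroom_tb <= 0:
        return 0.0
    return headroom_tb / ingest_tb_per_day

# Illustrative: 200 TB pool, 90 TB used, ~0.4 TB/day of compressed mission data.
days = days_until_threshold(capacity_tb=200, used_tb=90, ingest_tb_per_day=0.4)
print(f"~{days:.0f} days until the 70% Ceph-migration trigger")
```

Wiring this into the Prometheus/Grafana stack as an alert (rather than a manual check) fits the monitoring role already assigned to Node 6+.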
04
Auxiliary systems
Laptop 1 — Lenovo Yoga (canvas / field)
| Form factor | 2-in-1 · touchscreen · digital canvas |
| Primary use | On-site drone control · field planning · UE5 editing · sketching / annotation |
| Sync | Continuous sync with Rack 1 workstation via LAN/VPN |
| RF role | Drone GCS (Dronelink / DJI Fly) · Meshtastic app · Grafana mobile dashboard |
| Note | Touchscreen canvas function for site annotation on drone orthomosaics |
Laptop 2 — Lenovo ThinkPad (rugged / portable)
| Form factor | Durable · keyboard-first · field utility |
| Primary use | Scripting · config · SSH to rack · light processing · field documentation |
| Sync | Synced with Rack 1 via LAN/VPN |
| RF role | Fallback GCS · terminal access to SDR node · Reticulum client |
Phone — Samsung S25 Ultra
| Primary use | Mobile EUD · drone GCS · Meshtastic / MeshCore app · Infiray thermal camera |
| MANET / SDR | GoTenna MANET or equivalent for mobile mesh node capability — evaluate options |
| Digital sketchpad | S Pen annotation · site sketching · field photography |
| Sync | Continuous with Rack 1 workstation and laptops |
| Reticulum | Sideband app (Android) — full Reticulum client over BT rnode |
Mesh radio stack
| Meshtastic | Phase 0 edge nodes · SenseCAP P1-Pro · Heltec V3 drone relay — active now |
| MeshCore | Regional backbone evaluation · 64-hop repeater model — parallel experiment |
| Reticulum | Hub routing daemon (rnsd on SDR head) · long-term transport layer — active now |
| Other | AREDN (licensed ham mesh · Phase 3) · LoRaWAN (sensor density option · Phase 2) — evaluate later |
05
Pi testbed — parallel track note
The Raspberry Pi 4 testbed is not part of the mini-rack schema — it is a parallel validation track. Pi #1 (MQTT/InfluxDB/Grafana/Reticulum) and Pi #2 (ODM/Potree/API ingest) are running the Phase 0 weather node pipeline right now and generating real data. That data and the lessons from that pipeline design feed directly into the mini-rack schema. When the mini-rack Rack 1 is deployed, the Pi testbed does not get replaced — the SDR head node (Node 4) absorbs its radio functions, and the Pi #2 ODM pipeline migrates to Rack 1 compute (Node 2). The Pis themselves become edge-only — firmware flashing, field-site sensor nodes, remote relay nodes where mains power isn't available.
Pi vs x86 for specific tasks: Pis remain the correct hardware for battery/solar-powered outdoor edge nodes (SenseCAP P1-Pro uses a similar ARM SoC). For anything involving parallel processing, GPU compute, simultaneous multi-stream ingest, or petabyte storage, x86 with proper network fabric is the only viable path. These are complementary, not competing.
Framework, not commitment. All four AI sources consulted independently converged on the three-plane separation. What is not settled is the specific implementation within each plane. Competing options documented side by side without picking winners.
01
Three-plane architecture
PLANE 1
CONTROL
Nervous system — always on, low bandwidth, safety-critical
Drone swarm kinematics, heartbeat, GPS sync, collision avoidance, metadata sidecars, sensor telemetry, MAVLink commands. Does NOT carry payload data. Must function independently of all other planes.
LoRa 915 MHz · Meshtastic → MeshCore → Reticulum
~1–21 kbps · Solar autonomous
PLANE 2
INGEST
Bloodstream — burst data transfer, proximity-dependent
Raw sensor payload: imagery, point clouds, multispectral, thermal. Two competing models — not yet resolved:
Option A Physical store-and-forward
| Method | Drone records to NVMe. Lands at dock. USB 3.2 / Thunderbolt transfer at 10–40 Gbps. |
| Bandwidth | 10–40 Gbps at dock (USB 3.2 Gen 2 through Thunderbolt 4) |
| Latency | Data at hub only after landing |
| Infrastructure | Docking stations across site with power + NVMe |
| Status | Phase 0 equivalent — current approach |
Option B Aerial burst (60 GHz / FSO)
| Method | Drone hovers near relay tower. Dumps data via 60 GHz mmWave or FSO laser at 20–40 Gbps. |
| Bandwidth | 802.11ay: up to 100 Gbps theoretical |
| Latency | Near-real-time while drone still flying |
| Range limit | 60 GHz: ~50–200 m LOS only. Rain-sensitive. |
| Status | Phase 3 — requires relay tower infrastructure first |
Hybrid is the likely long-term answer: physical offload for bulk archival, mmWave/FSO for priority live streams. But Option B requires relay towers. Build Option A first and validate the data model before committing to tower infrastructure.
PLANE 3
REFINERY
Brain — processing, storage, output generation
Receives all data from Planes 1 and 2. Fuses sensor streams. Produces digital twins, NDVI maps, point clouds, climate profiles, carbon baselines, ecological models. Fully local — no cloud.
This is Rack 1 (Node 2 compute + Node 3 NAS) expanding into Racks 2 and 3 as scale demands. See Mini-rack schema tab for node detail.
02
Physical backbone — options
| Option | Bandwidth | Span | Phase | Notes |
|---|---|---|---|---|
| Cat6A / 10GbE copper | 10 Gbps | 100 m / run | Now | Simplest. Rack 1 internal fabric. Trenching needed for cross-site outdoor runs. |
| Single-mode fibre (SMF) | 100 Gbps+ | km-scale | Phase 2+ | Fibre ring connecting relay towers across site. Immune to EM. Gemini/Grok recommendation for field scale. |
| 60 GHz mmWave P2P | 1–10 Gbps | 200–500 m LOS | Phase 2 | Wireless fibre substitute for tower spans. Rain-sensitive. Evaluate before fibre is laid. |
| Wi-Fi HaLow 802.11ah | ~8 Mbps | 1+ km | Phase 1–2 | Sub-GHz Wi-Fi. Better range than standard Wi-Fi. Useful for sensor-dense field areas. Not for drone data ingest. |
| FSO / laser | 1–10 Gbps | 500 m–2 km | Phase 3 | Licence-free optical wireless. Wind sway and fog are real problems. Tower-to-tower where fibre is impractical. |
03
Edge triage — reduce before moving
Option A Full raw archival · process at hub
| Model | Record everything raw. Transfer to Rack 1/3. Process there. |
| Strength | No data loss · re-processable with future algorithms |
| Weakness | Petabyte storage required faster |
| Phase fit | Phase 0–2. Correct while algorithms are being validated. |
Option B Edge triage · Jetson Orin on drone
| Model | Onboard NVIDIA Jetson Orin performs first-pass classification. Discards sky, static areas, below-threshold data. Transfers only changed/relevant data. |
| Hardware | Jetson Orin NX · 100 TOPS · ~150–400 g · ~10–25 W |
| Strength | Orders of magnitude smaller transfer volume |
| Weakness | Irreversible data loss if triage model is wrong |
| Phase fit | Phase 3+. Do not discard raw data until algorithms are validated on archival data first. |
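Option B's first-pass triage can be sketched as a simple vegetation filter: keep a frame only if enough pixels clear an NDVI threshold. A minimal numpy sketch — the thresholds are illustrative assumptions, and a real Jetson pipeline would run a trained classifier rather than this heuristic:

```python
import numpy as np

def keep_frame(nir, red, ndvi_min=0.2, coverage_min=0.05):
    """First-pass edge triage: keep a frame only if at least
    `coverage_min` of its pixels look like vegetation
    (NDVI = (NIR - red) / (NIR + red) above `ndvi_min`)."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    ndvi = (nir - red) / np.clip(nir + red, 1e-9, None)  # avoid divide-by-zero
    vegetation_fraction = np.mean(ndvi > ndvi_min)
    return bool(vegetation_fraction >= coverage_min)

# Illustrative frames: vegetation reflects NIR strongly; sky does not.
veg_frame = keep_frame(nir=np.full((64, 64), 0.6), red=np.full((64, 64), 0.1))
sky_frame = keep_frame(nir=np.full((64, 64), 0.1), red=np.full((64, 64), 0.1))
print(veg_frame, sky_frame)  # vegetation kept, sky discarded
```

The Phase-fit warning in the table applies directly here: a wrong `ndvi_min` silently discards data forever, which is exactly why this path waits until the models are validated against archival data.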
Metadata sidecars — bridge both options: Regardless of which path is chosen, the LoRa control plane should transmit a lightweight metadata sidecar per frame: timestamp, GPS, altitude, orientation quaternion, sensor mode, exposure. Small enough for LoRa. Enables georeferencing before bulk transfer. Design this into firmware from Phase 0.
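To check that the sidecar actually fits a LoRa payload, here is a sketch of a fixed-width binary packing of the fields listed above. The field widths and scaling factors are assumptions chosen for illustration, not a committed wire format; for reference, Meshtastic payloads top out around 237 bytes:

```python
import struct

# Fixed-width sidecar: epoch seconds, lat/lon as 1e-7 degrees,
# altitude in decimetres, orientation quaternion scaled to int16,
# sensor mode id, exposure in microseconds.
SIDECAR_FMT = "<IiiH4hBH"  # little-endian, no padding

def pack_sidecar(ts, lat, lon, alt_m, quat, mode, exposure_us):
    q = [int(round(c * 32767)) for c in quat]  # unit quaternion -> int16
    return struct.pack(SIDECAR_FMT, ts,
                       int(round(lat * 1e7)), int(round(lon * 1e7)),
                       int(round(alt_m * 10)), *q, mode, exposure_us)

frame_meta = pack_sidecar(ts=1735689600, lat=-33.8688, lon=151.2093,
                          alt_m=120.5, quat=(1.0, 0.0, 0.0, 0.0),
                          mode=3, exposure_us=1200)
print(len(frame_meta), "bytes")  # 25 bytes — comfortably inside one LoRa packet
```

At 25 bytes per frame, even LoRa SF12 at ~1 kbps carries several sidecars per second — the control plane handles this without strain while the bulk imagery waits for the data plane.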
01
Sensor payload — data volume reality check
LoRa is the wrong unit of analysis for data volume. LoRa handles the control plane. The data plane is several orders of magnitude larger and requires completely different infrastructure.
| Sensor | Raw rate (est.) | Per 30-min mission | Notes |
|---|---|---|---|
| 16K visible (200MP single sensor) | ~660 MB/frame | ~100–500 GB | Even 1 fps = ~5 Gbps raw. Compressed AV1/H.266 ~10–50:1. |
| Full-sphere visible array | ~2–5 Gbps | ~450 GB–1.1 TB | Multiple 16K tiles stitched. Sensor count TBD — exact array spec open. |
| Dense LiDAR (128-line, 20 Hz) | ~1–10 Gbps raw | ~200 GB–2 TB | Draco compression ~5–10:1. Point density and scan rate variable. |
| Multispectral (5–10 bands) | ~500 Mbps–2 Gbps | ~100–450 GB | Reduced per-band resolution. NDVI derivable in post. |
| Hyperspectral (200+ bands) | ~2–8 Gbps | ~450 GB–1.8 TB | Phase 3+. Heavy classification compute required. |
| Thermal IR | ~100–500 Mbps | ~22–110 GB | Lower resolution than visible. Radiometric calibration required. |
| IMU / GPS / telemetry | <1 Mbps | <500 MB | Negligible. Carried on LoRa control plane. |
| Single drone — full payload | ~2.5–20 Gbps | ~1–5 TB raw | Compressed practical: ~100–500 GB per mission. |
| 12-drone swarm | ~30–240 Gbps | ~12–60 TB | Simultaneous ingest. Distributed storage required. |
| Annual (12 drones, daily ops) | — | ~4–22 PB / year | Petabyte accumulation timescale. Physical storage fabric, not radio. |
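The annual figure in the last row follows from straightforward arithmetic, worth making explicit since it drives the entire storage plan. A sketch using the table's own per-mission raw range (1–5 TB) and assuming one mission per drone per day:

```python
def annual_petabytes(drones, missions_per_day, tb_per_mission, days=365):
    """Raw accumulation: drones x missions/day x TB/mission x days, in PB."""
    return drones * missions_per_day * tb_per_mission * days / 1000

low = annual_petabytes(drones=12, missions_per_day=1, tb_per_mission=1)
high = annual_petabytes(drones=12, missions_per_day=1, tb_per_mission=5)
print(f"~{low:.1f}-{high:.1f} PB/year raw")  # ~4.4-21.9 PB/year
```

This is the raw ceiling; the compressed practical figure (~100–500 GB per mission) lands one order of magnitude lower, but still reaches petabyte scale within a few years of daily swarm operations.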
LoRa implication: LoRa SF7 (fastest) = ~21 kbps. A single compressed 16K frame at 50 MB would take ~5 hours to transmit over LoRa. LoRa carries metadata, telemetry, and control signals only. This is the correct separation of concerns — not a limitation to fix.
02
Bandwidth ladder — control vs data planes
| Link | Bandwidth |
|---|---|
| LoRa SF12 (Phase 0) | ~1 kbps |
| LoRa SF7 (fastest) | ~21 kbps |
| Wi-Fi HaLow 802.11ah | ~8 Mbps |
| SDR OFDM (LimeSDR) | ~30 Mbps |
| AREDN 802.11ac | ~150 Mbps |
| Wi-Fi 6E (drone link) | ~3 Gbps |
| USB 3.2 / Thunderbolt (dock) | ~20–40 Gbps |
| 60 GHz mmWave (802.11ay) | ~40 Gbps |
| 100 GbE fibre (rack fabric) | 100 Gbps |
| NVMe-oF / InfiniBand | 200+ Gbps |
The control plane (LoRa) and data plane (everything from Wi-Fi 6E up) are not on the same scale. The bandwidth ladder above is logarithmic in practice — roughly eight orders of magnitude separate LoRa SF12 (~1 kbps) from NVMe-oF (200+ Gbps).
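To see the ladder's spread concretely, here is the time to move one compressed mission (~300 GB, mid-range of the payload table) over several rungs. Link rates are taken from the ladder; real-world throughput will sit below line rate:

```python
def transfer_time_s(payload_bytes, bits_per_second):
    """Idealised transfer time: payload converted to bits over line rate."""
    return payload_bytes * 8 / bits_per_second

MISSION_BYTES = 300e9  # ~300 GB compressed mission

links = {
    "LoRa SF7":         21e3,
    "Wi-Fi HaLow":      8e6,
    "Wi-Fi 6E":         3e9,
    "Thunderbolt dock": 40e9,
    "100 GbE fabric":   100e9,
}
for name, rate in links.items():
    t = transfer_time_s(MISSION_BYTES, rate)
    print(f"{name:>16}: {t:>14.1f} s  ({t / 3600:.2f} h)")
```

The spread runs from roughly 24 seconds on the rack fabric to several years over LoRa SF7 — the numerical version of "LoRa is the wrong unit of analysis for data volume."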
The compute architecture is not decided. Two structurally different paths are documented below. The critical decision is the next hardware purchase — a single powerful machine commits to Path A; a mini PC as the first fabric node opens Path B. Both are valid. Neither forecloses the other permanently.
01
Path A — single powerful workstation
Workstation spec (representative)
| CPU | AMD Threadripper 7960X (24c) or Intel Core Ultra 9 285K |
| RAM | 128–256 GB DDR5 ECC |
| GPU | NVIDIA RTX 4090 · 24 GB VRAM · ODM + Nerfstudio + UE5 + COLMAP |
| NVMe | 2–4 TB (active working set) |
| HDD array | 48–200 TB ZFS (long-term archival) |
| Network | 10GbE NIC → managed switch |
| Est. cost | ~$5,000–$12,000 depending on storage |
Path A — assessment
| Strengths | Simple · low admin overhead · single maintenance point · works today |
| Ceiling | Hits wall at multi-drone simultaneous ingest + real-time processing |
| Breaks at | Swarm scale — 5+ drones simultaneous ODM saturates single CPU |
| Upgrade path | Second machine creates ad-hoc cluster — not designed, just bolted on |
| Pi integration | Pis demoted to edge-only. Workstation absorbs all Pi data roles. |
02
Path B — distributed mini-rack fabric
Fabric node roles (Rack 1)
| Node A — orchestration | Lenovo Tiny M90q · 16c · 64 GB · Proxmox / K3s · Mosquitto · InfluxDB |
| Node B — storage | Mini PC + SATA bays · TrueNAS · ZFS RAIDZ2 · NVMe cache |
| Node C — compute | Mini PC · 16c · 64 GB · ODM · COLMAP · PDAL · no GPU initially |
| Node D — RF/SDR | Pi 5 or mini PC · RTL-SDR · rtl_433 · Direwolf · Reticulum rnode |
| GPU server (add later) | 2U server or eGPU enclosure · RTX 4090 · triggers when Nerfstudio / UE5 needed |
Path B — assessment
| Strengths | Horizontal scaling · role isolation · no rebuild on upgrade · maps to HPC |
| Ceiling | Designed not to have one — add nodes to Rack 2 via Kubernetes |
| Admin overhead | Higher than Path A — Kubernetes, Ceph, and fabric require more ops knowledge |
| Initial cost | Higher upfront for equivalent single-node performance |
| Est. total cost | ~$5,500–$8,000 for full Rack 1 (Grok estimate) |
| Pi integration | Pis remain as edge nodes — they are Node D class devices in the fabric model |
GPT: "This is a micro data center, not a PC build." The fabric model is what allows the testbed to evolve into swarm-scale operations without architectural resets. Path A is faster to deploy. Path B is what you build if you know the destination.
03
GPU — when and what
| Trigger event | GPU requirement |
|---|---|
| UE5 + Cesium digital twin | RTX 4080/4090 · 16–24 GB VRAM · cannot run on CPU alone |
| Gaussian splatting (Nerfstudio / Postshot) | RTX 3090 minimum · 4090 recommended · VRAM-limited |
| COLMAP dense reconstruction | Any CUDA GPU accelerates significantly · 8+ GB VRAM |
| Real-time NDVI / hyperspectral classification | Jetson Orin (edge drone) or A100/H100 (hub) for inference |
| Multi-drone simultaneous ODM | Multi-GPU or cluster — single RTX 4090 saturates at ~5 simultaneous missions |
| On-drone edge triage | NVIDIA Jetson Orin NX (100 TOPS) · separate from hub GPU |
The existing Pi 4s are not replaced — they are repurposed. Every new purchase should either be a direct upgrade to the data plane or the first node of the distributed fabric.
0
Current — two Pi 4s, home network, Phase 0 testbed
Pi #1: MQTT + InfluxDB + Grafana + Reticulum + RTL-SDR. Pi #2: ODM + Potree + API ingest. Home Wi-Fi. microSD storage. One drone. Data ceiling ~200 GB. This is the validation track — generating real data that informs every subsequent hardware decision.
1
First upgrade — managed 10GbE switch (both paths require this)
MikroTik CRS312 or equivalent managed 10GbE switch (~$300–$500). This is the single purchase that benefits both Path A and Path B equally. It separates drone burst transfer traffic from sensor MQTT traffic with QoS policies. All future nodes connect here. Upgrade the switch before buying any compute.
2
First storage node — NAS (both paths require this)
TrueNAS Scale on a mini PC with 4–8 HDDs in ZFS RAIDZ2, or a dedicated NAS appliance. Drone imagery moves here from microSD. Pi #2's ODM scratch space moves here. This prevents "mission lost when card fills" incidents and creates the shared storage that all future compute nodes will read from.
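Sizing that first RAIDZ2 pool is a one-liner worth writing down: two drives' worth of capacity go to parity, and ZFS pools are commonly kept below ~80% full for performance. A rough sizing sketch — the 80% figure is a widely used rule of thumb, not a hard limit, and ZFS metadata/padding overhead is ignored:

```python
def raidz2_usable_tb(drives, drive_tb, fill_limit=0.80):
    """Approximate practical capacity of a RAIDZ2 vdev:
    two drives of parity, then a fill-limit headroom margin."""
    if drives < 4:
        raise ValueError("RAIDZ2 needs at least 4 drives")
    return (drives - 2) * drive_tb * fill_limit

# Illustrative: 8x 20 TB drives -> ~96 TB practical.
print(f"~{raidz2_usable_tb(drives=8, drive_tb=20):.0f} TB practical")
```

Against the mission volumes on the Data Plane tab (~100–500 GB compressed per mission), that pool covers a few hundred missions — enough for Phase 0–1, and a concrete way to anticipate the Rack 3 breakaway trigger.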
3A
Path A fork — single workstation purchase
Buy Threadripper or Core Ultra 9 workstation with RTX 4090. ODM, Nerfstudio, UE5, COLMAP all move here. Pi #2 becomes API ingest + Potree serving only. This is fast, powerful, and adequate for single-drone operations and small swarms.
3B
Path B fork — first fabric compute node
Buy first Lenovo Tiny or equivalent mini PC as the orchestration + processing node. Install Proxmox or K3s. ODM moves here from Pi #2. Leave GPU slot open — add eGPU or GPU server as a separate fabric node when Nerfstudio is needed. This is slower per-task than Path A but every subsequent purchase adds capacity rather than complexity.
4
GPU node — triggered by Gaussian splats or UE5
When Nerfstudio or UE5+Cesium enters the pipeline: Path A already has it in the workstation. Path B adds a dedicated GPU server (2U or eGPU enclosure) as a new fabric node. RTX 4090 at current prices handles both Nerfstudio and UE5 adequately for single-site operations.
5
Multi-drone scale — Rack 2 activation
When 3+ drones operate simultaneously and Rack 1 compute saturates: Path B adds worker nodes to Rack 2, Kubernetes schedules across them automatically, Ceph replaces ZFS for distributed storage. Path A at this point requires a second machine and an ad-hoc cluster arrangement — this is where Path A's ceiling becomes concrete and Path B's design advantage becomes tangible.