This is a framework under active development — not a committed build. The schema below represents the agreed-on long-term architecture. Specific hardware choices within each node are still being evaluated. Competing options are documented. The Raspberry Pi testbed runs in parallel and feeds real data into this design process.
3 total racks · 100 Gbps fabric · PB storage target · 0 cloud deps · local sovereign
01

Rack 1 — primary lab (core fabric)

The first rack is the primary compute and network fabric. All three racks share this network. 30–50% empty space reserved from day one — this is infrastructure, not a PC build.
Slot · Role · Function · Options / notes
Node 0 SWITCH 100 GbE fabric All nodes communicate through this. No direct point-to-point hacks. Unifying nervous system of the entire lab. Leave open ports for future expansion. Option A MikroTik CRS354 (48× GbE + 2× 100G QSFP28) ~$700
Option B Arista 7060X / Juniper QFX series (enterprise, ~$3k+)
DECIDE: managed vs unmanaged initially
Node 1 POWER Surge · UPS · filter Clean power for all nodes. UPS provides runtime during outages. Surge protection for all RF and SDR hardware — switching PSUs generate noise that contaminates SDR receive chains. APC Smart-UPS 1500VA RT or equivalent · clean sine wave output
PDU with metered outlets per rack
RF note: SDR node on isolated PDU branch if possible
Node 2 COMPUTE Local compute · GPU/NPU farm Heavy background processing: photogrammetry (ODM/COLMAP), Gaussian splatting (Nerfstudio/Postshot), LiDAR fusion (PDAL), digital twin rendering (UE5+Cesium). Kubernetes breakaway to Rack 2 for burst workloads. Option A — single workstation Threadripper 7960X + RTX 4090 + 256 GB RAM (~$8–12k)
Option B — mini-rack fabric 3× Lenovo ThinkCentre Tiny (8–16c, 64 GB each) + separate GPU server 2U (~$5–8k total)
NOT DECIDED See Compute Paths tab
Node 3 NAS Petabyte storage All persistent data: raw drone imagery, point clouds, sensor time-series, processed outputs, long-term biome logs. ZFS initially, Ceph as cluster grows. NVMe cache tier for active working sets. Option A TrueNAS Scale on mini-PC + 4–8 HDDs RAIDZ2
Option B Synology DS1823xs+ / QNAP TES-1885U (enterprise NAS)
Breakaway to Rack 3 as storage scales. 10GbE NFS/iSCSI to all compute nodes.
PLAN: Ceph migration path from ZFS
Node 4 SDR HEAD Full-spectrum analysis · recording Dedicated RF node physically isolated from switching PSUs. RTL-SDR (Phase 0), HackRF (Phase 1), LimeSDR Mini 2 / USRP (Phase 2+). Spectrum monitoring, NOAA/APRS/ADS-B receive, Meshtastic gateway, Reticulum rnode interface. Hardware: Raspberry Pi 5 or small x86 mini-PC (clean USB power)
Separate thermal zone — RF hardware runs warm
Isolation note: SDR performance degrades near switching PSUs. Physical separation matters.
Node 4-sub RF AUX TX/RX · amp · atten · antenna All external RF hardware cabled to SDR head. Transceivers, amplifiers, attenuators, bandpass filters, antenna splitters. Modular — each frequency band or function is a discrete unit that can be swapped independently. TX/RX: per active band (see Protocols page)
LNA (low-noise amplifier) for receive sensitivity
Bandpass filters to reject out-of-band interference
SPEC: full RF chain per band on Protocols page
Node 5 WORKSTATION Daily desktop · 6× 1440p · EUD sync Primary operator interface. 6× 1440p displays across 6 virtual desktops. Synced with laptops and phone for mobile continuity. Drone GCS, live Grafana dashboards, QGIS, field planning, comms. High-core desktop (AMD Ryzen 9 or Intel Core Ultra 9)
GPU with multi-display output (RTX 4080 can drive 4+ displays)
Syncs with: Lenovo Yoga (canvas/field), ThinkPad (rugged field), S25 Ultra (phone)
EUD = End User Device — phone, laptops are extensions of this desktop
Node 6+ SERVICES Git · CI/CD · internal tools Version control for configs, firmware, processing scripts. Self-hosted Gitea or GitLab. Internal documentation wiki. Monitoring stack (Prometheus/Grafana meta). Expandable — this slot covers future services as they are identified. Gitea (lightweight) or GitLab CE (full-featured)
Prometheus + Alertmanager for infrastructure monitoring
Containerised (Docker/Podman) on compute node or dedicated
SPEC LATER — lower priority than data and RF nodes
02

Rack 2 — compute breakaway (Kubernetes burst)

Rack 2 is a Kubernetes worker cluster that Rack 1's orchestration node can schedule workloads onto. It does not need to be populated at Phase 0 — it is reserved space. When multi-drone simultaneous ODM processing exceeds what Rack 1's compute can handle, Rack 2 workers absorb the overflow without any architectural change.
Rack 2 — role
Purpose: Kubernetes worker nodes for burst compute
Trigger: multi-drone simultaneous processing saturates Rack 1
Nodes: 2–6 mini PCs or 1U servers depending on workload
Network: 100GbE uplink to Rack 1 switch
Storage: reads/writes via Rack 1 NAS over NVMe-oF or NFS
GPU option: add GPU server here to offload Nerfstudio / COLMAP at scale
Phase 0 status: empty — reserve rack space only
Rack 2 — scaling trigger events
Trigger 1: 3+ drones operating simultaneously → ODM queue backs up
Trigger 2: Nerfstudio / Gaussian splat generation takes >6 hrs on Rack 1
Trigger 3: real-time NDVI classification required during flight (not post-processing)
Trigger 4: LiDAR fusion + photogrammetry running simultaneously
Response: add worker nodes to Rack 2. K8s schedules jobs automatically. No code changes.
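The trigger list above is mechanical enough to encode as a monitoring check. A sketch of how a watchdog might evaluate it — the field names are assumptions; the thresholds are the ones stated above:

```python
from dataclasses import dataclass

@dataclass
class Rack1Load:
    """Illustrative snapshot of Rack 1 state; field names are assumptions."""
    active_drones: int
    odm_queue_depth: int    # missions waiting for photogrammetry
    splat_hours: float      # wall-clock hours per Gaussian-splat job
    realtime_ndvi: bool     # NDVI needed in-flight, not in post
    lidar_and_photo: bool   # LiDAR fusion + photogrammetry concurrent

def rack2_triggers(load: Rack1Load) -> list[str]:
    """Return which Rack 2 activation triggers (1-4) have fired."""
    fired = []
    if load.active_drones >= 3 and load.odm_queue_depth > 0:
        fired.append("T1: ODM queue backing up with 3+ drones")
    if load.splat_hours > 6:
        fired.append("T2: splat generation exceeds 6 h")
    if load.realtime_ndvi:
        fired.append("T3: real-time NDVI required")
    if load.lidar_and_photo:
        fired.append("T4: LiDAR fusion + photogrammetry concurrent")
    return fired
```

Any non-empty result means it is time to add worker nodes; the response is the same for all four triggers, which is the point of the Kubernetes design.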
03

Rack 3 — storage breakaway (Ceph / distributed)

Rack 3 absorbs storage growth that exceeds Rack 1's NAS capacity. ZFS on Rack 1 migrates to Ceph distributed storage spanning Racks 1 and 3. This is the petabyte path. Not needed until Rack 1 NAS is >70% full.
Rack 3 — role
Purpose: distributed storage cluster (Ceph OSD nodes)
Trigger: Rack 1 NAS exceeds ~70% capacity
Technology: Ceph RADOS distributed object/block/file store
Drive density: high-density JBOD enclosures (24–60 drives per 4U)
Network: 100GbE uplink to Rack 1 switch (Ceph needs fat pipes)
Scale target: petabyte-range raw capacity
Phase 0 status: not deployed — reserve rack space
Ceph vs ZFS — transition decision
ZFS (Rack 1 NAS): simple · single node · fast setup · Phase 0–2
Ceph (Rack 1+3): distributed · no SPOF · petabyte-capable · Phase 2+
Migration path: ZFS data exported to Ceph OSD nodes via rsync/rclone migration. TrueNAS Scale supports Ceph natively (Bluefin+).
MinIO alternative: S3-compatible object store. Simpler than Ceph for unstructured data (imagery, point clouds). Less suited to block storage.
Decision point: evaluate at Rack 1 NAS ~50% full. Don't over-engineer before data volume is real.
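The two thresholds (~50% evaluate, ~70% migrate) can be wired into the monitoring stack as a simple check; a minimal sketch, with the action strings as illustrative labels only:

```python
def nas_action(fill_fraction: float) -> str:
    """Map Rack 1 NAS fill level to the documented decision points."""
    if not 0.0 <= fill_fraction <= 1.0:
        raise ValueError("fill_fraction must be in [0, 1]")
    if fill_fraction >= 0.70:
        # Rack 3 trigger from the table above
        return "deploy Rack 3 / begin Ceph migration"
    if fill_fraction >= 0.50:
        # decision point: compare Ceph vs MinIO with real data volume
        return "evaluate Ceph vs MinIO"
    return "stay on ZFS"
```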
04

Auxiliary systems

Laptop 1 — Lenovo Yoga (canvas / field)
Form factor: 2-in-1 · touchscreen · digital canvas
Primary use: on-site drone control · field planning · UE5 editing · sketching / annotation
Sync: continuous sync with Rack 1 workstation via LAN/VPN
RF role: drone GCS (Dronelink / DJI Fly) · Meshtastic app · Grafana mobile dashboard
Note: touchscreen canvas function for site annotation on drone orthomosaics
Laptop 2 — Lenovo ThinkPad (rugged / portable)
Form factor: durable · keyboard-first · field utility
Primary use: scripting · config · SSH to rack · light processing · field documentation
Sync: synced with Rack 1 via LAN/VPN
RF role: fallback GCS · terminal access to SDR node · Reticulum client
Phone — Samsung S25 Ultra
Primary use: mobile EUD · drone GCS · Meshtastic / MeshCore app · Infiray thermal camera
MANET / SDR: GoTenna MANET or equivalent for mobile mesh node capability · evaluate options
Digital sketchpad: S Pen annotation · site sketching · field photography
Sync: continuous with Rack 1 workstation and laptops
Reticulum: Sideband app (Android) — full Reticulum client over BT rnode
Mesh radio stack
Meshtastic: Phase 0 edge nodes · SenseCAP P1-Pro · Heltec V3 drone relay · active now
MeshCore: regional backbone evaluation · 64-hop repeater model · parallel experiment
Reticulum: hub routing daemon (rnsd on SDR head) · long-term transport layer · active now
Other: AREDN (licensed ham mesh · Phase 3) · LoRaWAN (sensor density option · Phase 2) · evaluate later
05

Pi testbed — parallel track note

The Raspberry Pi 4 testbed is not part of the mini-rack schema — it is a parallel validation track. Pi #1 (MQTT/InfluxDB/Grafana/Reticulum) and Pi #2 (ODM/Potree/API ingest) are running the Phase 0 weather node pipeline right now and generating real data. That data and the lessons from that pipeline design feed directly into the mini-rack schema. When the mini-rack Rack 1 is deployed, the Pi testbed does not get replaced — the SDR head node (Node 4) absorbs its radio functions, and the Pi #2 ODM pipeline migrates to Rack 1 compute (Node 2). The Pis themselves become edge-only — firmware flashing, field-site sensor nodes, remote relay nodes where mains power isn't available.
Pi vs x86 for specific tasks: Pis remain the correct hardware for battery/solar-powered outdoor edge nodes (SenseCAP P1-Pro uses a similar ARM SoC). For anything involving parallel processing, GPU compute, simultaneous multi-stream ingest, or petabyte storage, x86 with proper network fabric is the only viable path. These are complementary, not competing.
Framework, not commitment. All four AI sources consulted independently converged on the three-plane separation. What is not settled is the specific implementation within each plane. Competing options documented side by side without picking winners.
01

Three-plane architecture

PLANE 1
CONTROL

Nervous system — always on, low bandwidth, safety-critical

Drone swarm kinematics, heartbeat, GPS sync, collision avoidance, metadata sidecars, sensor telemetry, MAVLink commands. Does NOT carry payload data. Must function independently of all other planes.

LoRa 915 MHz · Meshtastic → MeshCore → Reticulum · ~1–21 kbps · solar autonomous
PLANE 2
INGEST

Bloodstream — burst data transfer, proximity-dependent

Raw sensor payload: imagery, point clouds, multispectral, thermal. Two competing models — not yet resolved:

Option A — physical store-and-forward
Method: drone records to NVMe. Lands at dock. USB 3.2 / Thunderbolt transfer at 10–40 Gbps.
Bandwidth: 40–120 Gbps at dock
Latency: data at hub only after landing
Infrastructure: docking stations across site with power + NVMe
Status: Phase 0 equivalent — current approach
Option B — aerial burst (60 GHz / FSO)
Method: drone hovers near relay tower. Dumps data via 60 GHz mmWave or FSO laser at 20–40 Gbps.
Bandwidth: 802.11ay up to 100 Gbps theoretical
Latency: near-real-time while drone still flying
Range limit: 60 GHz ~50–200 m LOS only. Rain-sensitive.
Status: Phase 3 — requires relay tower infrastructure first
Hybrid is the likely long-term answer: physical offload for bulk archival, mmWave/FSO for priority live streams. But Option B requires relay towers. Build Option A first and validate the data model before committing to tower infrastructure.
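Back-of-envelope timing makes the A-vs-B trade concrete. A quick calculator, assuming a mid-range 2 TB raw mission (from the payload tables) and a hypothetical ~70% efficiency derate for protocol overhead:

```python
def offload_minutes(mission_tb: float, link_gbps: float,
                    efficiency: float = 0.7) -> float:
    """Minutes to move one mission's data over a link.
    `efficiency` derates the theoretical line rate (assumed value)."""
    bits = mission_tb * 8e12                      # TB -> bits
    return bits / (link_gbps * 1e9 * efficiency) / 60

dock = offload_minutes(2.0, 40.0)    # Thunderbolt dock at ~40 Gbps: ~10 min
burst = offload_minutes(2.0, 20.0)   # 60 GHz hover burst at ~20 Gbps: ~19 min
```

Both are minutes, not hours — the real difference is that Option B delivers while the drone is still flying, at the cost of tower infrastructure and hover time.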
PLANE 3
REFINERY

Brain — processing, storage, output generation

Receives all data from Planes 1 and 2. Fuses sensor streams. Produces digital twins, NDVI maps, point clouds, climate profiles, carbon baselines, ecological models. Fully local — no cloud.

This is Rack 1 (Node 2 compute + Node 3 NAS) expanding into Racks 2 and 3 as scale demands. See Mini-rack schema tab for node detail.

02

Physical backbone — options

Option · Bandwidth · Span · Phase · Notes
Cat6A / 10GbE copper · 10 Gbps · 100 m / run · Now · Simplest. Rack 1 internal fabric. Trenching needed for cross-site outdoor runs.
Single-mode fibre (SMF) · 100 Gbps+ · km-scale · Phase 2+ · Fibre ring connecting relay towers across site. Immune to EM. Gemini/Grok recommendation for field scale.
60 GHz mmWave P2P · 1–10 Gbps · 200–500 m LOS · Phase 2 · Wireless fibre substitute for tower spans. Rain-sensitive. Evaluate before fibre is laid.
Wi-Fi HaLow 802.11ah · ~8 Mbps · 1+ km · Phase 1–2 · Sub-GHz Wi-Fi. Better range than standard Wi-Fi. Useful for sensor-dense field areas. Not for drone data ingest.
FSO / laser · 1–10 Gbps · 500 m–2 km · Phase 3 · Licence-free optical wireless. Wind sway and fog are real problems. Tower-to-tower where fibre is impractical.
03

Edge triage — reduce before moving

Option A — full raw archival · process at hub
Model: record everything raw. Transfer to Rack 1/3. Process there.
Strength: no data loss · re-processable with future algorithms
Weakness: petabyte storage required faster
Phase fit: Phase 0–2. Correct while algorithms are being validated.
Option B — edge triage · Jetson Orin on drone
Model: onboard NVIDIA Jetson Orin performs first-pass classification. Discards sky, static areas, below-threshold data. Transfers only changed/relevant data.
Hardware: Jetson Orin NX · 100 TOPS · ~150–400 g · ~5–15 W
Strength: orders of magnitude smaller transfer volume
Weakness: irreversible data loss if triage model is wrong
Phase fit: Phase 3+. Do not discard raw data until algorithms are validated on archival data first.
Metadata sidecars — bridge both options: Regardless of which path is chosen, the LoRa control plane should transmit a lightweight metadata sidecar per frame: timestamp, GPS, altitude, orientation quaternion, sensor mode, exposure. Small enough for LoRa. Enables georeferencing before bulk transfer. Design this into firmware from Phase 0.
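To confirm a sidecar actually fits a LoRa payload, here is one possible binary layout — field order, widths, and scaling are illustrative assumptions, not a firmware spec:

```python
import struct

# Hypothetical sidecar layout (little-endian, no padding):
# uint32 epoch seconds · int32 lat/lon (1e-7 deg) · int32 altitude (mm)
# 4x float32 orientation quaternion · uint8 sensor mode · uint16 exposure (us)
SIDECAR_FMT = "<IiiiffffBH"

def pack_sidecar(ts, lat_deg, lon_deg, alt_mm, quat, mode, exposure_us):
    """Serialise one per-frame metadata sidecar into a fixed-size record."""
    return struct.pack(SIDECAR_FMT, ts,
                       int(lat_deg * 1e7), int(lon_deg * 1e7), alt_mm,
                       *quat, mode, exposure_us)

frame = pack_sidecar(1700000000, 51.5074, -0.1278, 120_000,
                     (1.0, 0.0, 0.0, 0.0), 2, 500)
# 35 bytes -- comfortably inside a Meshtastic-class LoRa payload (~237 bytes)
```

At this size, several sidecars fit in a single LoRa packet, so georeferencing metadata can arrive long before the bulk imagery does.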
01

Sensor payload — data volume reality check

LoRa is the wrong unit of analysis for data volume. LoRa handles the control plane. The data plane is several orders of magnitude larger and requires completely different infrastructure.
Sensor · Raw rate (est.) · Per 30-min mission · Notes
16K visible (200MP single sensor) · ~660 MB/frame · ~100–500 GB · Even 1 fps = ~5 Gbps raw. Compressed AV1/H.266 ~10–50:1.
Full-sphere visible array · ~2–5 Gbps · ~450 GB–1.1 TB · Multiple 16K tiles stitched. Sensor count TBD (exact array spec open).
Dense LiDAR (128-line, 20 Hz) · ~1–10 Gbps raw · ~200 GB–2 TB · Draco compression ~5–10:1. Point density and scan rate variable.
Multispectral (5–10 bands) · ~500 Mbps–2 Gbps · ~100–450 GB · Reduced per-band resolution. NDVI derivable in post.
Hyperspectral (200+ bands) · ~2–8 Gbps · ~450 GB–1.8 TB · Phase 3+. Heavy classification compute required.
Thermal IR · ~100–500 Mbps · ~22–110 GB · Lower resolution than visible. Radiometric calibration required.
IMU / GPS / telemetry · <1 Mbps · <500 MB · Negligible. Carried on LoRa control plane.
Single drone — full payload · ~2.5–20 Gbps · ~1–5 TB raw · Compressed practical: ~100–500 GB per mission.
12-drone swarm · ~30–240 Gbps · ~12–60 TB · Simultaneous ingest. Distributed storage required.
Annual (12 drones, daily ops) · ~4–22 PB / year · Petabyte accumulation timescale. Physical storage fabric, not radio.
LoRa implication: LoRa SF7 (fastest) = ~21 kbps. A single compressed 16K frame at 50 MB would take ~5 hours to transmit over LoRa. LoRa carries metadata, telemetry, and control signals only. This is the correct separation of concerns — not a limitation to fix.
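The ~5-hour figure checks out arithmetically. A small calculator using only numbers from this page, which also shows why a sidecar-sized payload is perfectly fine over the same link:

```python
def transfer_hours(payload_mb: float, link_kbps: float) -> float:
    """Hours to push a payload over a link at a given rate (no overhead)."""
    bits = payload_mb * 8e6          # MB -> bits
    return bits / (link_kbps * 1e3) / 3600

frame_hours = transfer_hours(50, 21)            # 50 MB frame over SF7: ~5.3 h
sidecar_sec = transfer_hours(200e-6, 21) * 3600  # ~200-byte sidecar: <0.1 s
```

Five hours per frame versus a fraction of a second per sidecar is the whole argument for the plane separation in one line.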
02

Bandwidth ladder — control vs data planes

LoRa SF12 (Phase 0): ~1 kbps
LoRa SF7 (fastest): ~21 kbps
Wi-Fi HaLow 802.11ah: ~8 Mbps
SDR OFDM (LimeSDR): ~30 Mbps
AREDN 802.11ac: ~150 Mbps
Wi-Fi 6E (drone link): ~3 Gbps
60 GHz mmWave (802.11ay): ~40 Gbps
USB 3.2 / Thunderbolt (dock): ~20–40 Gbps
100 GbE fibre (rack fabric): 100 Gbps
NVMe-oF / InfiniBand: 200+ Gbps
The control plane (LoRa) and data plane (everything from Wi-Fi 6E up) are not on the same scale. The ladder above is effectively logarithmic: roughly eight orders of magnitude separate LoRa SF12 (~1 kbps) from NVMe-oF (200+ Gbps).
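The spread becomes concrete if you compute the time to move 1 TB at each rung (rates taken from the ladder; link efficiency ignored):

```python
# (label, rate in Mbps) -- selected rungs from the bandwidth ladder
LADDER = [
    ("LoRa SF7",       0.021),
    ("Wi-Fi HaLow",    8),
    ("AREDN 802.11ac", 150),
    ("Wi-Fi 6E",       3_000),
    ("USB 3.2 dock",   20_000),
    ("100 GbE fabric", 100_000),
]

def hours_per_tb(mbps: float) -> float:
    """Hours to transfer 1 TB (= 8e6 Mbit) at a given rate."""
    return 8e6 / (mbps * 3600)

times = {name: hours_per_tb(rate) for name, rate in LADDER}
# LoRa SF7: ~12 years per TB. 100 GbE: ~80 seconds. Not the same plane.
```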
The compute architecture is not decided. Two structurally different paths are documented below. The critical decision is the next hardware purchase — a single powerful machine commits to Path A; a mini PC as the first fabric node opens Path B. Both are valid. Neither forecloses the other permanently.
01

Path A — single powerful workstation

Workstation spec (representative)
CPU: AMD Threadripper 7960X (24c) or Intel Core Ultra 9 285K
RAM: 128–256 GB DDR5 ECC
GPU: NVIDIA RTX 4090 · 24 GB VRAM · ODM + Nerfstudio + UE5 + COLMAP
NVMe: 2–4 TB (active working set)
HDD array: 48–200 TB ZFS (long-term archival)
Network: 10GbE NIC → managed switch
Est. cost: ~$5,000–$12,000 depending on storage
Path A — assessment
Strengths: simple · low admin overhead · single maintenance point · works today
Ceiling: hits wall at multi-drone simultaneous ingest + real-time processing
Breaks at: swarm scale — 5+ drones simultaneous ODM saturates single CPU
Upgrade path: second machine creates ad-hoc cluster — not designed, just bolted on
Pi integration: Pis demoted to edge-only. Workstation absorbs all Pi data roles.
02

Path B — distributed mini-rack fabric

Fabric node roles (Rack 1)
Node A — orchestration: Lenovo Tiny M90q · 16c · 64 GB · Proxmox / K3s · Mosquitto · InfluxDB
Node B — storage: mini PC + SATA bays · TrueNAS · ZFS RAIDZ2 · NVMe cache
Node C — compute: mini PC · 16c · 64 GB · ODM · COLMAP · PDAL · no GPU initially
Node D — RF/SDR: Pi 5 or mini PC · RTL-SDR · rtl_433 · Direwolf · Reticulum rnode
GPU server (add later): 2U server or eGPU enclosure · RTX 4090 · triggers when Nerfstudio / UE5 needed
Path B — assessment
Strengths: horizontal scaling · role isolation · no rebuild on upgrade · maps to HPC
Ceiling: designed not to have one — add nodes to Rack 2 via Kubernetes
Admin overhead: higher than Path A — Kubernetes, Ceph, and fabric require more ops knowledge
Initial cost: higher upfront for equivalent single-node performance
Est. total cost: ~$5,500–$8,000 for full Rack 1 (Grok estimate)
Pi integration: Pis remain as edge nodes — they are Node D class devices in the fabric model
GPT: "This is a micro data center, not a PC build." The fabric model is what allows the testbed to evolve into swarm-scale operations without architectural resets. Path A is faster to deploy. Path B is what you build if you know the destination.
03

GPU — when and what

Trigger event · GPU requirement
UE5 + Cesium digital twin: RTX 4080/4090 · 16–24 GB VRAM · cannot run on CPU alone
Gaussian splatting (Nerfstudio / Postshot): RTX 3090 minimum · 4090 recommended · VRAM-limited
COLMAP dense reconstruction: any CUDA GPU accelerates significantly · 8+ GB VRAM
Real-time NDVI / hyperspectral classification: Jetson Orin (edge drone) or A100/H100 (hub) for inference
Multi-drone simultaneous ODM: multi-GPU or cluster — single RTX 4090 saturates at ~5 simultaneous missions
On-drone edge triage: NVIDIA Jetson Orin NX (100 TOPS) · separate from hub GPU
The existing Pi 4s are not replaced — they are repurposed. Every new purchase should either be a direct upgrade to the data plane or the first node of the distributed fabric.
0
Current — two Pi 4s, home network, Phase 0 testbed
Pi #1: MQTT + InfluxDB + Grafana + Reticulum + RTL-SDR. Pi #2: ODM + Potree + API ingest. Home Wi-Fi. microSD storage. One drone. Data ceiling ~200 GB. This is the validation track — generating real data that informs every subsequent hardware decision.
1
First upgrade — managed 10GbE switch (both paths require this)
MikroTik CRS312 or equivalent managed 10GbE switch (~$300–$500). This is the single purchase that benefits both Path A and Path B equally. It separates drone burst transfer traffic from sensor MQTT traffic with QoS policies. All future nodes connect here. Upgrade the switch before buying any compute.
2
First storage node — NAS (both paths require this)
TrueNAS Scale on a mini PC with 4–8 HDDs in ZFS RAIDZ2, or a dedicated NAS appliance. Drone imagery moves here from microSD. Pi #2's ODM scratch space moves here. This prevents "mission lost when card fills" incidents and creates the shared storage that all future compute nodes will read from.
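Usable capacity of a RAIDZ2 pool — and hence where the ~70% Rack 3 trigger lands in absolute terms — is easy to estimate. A sketch, assuming hypothetical 20 TB drives within the 4–8 HDD range above:

```python
def raidz2_usable_tb(drives: int, drive_tb: float) -> float:
    """RAIDZ2 spends two drives' worth of capacity on parity; the rest
    is usable (ignores ZFS metadata/slop overhead of a few percent)."""
    if drives < 4:
        raise ValueError("RAIDZ2 needs at least 4 drives")
    return (drives - 2) * drive_tb

# 8x 20 TB drives (drive size is an assumption, not a spec)
usable = raidz2_usable_tb(8, 20.0)   # 120 TB usable
rack3_trigger = usable * 0.70        # 84 TB: the ~70% Ceph/Rack 3 threshold
```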
3A
Path A fork — single workstation purchase
Buy Threadripper or Core Ultra 9 workstation with RTX 4090. ODM, Nerfstudio, UE5, COLMAP all move here. Pi #2 becomes API ingest + Potree serving only. This is fast, powerful, and adequate for single-drone operations and small swarms.
3B
Path B fork — first fabric compute node
Buy first Lenovo Tiny or equivalent mini PC as the orchestration + processing node. Install Proxmox or K3s. ODM moves here from Pi #2. Leave GPU slot open — add eGPU or GPU server as a separate fabric node when Nerfstudio is needed. This is slower per-task than Path A but every subsequent purchase adds capacity rather than complexity.
4
GPU node — triggered by Gaussian splats or UE5
When Nerfstudio or UE5+Cesium enters the pipeline: Path A already has it in the workstation. Path B adds a dedicated GPU server (2U or eGPU enclosure) as a new fabric node. RTX 4090 at current prices handles both Nerfstudio and UE5 adequately for single-site operations.
5
Multi-drone scale — Rack 2 activation
When 3+ drones operate simultaneously and Rack 1 compute saturates: Path B adds worker nodes to Rack 2, Kubernetes schedules across them automatically, Ceph replaces ZFS for distributed storage. Path A at this point requires a second machine and an ad-hoc cluster arrangement — this is where Path A's ceiling becomes concrete and Path B's design advantage becomes tangible.