Why OpenTelemetry Doesn't Fit Our World — And What We Build Instead

by Markus Maiwald


We’re not building an OTel rival. We’re building what OTel can never be.


The Philosophical Split

OpenTelemetry was designed for a world where you observe everything, store everything, visualize everything, and hope someone notices the anomaly. It is a surveillance architecture for your own infrastructure. Passive. Exhaustive. Expensive.

What we build is the inverse: an immune system. Active. Selective. Autonomous. Our Membrane Agent doesn’t wait for some engineer to squint at a Grafana panel at 3am with coffee-stained eyes and a dying will to live. It detects the pathogen and responds. Logs exist for the forensic autopsy, not for the live show.

OTel asks: “What happened everywhere, all the time?” LTP asks: “What went wrong, why, and what did we do about it?”

These are fundamentally different protocols because they serve fundamentally different epistemologies. One assumes you have a team of SREs watching dashboards in shifts. The other assumes your infrastructure is a Raspberry Pi on a solar panel in Mombasa and nobody is watching anything. One of these assumptions maps to reality. The other maps to a Series B pitch deck.


1. What LTP Is Missing — The Honest Gap Analysis

Let’s be surgical about what OTel actually provides that we don’t have yet. No cope. No hand-waving.

Distributed Tracing. This is the single OTel feature with genuine engineering value. A request flows through Node A → Relay B → Agent C; you need to reconstruct the causal chain. OTel uses trace_id (16 bytes) + span_id (8 bytes) + parent_span_id (8 bytes), propagated via W3C TraceContext headers.

Here’s the thing: we already have the primitives. The LWF header carries session_id (16 bytes) and sequence (4 bytes). The ns-msg envelope carries lamport_clock (8 bytes) and publisher_did. What’s missing is an explicit trace context that links causally related messages across DID boundaries. That’s ~32 bytes of metadata in the ns-msg envelope. Not a new protocol. Not a new layer. Thirty-two bytes.
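
To make that claim concrete, here is a minimal sketch in Python (the real stack is Zig) of what those 32 bytes could look like: trace_id, span_id, and parent_span_id packed as a fixed-size envelope extension, sized to match the W3C TraceContext fields cited above. The function names and layout are illustrative, not spec.

```python
import os
import struct

# Hypothetical 32-byte trace-context extension for the ns-msg envelope:
# trace_id (16 bytes) + span_id (8) + parent_span_id (8).
TRACE_CTX = struct.Struct(">16s8s8s")  # 32 bytes total

def new_root_context() -> bytes:
    """Start a causal chain: random trace_id, fresh span, no parent."""
    return TRACE_CTX.pack(os.urandom(16), os.urandom(8), bytes(8))

def child_context(parent: bytes) -> bytes:
    """Continue the chain across a DID boundary: same trace_id,
    new span_id, parent_span_id = the caller's span_id."""
    trace_id, span_id, _ = TRACE_CTX.unpack(parent)
    return TRACE_CTX.pack(trace_id, os.urandom(8), span_id)

root = new_root_context()
child = child_context(root)
assert len(child) == 32              # exactly the 32 bytes claimed
assert child[:16] == root[:16]       # same trace_id across the hop
assert child[24:32] == root[16:24]   # parent links back to root's span
```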

Structured Metrics Aggregation. LTP publishes raw sensor values: f64 watts. OTel provides counters (monotonic sum), gauges (point-in-time), histograms (distribution), and delta vs. cumulative temporality. For a solar panel publishing watts every 30 seconds, raw gauges are fine. For a relay processing 10,000 messages per second, you want delta counters and exponential histograms to compress that into meaningful statistics.

We need this, but not at the transport layer. It belongs in the Janus runtime as a metrics module that publishes aggregated data onto ns-msg topics. The wire format stays identical. The aggregation logic lives at the application level where it belongs.
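
A sketch of what such a runtime-level metrics module could look like, in Python for brevity. The types mirror the OTel notions named above (delta counter, histogram); everything else, including the class names, is invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class DeltaCounter:
    """Monotonic count, reported as a delta per flush window."""
    count: int = 0
    def add(self, n: int = 1) -> None:
        self.count += n
    def flush(self) -> int:
        delta, self.count = self.count, 0
        return delta

@dataclass
class Histogram:
    """Fixed-boundary histogram; exponential buckets work the same way."""
    boundaries: list
    buckets: list = field(default_factory=list)
    def __post_init__(self):
        self.buckets = [0] * (len(self.boundaries) + 1)  # +1 overflow bucket
    def record(self, value: float) -> None:
        for i, b in enumerate(self.boundaries):
            if value <= b:
                self.buckets[i] += 1
                return
        self.buckets[-1] += 1

msgs = DeltaCounter()
latency = Histogram(boundaries=[1.0, 10.0, 100.0])
for ms in (0.4, 3.0, 250.0):
    msgs.add()
    latency.record(ms)

assert msgs.flush() == 3          # one aggregate publish, not three raw ones
assert msgs.flush() == 0          # counter resets after flush
assert latency.buckets == [1, 1, 0, 1]
```

The point of the sketch: the relay publishes one small aggregate per window instead of 10,000 raw values, and nothing below the application layer changes.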

Semantic Conventions. OTel’s http.request.method, db.system, k8s.pod.name standardized naming is genuinely useful for cross-system correlation. Our $LTP/ocean/{region}/{site}/solar/power hierarchy is the same idea, but currently ad hoc. This is documentation work. Not protocol work. Define $LTP/semconv/ with standard attribute names. Ship it as a Janus library. Done.
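
As a sketch of what shipping $LTP/semconv as a library could mean, here is a toy registry of attribute names with a validator. The attribute names below are hypothetical illustrations, not a published convention.

```python
# Hypothetical $LTP/semconv sketch: standard attribute names shipped as a
# library, so every publisher spells keys the same way.
SEMCONV = {
    "net.peer.did":    str,    # DID of the remote peer
    "msg.topic":       str,    # ns-msg topic the event relates to
    "power.watts":     float,  # instantaneous power reading
    "relay.queue_len": int,    # messages waiting at a relay
}

def validate(attributes: dict) -> list:
    """Return the keys that violate the convention (unknown or mistyped)."""
    return [k for k, v in attributes.items()
            if k not in SEMCONV or not isinstance(v, SEMCONV[k])]

assert validate({"power.watts": 212.5}) == []
assert validate({"power.wattz": 212.5}) == ["power.wattz"]  # typo caught
```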

OTLP Export Bridge. The entire Grafana/Prometheus/Datadog ecosystem speaks OTLP. If LTP can’t export to these backends, it’s an island. And islands don’t win protocol wars.

This is a gateway component. Not a protocol change. A janus-otlp-bridge that subscribes to $LTP/** topics and translates to OTLP/gRPC. Runs on the Chapter relay or any node with enough resources. About 5KB of translation logic. The bridge speaks their language so the rest of our stack never has to.
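
A hedged sketch of the translation step, in Python. The output dict only approximates the OTLP log-record shape; a real janus-otlp-bridge would serialize protobuf and push it over gRPC. All field names on the LTP side are assumptions.

```python
# Gateway-side sketch: translate a SovereignEvent-shaped record into an
# OTLP-like log record dict. Output keys only approximate the OTLP shape.
def to_otlp_log(event: dict) -> dict:
    return {
        "timeUnixNano": event["timestamp_ns"],
        "severityText": event["severity"],
        "body": {"stringValue": event["event_type"]},
        "attributes": [
            {"key": k, "value": {"stringValue": str(v)}}
            for k, v in event["attributes"].items()
        ],
        "traceId": event.get("trace_id", ""),
    }

record = to_otlp_log({
    "timestamp_ns": 1_700_000_000_000_000_000,
    "severity": "ERROR",
    "event_type": "session.timeout",
    "attributes": {"net.peer.did": "did:example:abc"},
    "trace_id": "a" * 32,
})
assert record["severityText"] == "ERROR"
assert record["attributes"][0]["key"] == "net.peer.did"
```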

What we DON’T need from OTel:

The Collector pipeline (Receiver → Processor → Exporter) is an architectural confession. It admits the protocol can’t process its own data. Our Membrane Agent already is the processor; it just needs a richer pattern vocabulary.

The Batch Processor exists because OTLP is too expensive per message. Our LCC keepalives are 16 bytes.

The Memory Limiter exists because the Collector is a Go binary with garbage collection pauses that spike RAM unpredictably. Zig doesn’t have this problem. Zig doesn’t have a garbage collector. Because Zig respects the machine it runs on.

OTel’s Collector is 128 MiB of “we couldn’t make the protocol efficient enough, so here’s a middleman.” We don’t need a middleman. We need a protocol that works.


2. Architecture: Not a Layer On Top. A Profile Within.

Same answer as Feed Social. Same answer as the original LTP design. The pattern repeats because the pattern is correct:

ns-msg = the universal messaging primitive

LTP Telemetry Profile = ns-msg + $LTP/telemetry/*
LTP Sensor Profile = ns-msg + $LTP/ocean/* (what we have)
LTP Observability Profile = ns-msg + $LTP/obs/* (what we're adding)
Feed Social = ns-msg + encryption tiers + social features

The Observability Profile adds three namespace families:

$LTP/obs/{scope}/{service_did}/trace/{trace_id}/{span_id}
$LTP/obs/{scope}/{service_did}/metric/{metric_name}
$LTP/obs/{scope}/{service_did}/event/{severity}/{event_type}

Same wire format. Same authentication. Same encryption. Same Membrane filtering. Zero new service types. It’s ns-msg publishes with a semantic convention for observability data. That’s it.
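
To illustrate the conventions, here is a small helper that builds these topics, plus the kind of wildcard match a $LTP/obs/** subscriber relies on. The matcher semantics (* = one segment, ** = any tail) are assumed for the example, not quoted from the ns-msg spec.

```python
# Build an Observability Profile topic and match it against subscriptions.
def obs_topic(scope: str, did: str, kind: str, *rest: str) -> str:
    return "/".join(["$LTP", "obs", scope, did, kind, *rest])

def matches(pattern: str, topic: str) -> bool:
    p, t = pattern.split("/"), topic.split("/")
    for i, seg in enumerate(p):
        if seg == "**":
            return True            # matches the entire remaining tail
        if i >= len(t) or (seg != "*" and seg != t[i]):
            return False
    return len(p) == len(t)

topic = obs_topic("mombasa", "did:ltp:abc", "event", "ERROR", "session.timeout")
assert topic == "$LTP/obs/mombasa/did:ltp:abc/event/ERROR/session.timeout"
assert matches("$LTP/obs/**", topic)       # what the bridge subscribes to
assert not matches("$LTP/ocean/**", topic) # sensor profile stays separate
```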

Here’s why this is architecturally superior to what OTel does: OTel maintains three separate pipelines for traces, metrics, and logs. Three protobuf schemas. Three processing chains. Three export paths. They even built a spanmetrics connector — a component whose sole purpose is to bridge traces and metrics because the architecture physically cannot see them as the same thing.

We have one namespace with typed paths. A trace span and a sensor reading and a security alert all flow through the same router, the same interest tables, the same encryption layer. The Membrane Agent sees all of them in one unified stream.

OTel built three highways. We built one river. The river carries everything. The highways require three toll booths.


3. The Real Product: Semantic Events + Membrane Agents

This is where it gets interesting. This is what the OTel committee, with its 500+ contributors and design-by-committee architecture, completely failed to see.

Nobody looks at dashboards. Nobody. Ever. In the history of infrastructure operations, the dashboard has been open exactly when the CEO walks by and exactly when the post-mortem meeting starts. The rest of the time? It’s a screensaver that costs you $2,000/month in Datadog licenses.

We look at logs when something breaks. We search for the error after the customer screams. We trace the request after the alert fires. The entire workflow is forensic. So why does the entire observability industry optimize for continuous real-time metric collection?

Because that’s what sells enterprise contracts. Not because that’s what engineers actually need.

The right architecture optimizes for forensics.

In OTel, a trace span and a log record and a metric data point are three different protobuf messages with three different schemas sent through three different pipelines.

In LTP, they’re all SovereignEvent with different domain and event_type values. One schema. One pipeline. One namespace. One Membrane watching all of it.

const SovereignEvent = struct {
    event_id: [16]u8,                // Unique event ID
    publisher_did: DID,              // Who emitted this
    trace_id: ?[16]u8,               // Causal chain (optional)
    parent_id: ?[8]u8,               // Parent in chain (optional)
    timestamp: SovereignTimestamp,
    lamport: u64,                    // Causal ordering
    severity: Severity,              // TRACE|DEBUG|INFO|WARN|ERROR|FATAL
    domain: Domain,                  // SECURITY|PERFORMANCE|BUSINESS|SYSTEM
    event_type: []const u8,          // "session.timeout", "entropy.spike"
    attributes: CBOR,                // Structured key-value
    body: ?CBOR,                     // Optional rich body
    confidence: ?f32,                // Membrane confidence score
    action_taken: ?PatternResponse,  // What the Membrane did about it
};

Look at the last two fields. confidence and action_taken. These don’t exist in OpenTelemetry. They can’t exist in OpenTelemetry. Because OTel is a collection protocol. It collects data and ships it somewhere else for someone else to think about.

Our events carry the verdict and the sentence. The Membrane detected the anomaly at 0.92 confidence and blocked the source for 600 seconds. That information is in the event itself. The forensic record is complete the moment it’s written.
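
A Python mirror of the struct above (the real type is Zig), just to show the point being made: the verdict rides inside the event. The action_taken encoding ("block:600s") is invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SovereignEvent:
    event_id: bytes
    publisher_did: str
    severity: str
    domain: str
    event_type: str
    attributes: dict
    confidence: Optional[float] = None   # Membrane confidence score
    action_taken: Optional[str] = None   # what the Membrane did about it

evt = SovereignEvent(
    event_id=b"\x00" * 16,
    publisher_did="did:ltp:node-a",
    severity="ERROR",
    domain="SECURITY",
    event_type="entropy.spike",
    attributes={"source": "did:ltp:node-x"},
    confidence=0.92,
    action_taken="block:600s",
)
# The forensic record is complete the moment it is written:
assert evt.confidence == 0.92 and evt.action_taken == "block:600s"
```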

The Membrane Agent as SIEM Engine:

Traditional SIEM:
App → logs → network → Splunk → rules → alert → human → response
Latency: minutes to hours. Cost: $50K+/year.

Libertaria SIEM:
Node → SovereignEvent → Membrane → pattern detection → auto-response
Latency: milliseconds. Cost: 0. Sovereign.

The Membrane Agent (RFC-0110 + RFC-0115) already does:

  • Real-time pattern recognition across temporal windows
  • Confidence scoring with graduated response
  • Autonomous countermeasures — block, rate-limit, escalate
  • Anomaly forwarding to the Cognitive Layer for novel attack patterns
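
As an illustration of graduated response, here is a toy sliding-window detector: count suspicious events per source, derive a confidence score, map it to an action. Window size, saturation threshold, and action names are all invented for the sketch; the real pattern vocabulary lives in the Membrane Agent RFCs.

```python
from collections import deque

class MembraneWindow:
    def __init__(self, window_s: float = 60.0, saturation: int = 20):
        self.window_s = window_s
        self.saturation = saturation   # hits at which confidence reaches 1.0
        self.hits = {}                 # source_did -> deque of timestamps

    def observe(self, source: str, now: float) -> tuple:
        q = self.hits.setdefault(source, deque())
        q.append(now)
        while q and now - q[0] > self.window_s:
            q.popleft()                # expire hits outside the window
        confidence = min(1.0, len(q) / self.saturation)
        if confidence >= 0.9:
            return confidence, "block:600s"
        if confidence >= 0.5:
            return confidence, "rate-limit"
        return confidence, "observe"

m = MembraneWindow()
for i in range(18):
    confidence, action = m.observe("did:ltp:node-x", now=float(i))
assert action == "block:600s" and confidence == 0.9
```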

What it needs to become a full SIEM:

Event Correlation. Link SovereignEvents by trace_id to reconstruct causal chains. “Session timeout on Node A → retry storm on Node B → OOM kill on Node C” stops being three unrelated log entries in three different tools. It becomes one incident with a forensic trail that tells the complete story.
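
The correlation step could be as simple as this sketch: group events by trace_id, order each group by Lamport clock. Event shapes are illustrative.

```python
from collections import defaultdict

def correlate(events: list) -> dict:
    """Group events into incidents by trace_id, each in causal order."""
    incidents = defaultdict(list)
    for e in events:
        if e.get("trace_id"):
            incidents[e["trace_id"]].append(e)
    for chain in incidents.values():
        chain.sort(key=lambda e: e["lamport"])   # Lamport = causal order
    return dict(incidents)

events = [
    {"trace_id": "t1", "lamport": 7, "event_type": "oom.kill",        "node": "C"},
    {"trace_id": "t1", "lamport": 3, "event_type": "retry.storm",     "node": "B"},
    {"trace_id": "t1", "lamport": 1, "event_type": "session.timeout", "node": "A"},
    {"trace_id": None, "lamport": 2, "event_type": "debug.noise",     "node": "A"},
]
chain = correlate(events)["t1"]
assert [e["node"] for e in chain] == ["A", "B", "C"]   # one incident, in order
```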

Retention Policy. Events publish to ns-msg topics with TTL. Security events retain longer than debug events. Archive Nodes (RFC-0850) store the forensic record. The architecture for this already exists; it just needs the semantic conventions to know what matters and what doesn’t.
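
A sketch of what a severity- and domain-aware TTL policy might look like. The durations here are invented for illustration, not taken from RFC-0850.

```python
DAY = 86_400  # seconds

def retention_ttl(domain: str, severity: str) -> int:
    """TTL (seconds) attached to the publish; Archive Nodes honor it."""
    if domain == "SECURITY":
        return 365 * DAY             # forensic record: keep a year
    if severity in ("ERROR", "FATAL"):
        return 90 * DAY
    if severity in ("WARN", "INFO"):
        return 7 * DAY
    return DAY                       # TRACE/DEBUG: gone tomorrow

assert retention_ttl("SECURITY", "INFO") == 365 * DAY
assert retention_ttl("SYSTEM", "DEBUG") == DAY
```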

Query Interface. The ns-msg SURVEY pattern against archived events. “Show me all ERROR events from DID xyz in the last 6 hours with trace_id correlation.” Not a proprietary query language. Not PromQL. Not LogQL. The same pub/sub pattern the entire stack already speaks.
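
The SURVEY query from the text, sketched client-side against archived events: "all ERROR events from DID xyz in the last 6 hours, with trace_id correlation." The event shape is illustrative.

```python
def survey(events, did, severity, since, now):
    """Filter archived events, then pull in causally related ones."""
    hits = [e for e in events
            if e["publisher_did"] == did
            and e["severity"] == severity
            and since <= e["timestamp"] <= now]
    # correlation: every event sharing a trace_id with a hit
    traces = {e["trace_id"] for e in hits if e.get("trace_id")}
    related = [e for e in events
               if e.get("trace_id") in traces and e not in hits]
    return hits, related

now = 1_000_000
events = [
    {"publisher_did": "did:ltp:xyz", "severity": "ERROR", "timestamp": now - 100,   "trace_id": "t9"},
    {"publisher_did": "did:ltp:abc", "severity": "WARN",  "timestamp": now - 200,   "trace_id": "t9"},
    {"publisher_did": "did:ltp:xyz", "severity": "ERROR", "timestamp": now - 5_000, "trace_id": None},
]
hits, related = survey(events, "did:ltp:xyz", "ERROR", since=now - 6 * 3600, now=now)
assert len(hits) == 2 and len(related) == 1   # the WARN on t9 joins the incident
```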

Exporters. janus-otlp-bridge for Grafana. janus-syslog-bridge for legacy SIEM. janus-prometheus-bridge for metrics. These are gateway agents, not protocol changes. Translation happens at the boundary. The sovereign core stays clean.


4. Transport: Don’t Add Complexity. Bridge When Necessary.

Internal (Libertaria ↔ Libertaria): UTCP + LCC + ns-msg over LWF frames. Period.

This is our sovereign transport. Encrypted. Authenticated. DID-native. Kenya-compliant. 50KB binary.

Adding gRPC or protobuf internally would be architectural pollution of the worst kind: the kind that starts with “let’s just add compatibility” and ends with your protocol being a wrapper around someone else’s protocol.

We use CBOR because it’s self-describing, compact, and doesn’t require a schema compiler. Protobuf requires protoc and a build pipeline. CBOR just works. On an ESP32. On a Raspberry Pi. On a server. Everywhere. Without generating code from .proto files and praying the versions match.
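
To make “self-describing, no schema compiler” concrete: a minimal CBOR encoder for unsigned ints, text strings, and maps, following RFC 8949 major types 0, 3, and 5. Every item carries its own type header, so a decoder needs no generated code and no .proto file.

```python
def _head(major: int, n: int) -> bytes:
    """CBOR head: 3-bit major type + length/value (unsigned only)."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 256:
        return bytes([(major << 5) | 24, n])
    if n < 65_536:
        return bytes([(major << 5) | 25]) + n.to_bytes(2, "big")
    return bytes([(major << 5) | 26]) + n.to_bytes(4, "big")

def encode(value) -> bytes:
    if isinstance(value, int):          # major type 0: unsigned int
        return _head(0, value)
    if isinstance(value, str):          # major type 3: text string
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, dict):         # major type 5: map
        out = _head(5, len(value))
        for k, v in value.items():
            out += encode(k) + encode(v)
        return out
    raise TypeError(type(value))

msg = encode({"watts": 212})
# 0xA1 = map(1); 0x65 = text(5) "watts"; 212 takes the one-byte uint form
assert msg == bytes([0xA1, 0x65]) + b"watts" + bytes([0x18, 212])
```

Nine bytes on the wire, and a decoder on an ESP32 can walk it with nothing but the type headers.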

External (Libertaria → Legacy World): Bridge agents at the Chapter relay.

┌──────────────────────────────────────────────────────┐
│ Sovereign Domain (LWF + UTCP + ns-msg)               │
│                                                      │
│   Node A ──► Relay ──► Node B                        │
│                │                                     │
│                └──► Bridge Agent                     │
│                     subscribes: $LTP/obs/**          │
│                     exports:                         │
│                       OTLP/gRPC → Grafana            │
│                       Prom remote → VictoriaMetrics  │
│                       Syslog/CEF → legacy SIEM       │
│                       Webhook → PagerDuty            │
└──────────────────────────────────────────────────────┘

The bridge agent is a Janus :service running on hardware with enough resources. It translates SovereignEvent → OTLP protobuf and pushes via gRPC to whatever backend the Chapter operator chose.

The sovereign domain never speaks protobuf. The bridge speaks it on behalf of the domain. One-way. Outbound only. The legacy world gets what it needs. The sovereign core gives nothing up.


The Strategic Summary

| Decision | Answer | Why |
| --- | --- | --- |
| Build an OTel rival? | No. Build what OTel can never be. | OTel is cloud-native Kubernetes observability. We’re sovereign IoT + SIEM. Different species. |
| New protocol layer? | No. ns-msg Observability Profile. | Same wire format. New namespace conventions. $LTP/obs/* |
| Core primitive? | SovereignEvent (unified schema) | One type for traces, metrics, logs, alerts. Not three pipelines. |
| Detection engine? | The Membrane Agent IS the SIEM | Already has pattern detection + autonomous response. |
| Grafana / Prometheus? | Bridge agent at the Chapter relay | Export via OTLP/gRPC. Sovereign → legacy. One-way. |
| Transport changes? | None internally. OTLP bridge externally. | UTCP + LCC + ns-msg stays pure. The bridge speaks gRPC. |
| Protobuf / gRPC? | Only in the bridge agent | Internal: CBOR. External: whatever the legacy world demands. |

What This Means — Concretely

Six things need to happen. Not twelve. Not a roadmap that stretches to 2028. Six.

  1. Define SovereignEvent in Janus — the unified observability primitive. One struct to replace three OTel schemas.

  2. Define $LTP/obs/* namespace conventions — the semantic vocabulary. Documentation, not protocol surgery.

  3. Add trace_id/span_id propagation to the ns-msg envelope — 32 bytes. Trivial.

  4. Extend the Membrane Agent pattern vocabulary for observability events. Teach it to correlate, not just filter.

  5. Write janus-otlp-bridge — translates SovereignEvent → OTLP protobuf. About 2,000 lines of gateway logic.

  6. Ship the Two-Raspberry-Pi Demo — solar telemetry + Membrane alerts + Grafana dashboard via bridge. The proof that kills the argument.


The Line That Ends The Conversation

“Your Collector needs 128 MiB of RAM. Our Membrane Agent needs 50 KB — and it doesn’t just collect your telemetry. It defends your node while doing it.”

OpenTelemetry is a fine protocol for a world of Kubernetes clusters, unlimited RAM, and SRE teams that rotate in 8-hour shifts. That world exists. It’s just not our world.

Our world runs on solar panels and intermittent 4G. Our world has no SRE team; the infrastructure is the team. Our world doesn’t need a surveillance camera pointed at every process. It needs an immune system that fights without being told to fight.

OTel watches. The Membrane acts. That’s the difference. That’s the whole difference.

And if someone still wants the Grafana dashboard? The bridge exports it. Same pretty graphs. Same drill-down. Same vendor compatibility. The only thing that changes is this: behind the dashboard, something is actually doing something about the problems it displays.

That’s not an OTel rival. That’s an OTel replacement for everyone who got tired of staring at red lines on a screen and wishing the screen would fix them itself.


The immune system doesn’t need a dashboard. It needs teeth.