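Multimodal Analysis Framework

Version: v0.1.0 Last updated: 2026-04-04

Note: this edition is a historical document reorganized to the documentation standard; the body retains the original English content for now.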

1. Purpose

This document defines the higher-level framework architecture above the portable signal analyzer design.

The portable signal analyzer should be treated as one product instantiation of a broader platform:

  • one framework
  • many input modalities
  • many algorithm families
  • one evidence model
  • one AI orchestration layer

The long-term goal is to build a general analysis framework that can ingest different kinds of observations and drive deterministic algorithm pipelines with AI assistance.

2. Core Thesis

The framework should not be built on the premise that "AI analyzes everything directly".

Instead, it should be:

  • input-normalized
  • algorithm-driven
  • evidence-centered
  • AI-orchestrated

In practice:

  • deterministic algorithms do sensing, extraction, decoding, and measurement
  • AI selects algorithms, adjusts parameters, compares outcomes, and explains results
  • decisions are based on structured evidence rather than free-form intuition

3. Framework Scope

The framework should be able to support multiple modalities over time, such as:

  • wireless and RF signals
  • audio
  • optical/light signals
  • video
  • industrial buses
  • telemetry and logs

Not every modality needs to be implemented at the beginning. The architecture should support them without hardcoding a single domain.

4. Architectural Principle

The platform should be designed around five stable abstractions:

  1. InputSource
  2. Observation
  3. AlgorithmModule
  4. Evidence
  5. Experiment

If these abstractions are stable, the framework can grow without becoming a pile of special cases.

5. Input Model

5.1 InputSource

An input source is where data comes from.

Examples:

  • SDR receiver
  • microphone
  • camera
  • photodiode
  • logic analyzer
  • CAN/UART bridge
  • file replay

Suggested shape:

type InputSource = {
  id: string
  kind: "rf" | "audio" | "video" | "optical" | "bus" | "file" | "log"
  capabilities: string[]
  metadata: Record<string, unknown>
}

5.2 Observation

An observation is a concrete chunk of captured data plus context.

Suggested shape:

type Observation = {
  id: string
  sourceId: string
  modality: string
  timeRange: { start: string; end: string }
  payloadRef: string
  metadata: Record<string, unknown>
}

This abstraction is important because all downstream algorithms should consume observations, not raw device details.
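To illustrate how a device-specific capture might be normalized before any algorithm sees it, here is a minimal sketch. The `RawCapture` type, the `toObservation` helper, the ISO 8601 timestamp format, and all sample values are assumptions made for illustration; only the `InputSource` and `Observation` shapes come from this document.

```typescript
// Shapes copied from sections 5.1 and 5.2.
type InputSource = {
  id: string;
  kind: "rf" | "audio" | "video" | "optical" | "bus" | "file" | "log";
  capabilities: string[];
  metadata: Record<string, unknown>;
};

type Observation = {
  id: string;
  sourceId: string;
  modality: string;
  timeRange: { start: string; end: string };
  payloadRef: string;
  metadata: Record<string, unknown>;
};

// Hypothetical capture descriptor; not part of the original design.
type RawCapture = {
  startedAt: Date;
  durationMs: number;
  payloadRef: string;
  deviceSettings: Record<string, unknown>;
};

// Wrap a device-specific capture into a device-agnostic Observation.
// Assumption: timeRange holds ISO 8601 timestamps.
function toObservation(source: InputSource, capture: RawCapture, id: string): Observation {
  const end = new Date(capture.startedAt.getTime() + capture.durationMs);
  return {
    id,
    sourceId: source.id,
    modality: source.kind,
    timeRange: { start: capture.startedAt.toISOString(), end: end.toISOString() },
    payloadRef: capture.payloadRef,
    metadata: { ...capture.deviceSettings },
  };
}

// Illustrative values only.
const sdr: InputSource = { id: "sdr-0", kind: "rf", capabilities: ["iq-capture"], metadata: {} };

const obs = toObservation(
  sdr,
  {
    startedAt: new Date("2026-01-01T00:00:00.000Z"),
    durationMs: 250,
    payloadRef: "captures/example.iq",
    deviceSettings: { centerFrequencyHz: 433920000, sampleRateHz: 2000000 },
  },
  "obs-1",
);
```

Downstream modules would then read `obs.modality`, `obs.timeRange`, and `obs.payloadRef` without knowing which device produced the data.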

6. Algorithm Layer

6.1 AlgorithmModule

An algorithm module is a reusable processing block.

Examples:

  • FFT analysis
  • burst detector
  • frequency estimator
  • demodulator
  • object detector
  • OCR stage
  • audio event detector
  • frame decoder
  • protocol parser

Suggested shape:

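type AlgorithmModule = {
  name: string
  modality: string[]                // modalities this module can process
  stage: string                     // pipeline stage, e.g. "detect" or "demod"
  inputFormat: string
  outputFormat: string
  params: Record<string, ParamSpec> // ParamSpec is not defined in this document
  metrics: string[]
  constraints: string[]
  cost: { cpu: number; memory: number; latency: number } // units are not specified here
}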
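Because `ParamSpec` is referenced but never defined, the sketch below shows one possible shape and how declared parameter ranges could be used to reject invalid tuning proposals before a module runs. The `ParamSpec` union, the `validateParams` helper, and the sample module values are all assumptions for illustration.

```typescript
// Hypothetical ParamSpec: one possible shape, not defined by this document.
type ParamSpec =
  | { type: "number"; min: number; max: number; default: number }
  | { type: "enum"; options: string[]; default: string };

type AlgorithmModule = {
  name: string;
  modality: string[];
  stage: string;
  inputFormat: string;
  outputFormat: string;
  params: Record<string, ParamSpec>;
  metrics: string[];
  constraints: string[];
  cost: { cpu: number; memory: number; latency: number };
};

// Illustrative module declaration; names and numbers are made up.
const burstDetector: AlgorithmModule = {
  name: "burst-detector",
  modality: ["rf"],
  stage: "detect",
  inputFormat: "iq-samples",
  outputFormat: "burst-list",
  params: {
    thresholdDb: { type: "number", min: 0, max: 40, default: 10 },
    window: { type: "enum", options: ["hann", "rect"], default: "hann" },
  },
  metrics: ["burstCount"],
  constraints: [],
  cost: { cpu: 1, memory: 1, latency: 1 },
};

// Return a list of problems; an empty list means the proposal is acceptable.
function validateParams(mod: AlgorithmModule, proposed: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of Object.keys(proposed)) {
    const spec = mod.params[key];
    const value = proposed[key];
    if (spec === undefined) {
      errors.push(`unknown parameter: ${key}`);
    } else if (spec.type === "number") {
      if (typeof value !== "number" || value < spec.min || value > spec.max) {
        errors.push(`${key} must be a number in [${spec.min}, ${spec.max}]`);
      }
    } else if (typeof value !== "string" || spec.options.indexOf(value) === -1) {
      errors.push(`${key} must be one of: ${spec.options.join(", ")}`);
    }
  }
  return errors;
}

const okProposal = validateParams(burstDetector, { thresholdDb: 12, window: "hann" });
const badProposal = validateParams(burstDetector, { thresholdDb: 99, gain: 3 });
```

Declaring ranges in the module description keeps AI-driven parameter tuning inside bounds that the deterministic code can check.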

6.2 Algorithm Chains

The framework should not assume a single fixed pipeline. Instead, it should support candidate chains such as:

  • RF: preprocess -> detect -> sync -> demod -> decode -> frame -> protocol
  • Audio: denoise -> segment -> feature extract -> classify -> event correlate
  • Video: sample -> detect -> track -> OCR -> temporal reasoning
  • Optical: capture -> normalize -> pulse detect -> decode -> classify

This is the main reason a framework approach is stronger than a single product implementation.
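One way to keep candidate chains valid is to check that each stage's output format matches the next stage's input format. The following is a minimal sketch under that assumption; `ChainStage`, `firstChainBreak`, and the format names are hypothetical and use only a subset of the `AlgorithmModule` fields.

```typescript
// Minimal subset of the AlgorithmModule fields needed for chain checking.
type ChainStage = { name: string; inputFormat: string; outputFormat: string };

// Return the index of the first stage whose output does not match the next
// stage's input, or -1 when every adjacent pair is compatible.
function firstChainBreak(chain: ChainStage[]): number {
  let previous: ChainStage | undefined;
  let index = 0;
  for (const stage of chain) {
    if (previous !== undefined && previous.outputFormat !== stage.inputFormat) {
      return index - 1;
    }
    previous = stage;
    index += 1;
  }
  return -1;
}

// Format names below are illustrative assumptions.
const rfChain: ChainStage[] = [
  { name: "preprocess", inputFormat: "iq-samples", outputFormat: "iq-samples" },
  { name: "detect", inputFormat: "iq-samples", outputFormat: "burst" },
  { name: "demod", inputFormat: "burst", outputFormat: "symbols" },
  { name: "decode", inputFormat: "symbols", outputFormat: "bits" },
];

// Skipping the detect stage leaves a format mismatch after "preprocess".
const brokenChain: ChainStage[] = [
  { name: "preprocess", inputFormat: "iq-samples", outputFormat: "iq-samples" },
  { name: "demod", inputFormat: "burst", outputFormat: "symbols" },
];

const validResult = firstChainBreak(rfChain);
const brokenResult = firstChainBreak(brokenChain);
```

A check like this lets the orchestration layer discard malformed candidates cheaply before spending compute on them.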

7. Evidence Model

The system should never rely on a single opaque output. It should accumulate evidence.

Suggested shape:

type Evidence = {
  id: string
  observationId: string
  producer: string
  category: "signal" | "frame" | "protocol" | "semantic" | "anomaly"
  values: Record<string, unknown>
  confidence: number
  traceRefs: string[]
}

Examples of evidence:

  • detected carrier at a specific frequency
  • stable symbol timing candidate
  • valid CRC frames
  • recognized frame header
  • identified spoken keyword
  • recognized text in a video frame
  • repeated anomaly pattern

The framework should make evidence first-class because AI needs something structured to reason over.
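As a rough illustration of what "structured" buys, the sketch below filters an evidence history to one observation, drops weak items, and groups the rest by category. It assumes `confidence` lies in the range 0 to 1, which this document does not state; `groupEvidence` and the sample records are hypothetical.

```typescript
// Shape copied from the Evidence definition in this section.
type Evidence = {
  id: string;
  observationId: string;
  producer: string;
  category: "signal" | "frame" | "protocol" | "semantic" | "anomaly";
  values: Record<string, unknown>;
  confidence: number;
  traceRefs: string[];
};

type Category = Evidence["category"];

// Keep evidence for one observation at or above a confidence threshold,
// grouped by category. Assumption: confidence is a value in [0, 1].
function groupEvidence(
  items: Evidence[],
  observationId: string,
  minConfidence: number,
): Partial<Record<Category, Evidence[]>> {
  const groups: Partial<Record<Category, Evidence[]>> = {};
  for (const item of items) {
    if (item.observationId !== observationId || item.confidence < minConfidence) {
      continue;
    }
    const bucket = groups[item.category] ?? [];
    bucket.push(item);
    groups[item.category] = bucket;
  }
  return groups;
}

// Illustrative records only.
const history: Evidence[] = [
  { id: "ev-1", observationId: "obs-1", producer: "fft-analysis", category: "signal",
    values: { carrierHz: 433920000 }, confidence: 0.9, traceRefs: [] },
  { id: "ev-2", observationId: "obs-1", producer: "frame-decoder", category: "frame",
    values: { crcValid: true }, confidence: 0.8, traceRefs: [] },
  { id: "ev-3", observationId: "obs-1", producer: "protocol-parser", category: "protocol",
    values: {}, confidence: 0.2, traceRefs: [] },
  { id: "ev-4", observationId: "obs-2", producer: "fft-analysis", category: "signal",
    values: {}, confidence: 0.95, traceRefs: [] },
];

const strong = groupEvidence(history, "obs-1", 0.5);
```

Here `strong` contains only the `signal` and `frame` groups for `obs-1`, which is the kind of compact summary an orchestrator could reason over.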

8. Experiment Model

The framework should be built around experiments rather than one-pass processing.

Suggested shape:

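type Experiment = {
  id: string
  observationId: string
  pipeline: PipelineNode[]   // PipelineNode is not defined in this document
  outputs: string[]
  evidenceIds: string[]      // ids of Evidence records produced by this run
  score: ScoreCard           // ScoreCard is not defined in this document
  status: "pending" | "running" | "done" | "failed"
  failureReasons: string[]
}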

This allows:

  • controlled iteration
  • branching search
  • reproducibility
  • replay
  • regression testing
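Reproducibility and replay both depend on being able to tell when two experiments describe the same run. The sketch below derives a stable key from an observation id and a pipeline configuration. The `PipelineNode` shape and the `experimentKey` helper are assumptions, since this document does not define them.

```typescript
// Hypothetical PipelineNode: a module name plus its chosen parameter values.
type PipelineNode = { module: string; params: Record<string, number | string> };

// Build a key that is independent of parameter insertion order, so that
// identical configurations always map to the same string.
function experimentKey(observationId: string, pipeline: PipelineNode[]): string {
  const parts = pipeline.map((node) => {
    const params = Object.keys(node.params)
      .sort()
      .map((name) => `${name}=${String(node.params[name])}`)
      .join(",");
    return `${node.module}(${params})`;
  });
  return `${observationId}:${parts.join("->")}`;
}

// Same configuration, different key order in the params object.
const keyA = experimentKey("obs-1", [
  { module: "detect", params: { thresholdDb: 10, window: "hann" } },
]);
const keyB = experimentKey("obs-1", [
  { module: "detect", params: { window: "hann", thresholdDb: 10 } },
]);
```

Both calls yield `obs-1:detect(thresholdDb=10,window=hann)`, so an experiment store could use the key to detect repeats or to look up a run for replay.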

9. AI Orchestration Layer

The AI layer is an orchestrator, not the raw algorithm engine.

Primary responsibilities:

  • choose candidate modules
  • assemble candidate pipelines
  • tune parameters
  • compare experiment results
  • request more data if needed
  • summarize confidence and uncertainty
  • explain likely next actions

AI should make decisions based on:

  • observation metadata
  • extracted features
  • evidence history
  • prior successful cases
  • computational budget

10. Decision Loop

The framework should support a closed-loop controller:

  1. ingest observation
  2. extract cheap first-pass features
  3. generate candidate pipelines
  4. run bounded experiments
  5. score outputs
  6. accumulate evidence
  7. prune or mutate pipelines
  8. produce ranked hypotheses
  9. request more data or terminate

This applies across modalities, even when the underlying algorithms differ.
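The loop above can be sketched as a small bounded search. This is a simplified illustration, not a prescribed controller: pipelines are reduced to lists of stage names, steps 1, 2, and 6 (ingest, first-pass features, evidence accumulation) are left out, and `LoopHooks`, `LoopOptions`, and `decisionLoop` are hypothetical names.

```typescript
type Pipeline = string[]; // stage names only; a simplification for this sketch
type Candidate = { pipeline: Pipeline; score: number };

type LoopHooks = {
  generate: () => Pipeline[];                 // step 3: initial candidates
  run: (pipeline: Pipeline) => number;        // steps 4-5: bounded run, returns a score
  mutate: (pipeline: Pipeline) => Pipeline[]; // step 7: derive new candidates
};

type LoopOptions = { maxRounds: number; keep: number; targetScore: number };

function decisionLoop(
  hooks: LoopHooks,
  options: LoopOptions,
): { ranked: Candidate[]; satisfied: boolean } {
  const seen = new Set<string>();
  let frontier = hooks.generate();
  let ranked: Candidate[] = [];

  for (let round = 0; round < options.maxRounds; round++) {
    // Skip pipelines that were already tried in an earlier round.
    const fresh = frontier.filter((pipeline) => {
      const key = pipeline.join("->");
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    });
    if (fresh.length === 0) break;

    const results = fresh.map((pipeline) => ({ pipeline, score: hooks.run(pipeline) }));
    // Step 7 (prune) and step 8 (rank): keep only the best candidates so far.
    ranked = ranked.concat(results).sort((a, b) => b.score - a.score).slice(0, options.keep);

    const best = ranked[0];
    if (best !== undefined && best.score >= options.targetScore) {
      return { ranked, satisfied: true }; // step 9: terminate
    }
    frontier = ranked.reduce<Pipeline[]>(
      (next, candidate) => next.concat(hooks.mutate(candidate.pipeline)),
      [],
    );
  }
  // Step 9: budget exhausted; the caller may request more data.
  return { ranked, satisfied: false };
}

// Toy problem: the score is the fraction of positions matching a known chain.
const target = ["detect", "demod", "decode"];
const result = decisionLoop(
  {
    generate: () => [["detect"]],
    run: (p) => p.filter((stage, i) => stage === target[i]).length / target.length,
    mutate: (p) => (p.length < target.length ? target.map((s) => p.concat([s])) : []),
  },
  { maxRounds: 5, keep: 2, targetScore: 1 },
);
```

In this toy run the loop reaches `detect -> demod -> decode` in the third round. In a real system, `run` would execute deterministic modules and `generate` and `mutate` would be driven by the AI orchestration layer.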

11. Scoring

The scoring layer must be framework-wide, even if metrics differ by modality.

Every score should combine:

  • correctness indicators
  • structural consistency
  • confidence quality
  • computational cost
  • stability under small perturbations

High-level score groups:

  • signal quality score
  • structure extraction score
  • semantic interpretation score
  • anomaly relevance score
  • resource penalty

This is one of the most important shared services in the framework.
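As one possible way to combine the listed ingredients, the sketch below takes a weighted average of the four benefit terms and subtracts a weighted cost penalty. The `ScoreCard` fields, the assumption that every component is normalized to the range 0 to 1, and the weights are all illustrative; this document does not define them.

```typescript
// Hypothetical ScoreCard: one field per ingredient listed above.
// Assumption: every field is normalized to [0, 1].
type ScoreCard = {
  correctness: number;
  consistency: number;
  confidenceQuality: number;
  stability: number;
  cost: number;
};

// Weights use the same field names as the score card.
type ScoreWeights = ScoreCard;

// Weighted average of the benefit terms minus a weighted cost penalty.
function overallScore(card: ScoreCard, weights: ScoreWeights): number {
  const benefitWeight =
    weights.correctness + weights.consistency + weights.confidenceQuality + weights.stability;
  if (benefitWeight <= 0) {
    throw new Error("benefit weights must sum to a positive value");
  }
  const benefit =
    (weights.correctness * card.correctness +
      weights.consistency * card.consistency +
      weights.confidenceQuality * card.confidenceQuality +
      weights.stability * card.stability) /
    benefitWeight;
  return benefit - weights.cost * card.cost;
}

// Illustrative weights: correctness counts double, cost is a mild penalty.
const weights: ScoreWeights = {
  correctness: 4, consistency: 2, confidenceQuality: 2, stability: 2, cost: 0.5,
};

const cheapAndRight = overallScore(
  { correctness: 1, consistency: 1, confidenceQuality: 1, stability: 1, cost: 0 },
  weights,
);
const costlyAndShaky = overallScore(
  { correctness: 0.5, consistency: 1, confidenceQuality: 1, stability: 1, cost: 0.5 },
  weights,
);
```

Keeping the combination rule in one shared function means each modality only has to supply normalized components, while ranking stays comparable across pipelines.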

12. Why This Can Become a Platform

The framework can evolve into more than a single device because the hard problem is not any one algorithm. The hard problem is:

  • normalizing many inputs
  • managing many algorithm modules
  • assembling valid chains
  • evaluating evidence quality
  • automating search and comparison

That creates reusable platform assets:

  • algorithm registry
  • pipeline runtime
  • experiment store
  • scoring engine
  • evidence graph
  • AI orchestration policies

These assets can later support:

  • handheld products
  • desktop tools
  • edge devices
  • cloud-assisted analysis
  • domain-specific SDKs

13. Product Strategy Implication

The framework should be platformized internally but commercialized through narrow vertical products first.

Good strategy:

  • design the architecture as a general framework
  • launch with one constrained, high-value modality
  • accumulate reusable modules and scoring logic
  • gradually expose the platform capability later

Bad strategy:

  • start by marketing a universal tricorder for everything
  • build too many modalities before proving one workflow

Platform-first architecture is good. Platform-first go-to-market is risky.

14. Relationship to the Portable Signal Analyzer

The portable signal analyzer document is a domain-specific specialization of this framework.

Mapping:

  • framework input model -> handheld device capture sources
  • framework algorithm modules -> DSP, demodulation, framing, protocol stages
  • framework evidence model -> signal/frame/protocol evidence
  • framework AI orchestration -> pipeline selection and parameter tuning on-device or via host

In other words:

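  • 多模态分析框架.md (this document, the multimodal analysis framework) defines the parent architecture
  • 便携式信号分析仪架构说明.md (the portable signal analyzer architecture description) defines one concrete product path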

15. Recommended Immediate Starting Point

Do not start by implementing every modality. Start by building the shared framework primitives in a narrow domain.

Recommended first steps:

  1. choose one primary modality
  2. define InputSource, Observation, AlgorithmModule, Evidence, and Experiment
  3. build a module registry
  4. build an experiment runner
  5. build a scoring engine
  6. add a minimal AI orchestration loop
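For step 3, a module registry can start very small. The sketch below is one hypothetical shape: `ModuleInfo` is a subset of `AlgorithmModule`, and `ModuleRegistry` with its `register` and `find` methods is an assumption made for illustration rather than a prescribed interface.

```typescript
// Subset of AlgorithmModule fields needed for lookup.
type ModuleInfo = { name: string; modality: string[]; stage: string };

class ModuleRegistry {
  private readonly modules = new Map<string, ModuleInfo>();

  // Register a module once; duplicate names are rejected.
  register(mod: ModuleInfo): void {
    if (this.modules.has(mod.name)) {
      throw new Error(`module already registered: ${mod.name}`);
    }
    this.modules.set(mod.name, mod);
  }

  // Find all modules that support a modality at a given pipeline stage.
  find(modality: string, stage: string): ModuleInfo[] {
    return Array.from(this.modules.values()).filter(
      (mod) => mod.stage === stage && mod.modality.indexOf(modality) !== -1,
    );
  }
}

// Illustrative entries only.
const registry = new ModuleRegistry();
registry.register({ name: "fft-analysis", modality: ["rf", "audio"], stage: "preprocess" });
registry.register({ name: "burst-detector", modality: ["rf"], stage: "detect" });
registry.register({ name: "audio-event-detector", modality: ["audio"], stage: "detect" });

const rfDetectors = registry.find("rf", "detect").map((mod) => mod.name);
```

Here `rfDetectors` contains only `burst-detector`. The orchestration loop can query the registry by modality and stage when assembling candidate pipelines.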

For this project, the strongest first modality is still RF / signal analysis, because:

  • it aligns with the current handheld analyzer concept
  • deterministic pipeline boundaries are clearer
  • Wireshark-related protocol tooling can be reused after framing
  • it gives a disciplined environment for validating the framework model

16. Summary

The right long-term direction is a multimodal analysis framework, not just a single-purpose signal tool.

But the right execution order is:

  • framework-shaped architecture
  • narrow first modality
  • deterministic algorithm modules
  • evidence-first design
  • AI as orchestration and explanation

That combination gives both technical depth and a plausible product path.