
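Portable Signal Analyzer Architecture

Version: v0.1.0 Last updated: 2026-04-04

Note: This version is a historical document reorganized to the documentation standard; the body retains its original English content for now.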

This document describes a domain-specific product architecture that sits under the higher-level framework design in 多模态分析框架.md.

1. Purpose

This document captures a product and architecture direction for a portable signal analyzer inspired by a "tricorder"-style workflow:

  • collect signals from multiple physical sources
  • detect, demodulate, decode, and frame them using deterministic algorithms
  • apply protocol analysis on framed data
  • use AI as an assistant, orchestrator, and experiment controller

The core principle is:

AI should not replace the actual signal-processing and protocol-analysis engine. Deterministic algorithms should do the decoding work. AI should control experiments, compare results, explain outcomes, and choose the next action.

2. Product Positioning

The target device is not just a packet sniffer and not just an SDR receiver. It is a multi-stage analysis platform for:

  • RF and non-RF signal acquisition
  • physical-layer and link-layer recovery
  • protocol identification and interpretation
  • guided diagnostics and anomaly explanation

Practical examples include:

  • identifying unknown digital bursts
  • recovering framed traffic from noisy captures
  • decoding standard or proprietary protocols
  • presenting operator-friendly summaries and next-step suggestions

3. Wireshark Reuse Boundary

Wireshark is useful, but only for the upper half of the stack.

Wireshark is strong at:

  • framed packet dissection
  • protocol tree generation
  • display filtering
  • reassembly, statistics, and follow-stream style analysis
  • export and structured protocol interpretation

Wireshark is not the right tool for:

  • raw RF analysis
  • blind modulation recognition
  • carrier recovery
  • symbol timing recovery
  • unknown physical-layer reconstruction

The practical reuse boundary is:

  1. collect raw signal data
  2. perform DSP, demodulation, bit recovery, and framing
  3. convert recovered traffic into packets or events
  4. hand those results to Wireshark-related tooling such as:
    • pcap / pcapng
    • tshark
    • sharkd
    • custom dissectors where appropriate

This makes Wireshark a protocol-analysis backend, not the full analyzer brain.
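Steps 3 and 4 of the boundary above can be sketched as a minimal writer for the classic pcap file format. This is only an illustration under stated assumptions: the `RecoveredFrame` type and the `writePcap` function are hypothetical names that do not appear in this design, and the private-use link type `LINKTYPE_USER0` (147) is assumed as a reasonable default for proprietary frames.

```typescript
// Hypothetical frame record produced by the framing stage (not defined in this document).
type RecoveredFrame = { timestampUs: number; payload: Uint8Array };

const LINKTYPE_USER0 = 147; // link type reserved for private use in pcap files

// Serialize frames into a classic little-endian pcap byte stream.
function writePcap(frames: RecoveredFrame[], linkType: number = LINKTYPE_USER0): Uint8Array {
  const total = 24 + frames.reduce((n, f) => n + 16 + f.payload.length, 0);
  const out = new Uint8Array(total);
  const view = new DataView(out.buffer);
  // Global header: magic, version 2.4, timezone offset, sigfigs, snaplen, link type.
  view.setUint32(0, 0xa1b2c3d4, true);
  view.setUint16(4, 2, true);
  view.setUint16(6, 4, true);
  view.setInt32(8, 0, true);
  view.setUint32(12, 0, true);
  view.setUint32(16, 65535, true);
  view.setUint32(20, linkType, true);
  let offset = 24;
  for (const f of frames) {
    // Per-record header: seconds, microseconds, captured length, original length.
    view.setUint32(offset, Math.floor(f.timestampUs / 1e6), true);
    view.setUint32(offset + 4, f.timestampUs % 1e6, true);
    view.setUint32(offset + 8, f.payload.length, true);
    view.setUint32(offset + 12, f.payload.length, true);
    out.set(f.payload, offset + 16);
    offset += 16 + f.payload.length;
  }
  return out;
}
```

If the bytes are written to disk as `capture.pcap` (a hypothetical filename), the file should be readable with `tshark -r capture.pcap`. Frames tagged with a private-use link type are only dissected meaningfully once a matching dissector is configured.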

4. System Architecture

The system should be split into clear layers.

4.1 Acquisition Layer

Inputs may include:

  • IQ streams
  • IF or audio data
  • logic-level captures
  • UART / SPI / I2C / CAN buses
  • BLE / Wi-Fi / Ethernet mirrored traffic
  • file-based replay samples

This layer should normalize access to multiple hardware front-ends and record:

  • timestamp
  • sample rate
  • center frequency
  • gain / front-end state
  • source identity
  • capture duration
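The recorded fields above could be captured in a single metadata record. The sketch below is one possible shape, not a settled schema: every field name and unit is an assumption, and `expectedSampleCount` is a hypothetical helper added for illustration.

```typescript
// Hypothetical metadata record; field names and units are illustrative only.
type CaptureMetadata = {
  sourceId: string;            // source identity, e.g. which front-end produced the data
  startTimeUnixMs: number;     // capture start timestamp
  sampleRateHz: number;
  centerFrequencyHz?: number;  // may be absent for non-RF sources such as UART or logic captures
  gainDb?: number;
  frontEndState?: Record<string, string | number | boolean>;
  durationS: number;           // capture duration
};

// Number of samples implied by the metadata, usable as a sanity check on replay files.
function expectedSampleCount(meta: CaptureMetadata): number {
  return Math.round(meta.sampleRateHz * meta.durationS);
}
```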

4.2 Signal Workspace

The workspace is the canonical store for both raw and intermediate data. It should retain:

  • raw samples
  • derived features
  • intermediate bitstreams
  • framed outputs
  • experiment metadata
  • scores and failure reasons

Retaining all of this is essential for reproducibility, offline replay, regression testing, and AI-driven iteration.

4.3 DSP / Demodulation Pipeline

This layer performs actual signal recovery. Typical module categories:

  • preprocessing
    • DC removal
    • AGC
    • filtering
    • resampling
  • detection
    • energy detection
    • burst detection
    • coarse frequency estimation
  • synchronization
    • carrier recovery
    • symbol clock recovery
    • preamble / sync-word search
  • demodulation
    • OOK / ASK
    • FSK / GFSK
    • PSK / QPSK
    • OFDM-family or chirp-style paths when supported
  • bit-domain processing
    • hard or soft decision
    • de-whitening
    • de-interleaving
    • FEC decoding
    • CRC validation

4.4 Frame Builder

This layer transforms bitstreams into candidate frames or packets. It is the boundary between "signal recovery" and "protocol interpretation".

Responsibilities:

  • frame boundary detection
  • fixed/variable length frame assembly
  • checksum / CRC validation
  • field boundary estimation
  • event extraction
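Frame boundary detection and fixed-length assembly from the list above can be sketched as a sync-word search over a recovered bitstream. The names `findSyncOffsets` and `cutFixedFrames` are hypothetical, and representing bits as an array of 0/1 numbers is an assumption chosen for readability rather than efficiency.

```typescript
// Find every bit offset where the sync word occurs with at most maxErrors mismatched bits.
function findSyncOffsets(bits: number[], sync: number[], maxErrors = 0): number[] {
  const hits: number[] = [];
  for (let i = 0; i + sync.length <= bits.length; i++) {
    let errors = 0;
    for (let j = 0; j < sync.length && errors <= maxErrors; j++) {
      if (bits[i + j] !== sync[j]) errors++;
    }
    if (errors <= maxErrors) hits.push(i);
  }
  return hits;
}

// Cut fixed-length candidate frames that start right after each exact sync hit.
function cutFixedFrames(bits: number[], sync: number[], payloadBits: number): number[][] {
  return findSyncOffsets(bits, sync)
    .map((offset) => offset + sync.length)
    .filter((start) => start + payloadBits <= bits.length)
    .map((start) => bits.slice(start, start + payloadBits));
}
```

The output is only a set of candidates. Checksum or CRC validation, as listed above, would still be needed to decide which candidates are real frames.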

4.5 Protocol Analysis Layer

Once data has become packets or events:

  • use Wireshark-compatible outputs for standard protocols
  • use internal parsers and heuristics for proprietary protocols
  • gradually migrate stable proprietary formats into custom dissectors if needed

4.6 UI and Assistant Layer

This layer provides:

  • live scan results
  • replay and lab analysis
  • confidence-ranked protocol candidates
  • anomaly explanations
  • next-step recommendations
  • exportable reports

5. AI Role in the System

AI should not directly replace DSP blocks. Its primary role is that of an orchestration and analysis controller.

AI responsibilities:

  • choose candidate algorithm pipelines
  • tune parameters
  • compare candidate outputs
  • explain likely failure points
  • decide what to try next
  • summarize results for the operator

AI should behave like an automated signal-analysis engineer, not like a magical decoder.

6. AI Search for Algorithm Chains

6.1 Problem Definition

Algorithm-chain search is a constrained program-search problem.

Input:

  • raw or partially processed signal data
  • prior device/context metadata
  • previous experiment history

Output:

  • a ranked set of candidate pipelines
  • associated parameter settings
  • score and confidence estimates

Optimization target:

  • maximize correctness and interpretability
  • minimize computational cost and false positives

6.2 Canonical Pipeline Shape

A pipeline can be modeled as:

source
-> preprocess
-> detect
-> sync
-> demod
-> decode
-> frame
-> proto

Each stage may have several interchangeable modules.
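The pipeline shape above suggests a simple sequential executor. The sketch below assumes each stage is a plain function and reports which stage failed; `NamedStage`, `RunResult`, and `runPipeline` are hypothetical names, and passing data as `unknown` is a simplification of whatever intermediate representation the product eventually standardizes.

```typescript
type StageFn = (input: unknown) => unknown;
type NamedStage = { stage: string; run: StageFn };
type RunResult =
  | { ok: true; output: unknown }
  | { ok: false; failedStage: string; error: string };

// Run stages in order, feeding each output into the next stage.
// Stops at the first stage that throws and records where the failure happened.
function runPipeline(stages: NamedStage[], source: unknown): RunResult {
  let data = source;
  for (const s of stages) {
    try {
      data = s.run(data);
    } catch (e) {
      const message = e instanceof Error ? e.message : String(e);
      return { ok: false, failedStage: s.stage, error: message };
    }
  }
  return { ok: true, output: data };
}
```

Recording the failing stage by name keeps the executor compatible with structured failure reporting instead of a bare exception.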

6.3 Search State

Each attempt should be tracked as an experiment node.

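// StageOutput and ScoreCard are referenced here but not specified in this document.
type Experiment = {
  id: string
  parentId?: string          // absent for a root experiment
  pipeline: PipelineNode[]   // ordered module chain that was run
  inputRef: string           // reference to the input data for this run
  outputs: StageOutput[]     // per-stage outputs
  score: ScoreCard           // see the scoring system in section 7
  status: "pending" | "running" | "done" | "failed"
  notes?: string
}

type PipelineNode = {
  module: string             // module name
  params: Record<string, number | string | boolean>
}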

This allows AI to operate over an experiment tree instead of producing one-off guesses.
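To illustrate operating over a tree rather than isolated attempts, the sketch below walks parent links back to the root so the chain of decisions behind a result can be shown. `ExperimentRef` is a reduced, hypothetical stand-in for the experiment node described above, and `lineage` is an assumed helper name.

```typescript
// Reduced stand-in for an experiment node; only the fields needed for tree navigation.
type ExperimentRef = { id: string; parentId?: string };

// Walk parent links from an experiment back to its root, returning ids root-first.
// The `seen` set guards against accidental cycles in stored data.
function lineage(all: Map<string, ExperimentRef>, id: string): string[] {
  const path: string[] = [];
  const seen = new Set<string>();
  let cur = all.get(id);
  while (cur && !seen.has(cur.id)) {
    seen.add(cur.id);
    path.unshift(cur.id);
    cur = cur.parentId ? all.get(cur.parentId) : undefined;
  }
  return path;
}
```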

6.4 Recommended Search Strategy

Use a hybrid strategy:

  • rules for initialization
  • beam search for structure search
  • local optimization for parameter tuning

Recommended control flow:

  1. classify signal at a coarse level
  2. generate a small number of high-probability candidate pipelines
  3. run short-window experiments
  4. score results and prune aggressively
  5. mutate the best pipelines locally
  6. rerun on longer samples for confirmation

Because every round starts from scored and pruned candidates, this behaves more predictably than unconstrained random search.

6.5 Why Beam Search Fits

Beam search is a strong fit because it:

  • is resource-bounded
  • is easy to explain and debug
  • supports progressive refinement
  • works well with ranked experiment history

Suggested pattern:

  • outer loop: beam search over module-chain structure
  • inner loop: bounded parameter tuning around the best chains
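The outer loop of the suggested pattern can be sketched as a plain beam search over module names. This is a minimal illustration, assuming one list of interchangeable modules per stage and a caller-supplied scoring function; `Candidate`, `beamSearch`, and `scoreChain` are hypothetical names, and a real controller would score by running short-window experiments rather than by inspecting names.

```typescript
type Candidate = { chain: string[]; score: number };

// Beam search over module chains: after each stage, keep only the beamWidth best partial chains.
function beamSearch(
  stageOptions: string[][],                  // interchangeable module names per stage, in pipeline order
  scoreChain: (chain: string[]) => number,   // hypothetical scorer; higher is better
  beamWidth: number
): Candidate[] {
  let beam: Candidate[] = [{ chain: [], score: 0 }];
  for (const options of stageOptions) {
    const expanded: Candidate[] = [];
    for (const c of beam) {
      for (const m of options) {
        const chain = [...c.chain, m];
        expanded.push({ chain, score: scoreChain(chain) });
      }
    }
    // Prune: sort best-first and keep the top beamWidth candidates.
    expanded.sort((a, b) => b.score - a.score);
    beam = expanded.slice(0, beamWidth);
  }
  return beam;
}
```

The work per stage is bounded by `beamWidth` times the number of options for that stage, which is what makes the search resource-bounded.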

6.6 Parameter Tuning

Continuous or range-based parameters should be tuned separately from structure search. Typical tunables:

  • symbol rate
  • bandwidth
  • threshold values
  • timing recovery parameters
  • frequency offset compensation
  • framing tolerances

Possible strategies:

  • bounded grid search
  • adaptive range narrowing
  • Bayesian optimization where available
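Adaptive range narrowing from the list above can be sketched for a single numeric parameter such as symbol rate. The sketch assumes the objective is roughly unimodal inside the initial range, which will not always hold for real signals; `narrowRange` and `evaluate` are hypothetical names.

```typescript
// Adaptive range narrowing: sample a coarse grid, then shrink the range around the best point.
// Expects pointsPerRound >= 2 and an objective where higher is better.
function narrowRange(
  evaluate: (x: number) => number, // hypothetical objective, e.g. a score from a short-window run
  lo: number,
  hi: number,
  pointsPerRound = 5,
  rounds = 4
): number {
  let best = lo;
  for (let r = 0; r < rounds; r++) {
    const step = (hi - lo) / (pointsPerRound - 1);
    let bestScore = -Infinity;
    for (let i = 0; i < pointsPerRound; i++) {
      const x = lo + i * step;
      const s = evaluate(x);
      if (s > bestScore) {
        bestScore = s;
        best = x;
      }
    }
    // Keep one grid step on each side of the best point for the next round.
    lo = Math.max(lo, best - step);
    hi = Math.min(hi, best + step);
  }
  return best;
}
```

Each round costs a fixed number of evaluations, so the total budget is `pointsPerRound * rounds` and can be capped on a constrained device.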

7. Scoring System

The scoring system is the backbone of the AI loop. AI can only optimize what is measured.

7.1 Physical-Layer Score

Examples:

  • SNR improvement
  • carrier lock stability
  • clock recovery stability
  • cluster separation after demodulation
  • residual frequency error

7.2 Frame-Level Score

Examples:

  • preamble detection rate
  • frame length consistency
  • frame-boundary stability
  • CRC pass rate
  • repeated structure frequency

7.3 Protocol-Level Score

Examples:

  • known-header matches
  • valid field lengths
  • legal enum / field value ratios
  • session consistency
  • successful Wireshark-style protocol interpretation

7.4 Cost Penalty

Examples:

  • CPU cost
  • memory cost
  • latency
  • fragility under small parameter changes
  • overfitting to short windows

7.5 Example Composite Score

score =
  0.25 * phy_score +
  0.35 * frame_score +
  0.30 * proto_score -
  0.10 * cost_penalty

Weights should initially be hand-tuned and later adjusted using replay corpora.
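The composite formula above translates directly into a small function. The sketch assumes each component score and the cost penalty are already normalized to the range 0 to 1, which the text does not state explicitly; `ScoreParts`, `DEFAULT_WEIGHTS`, and `compositeScore` are hypothetical names.

```typescript
// Component scores, each assumed to be normalized to [0, 1].
type ScoreParts = { phy: number; frame: number; proto: number; costPenalty: number };

// Initial hand-tuned weights taken from the example composite score.
const DEFAULT_WEIGHTS = { phy: 0.25, frame: 0.35, proto: 0.30, cost: 0.10 };

// Weighted sum of the layer scores minus the weighted cost penalty.
function compositeScore(s: ScoreParts, w = DEFAULT_WEIGHTS): number {
  return w.phy * s.phy + w.frame * s.frame + w.proto * s.proto - w.cost * s.costPenalty;
}
```

Keeping the weights in a single object makes it straightforward to swap in values refitted from replay corpora later.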

8. Failure Attribution

Every experiment should return structured failure reasons.

Example labels:

  • no_signal_detected
  • unstable_symbol_clock
  • carrier_not_locked
  • frame_sync_failed
  • crc_failed
  • field_semantics_invalid
  • overfit_to_noise

This enables targeted next-step decisions.

Examples:

  • unstable_symbol_clock -> adjust symbol-rate range or swap timing recovery module
  • crc_failed -> try bit inversion, whitening, byte order, CRC family changes
  • field_semantics_invalid -> reconsider framing or protocol family
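The label-to-action examples above can be held in a lookup table that the orchestrator consults. Only the three mappings stated above are filled in; the action identifiers and the `escalate_to_operator` fallback are hypothetical names invented for this sketch.

```typescript
type FailureLabel =
  | "no_signal_detected" | "unstable_symbol_clock" | "carrier_not_locked"
  | "frame_sync_failed" | "crc_failed" | "field_semantics_invalid" | "overfit_to_noise";

// Only mappings given in the text are listed; action names are hypothetical identifiers.
const NEXT_ACTIONS: Partial<Record<FailureLabel, string[]>> = {
  unstable_symbol_clock: ["adjust_symbol_rate_range", "swap_timing_recovery_module"],
  crc_failed: ["try_bit_inversion", "try_whitening", "try_byte_order", "try_crc_family"],
  field_semantics_invalid: ["reconsider_framing", "reconsider_protocol_family"],
};

// Collect de-duplicated next steps for a set of failure reasons, in first-seen order.
function suggestNext(reasons: FailureLabel[]): string[] {
  const actions: string[] = [];
  for (const r of reasons) {
    // Labels without a mapping fall back to a hypothetical escalation action.
    for (const a of NEXT_ACTIONS[r] ?? ["escalate_to_operator"]) {
      if (actions.indexOf(a) < 0) actions.push(a);
    }
  }
  return actions;
}
```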

9. Module Registry

Every algorithmic building block should be registered with machine-readable metadata.

type ModuleSpec = {
  name: string
  stage: "preprocess" | "detect" | "sync" | "demod" | "decode" | "frame" | "proto"
  inputFormat: string
  outputFormat: string
  params: Record<string, ParamSpec>
  constraints: string[]
  metrics: string[]
  cost: { cpu: number; memory: number; latency: number }
}

Without a registry, AI cannot safely orchestrate pipelines.

The registry should allow the system to answer:

  • what can run after what
  • what parameters are tunable
  • what metrics each module produces
  • which modules are expensive
  • which modules are suitable for real-time use
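The "what can run after what" question above can be answered by comparing declared formats. The sketch uses a reduced, hypothetical `ModuleInfo` type and assumes that format compatibility is a simple string match; the format names in the usage are invented, and a real registry might also need to evaluate the `constraints` field.

```typescript
// Reduced stand-in for a registry entry; only the fields needed for chaining checks.
type ModuleInfo = { name: string; inputFormat: string; outputFormat: string };

// A module can follow another when the upstream output format matches its input format.
function canFollow(prev: ModuleInfo, next: ModuleInfo): boolean {
  return prev.outputFormat === next.inputFormat;
}

// Index of the first module that cannot follow its predecessor, or -1 if the chain is valid.
function firstBrokenLink(chain: ModuleInfo[]): number {
  for (let i = 1; i < chain.length; i++) {
    if (!canFollow(chain[i - 1], chain[i])) return i;
  }
  return -1;
}
```

Running this check before execution lets the orchestrator reject an invalid chain without spending any compute on it.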

10. Knowledge Base and Priors

The system should maintain a history of prior successful analyses.

type PriorCase = {
  featureFingerprint: number[]
  successfulPipelines: RankedPipeline[]
}

Benefits:

  • faster startup on familiar signal families
  • reduced search cost
  • improved reliability over time
  • operator trust through precedent-based suggestions

This lets the AI behave more like an experienced lab engineer.
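Looking up precedent for a new signal could work as a nearest-neighbor search over stored fingerprints. The design above does not specify a similarity metric, so cosine similarity is an assumption here; `PriorCaseLite`, `cosineSimilarity`, and `nearestPrior` are hypothetical names, and `label` stands in for the stored pipelines.

```typescript
// Reduced stand-in for a prior case; `label` replaces the stored ranked pipelines.
type PriorCaseLite = { featureFingerprint: number[]; label: string };

// Cosine similarity over the shared prefix of two vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Return the stored case most similar to the query, or undefined when there are no cases.
function nearestPrior(query: number[], cases: PriorCaseLite[]): PriorCaseLite | undefined {
  let best: PriorCaseLite | undefined;
  let bestSim = -Infinity;
  for (const c of cases) {
    const sim = cosineSimilarity(query, c.featureFingerprint);
    if (sim > bestSim) {
      bestSim = sim;
      best = c;
    }
  }
  return best;
}
```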

11. Runtime Modes

At minimum, the product should support:

11.1 Live Scan

  • real-time acquisition
  • limited local search
  • fast confidence-ranked hints
  • real-time alerting

11.2 Lab Replay

  • deterministic offline reprocessing
  • multiple experiment branches
  • parameter tuning
  • regression validation

11.3 Protocol Assist

  • packet/event summarization
  • protocol explanation
  • filter and query generation
  • reporting and export

12. Device vs Host Split

A portable device has strict CPU, memory, thermal, and battery limits. Do not assume the full AI search workload belongs on-device.

Recommended split:

  • device side
    • acquisition
    • lightweight feature extraction
    • small bounded search
    • fast heuristic alerts
  • host / dock / edge side
    • deep experiment search
    • heavy replay analysis
    • larger AI inference
    • training / rule generation

This split keeps the handheld usable under real operating conditions.

13. Safety Boundaries for AI

AI should be allowed to:

  • select pipelines
  • adjust parameters
  • reorder compatible modules
  • choose which experiment to run next
  • generate summaries

AI should not directly and automatically:

  • patch low-level production DSP code in the live path
  • disable safety limits
  • bypass deterministic validation
  • replace scoring with free-form judgment

If AI-generated changes extend beyond parameter or policy updates, they should be validated in replay or sandbox mode first.
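These boundaries could be enforced by a small gate in front of every AI-initiated action. The sketch below is one possible encoding: the action identifiers are hypothetical names for the bullet points above, and `gateAiAction` is an assumed function name.

```typescript
type Verdict = "allow" | "deny" | "sandbox_first";

// Hypothetical identifiers for the actions AI is allowed to take.
const ALLOWED_ACTIONS = new Set<string>([
  "select_pipeline", "adjust_params", "reorder_compatible_modules",
  "choose_next_experiment", "generate_summary",
]);

// Hypothetical identifiers for the actions AI must never take directly and automatically.
const DENIED_ACTIONS = new Set<string>([
  "patch_live_dsp_code", "disable_safety_limits",
  "bypass_deterministic_validation", "replace_scoring_with_judgment",
]);

// Anything not explicitly allowed or denied must be validated in replay or sandbox mode first.
function gateAiAction(action: string): Verdict {
  if (DENIED_ACTIONS.has(action)) return "deny";
  if (ALLOWED_ACTIONS.has(action)) return "allow";
  return "sandbox_first";
}
```

Defaulting unknown actions to `sandbox_first` keeps the gate conservative as new action types are introduced.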

14. MVP Implementation Plan

A practical first version should be intentionally small.

14.1 MVP Scope

  • one or two acquisition sources
  • 10 to 20 reusable modules
  • experiment manager
  • scoring engine
  • beam-search controller
  • replay dataset support
  • export to pcap / pcapng
  • protocol analysis through tshark or sharkd

14.2 Suggested Initial Modules

  • dc_remove
  • agc
  • bandpass
  • resample
  • burst_detect
  • freq_offset_est
  • clock_recovery
  • ook_demod
  • 2fsk_demod
  • gfsk_demod
  • slicer
  • manchester_decode
  • whitening_try
  • crc_scan
  • fixed_preamble_framer
  • variable_length_framer

14.3 Build Order

  1. acquisition and replay path
  2. module registry
  3. pipeline executor
  4. scoring engine
  5. experiment persistence
  6. AI orchestration loop
  7. Wireshark-compatible export and protocol backend
  8. handheld UI and reporting

15. Key Risks

Primary technical risks:

  • search-space explosion
  • weak scoring functions
  • overfitting to noise or short windows
  • mixing structure search and parameter search too early
  • lack of reproducible experiment logs

Primary product risks:

  • placing AI too low in the stack
  • trying to make the first version too universal
  • failing to define a standard intermediate representation

Primary integration risk:

  • misunderstanding Wireshark's role and pushing it below the framing boundary

16. Summary

The proposed portable signal analyzer should be designed as a layered system:

  • deterministic algorithms do the actual signal recovery
  • Wireshark-derived tooling handles protocol analysis after framing
  • AI operates above those layers as an experiment orchestrator, tuning controller, and explanation engine

The winning architecture is not "AI decodes everything". It is "AI controls a rigorous decoding and analysis workflow".