Pipeline Architecture
Overview
A local-first system for converting video content into searchable knowledge.
INPUT               PROCESS                 OUTPUT
─────────────────────────────────────────────────────────────
YouTube URL    →    yt-dlp             →    Audio file
Podcast feed        download                (.mp3/.m4a)
Local video
                         ↓
                    Whisper/Parakeet   →    Transcript
                    (local ML)              + timestamps
                         ↓
                    LLM extraction     →    Topics
                    (optional)              Summary
                                            Key points
                         ↓
                    Database           →    Searchable
                    + embeddings            knowledge base
Stage 1: Capture
Tool: yt-dlp
# Download audio only (smallest file)
yt-dlp -x --audio-format mp3 "https://youtube.com/watch?v=..."
# Download entire channel
yt-dlp -x --audio-format mp3 "https://youtube.com/@ChannelName"
# With metadata
yt-dlp -x --audio-format mp3 --write-info-json "URL"
Considerations
- Storage: Audio-only is ~10-20MB per hour vs 500MB+ for video
- Rate limiting: YouTube may throttle; add delays between downloads for large batches (see the sketch below)
- Cookies: Some content requires authentication
- Alternatives: Podcast RSS feeds, local recordings
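For large batches, yt-dlp can also be driven from Python, which makes it easy to add the delays and metadata options noted above. A minimal sketch using the yt_dlp package (option names are yt-dlp's; verify them against your installed version):
import yt_dlp

# Roughly the CLI commands above, plus throttling for batch runs.
options = {
    "format": "bestaudio/best",
    "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}],
    "writeinfojson": True,            # same as --write-info-json
    "sleep_interval": 5,              # pause between downloads to avoid throttling
    "max_sleep_interval": 15,
    "outtmpl": "audio/%(id)s.%(ext)s",
}

with yt_dlp.YoutubeDL(options) as ydl:
    ydl.download(["https://youtube.com/watch?v=...", "https://youtube.com/@ChannelName"])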
Stage 2: Transcribe
Option A: Whisper (OpenAI)
# Install (also requires ffmpeg on PATH)
pip install openai-whisper
# Transcribe
whisper audio.mp3 --model medium --output_format json
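Whisper can also be called from Python when scripting the pipeline; a minimal sketch with the openai-whisper package:
import whisper

model = whisper.load_model("medium")            # trade-offs in the table below
result = model.transcribe("audio.mp3")
print(result["text"])                           # full transcript
for seg in result["segments"]:                  # per-segment timestamps
    print(seg["start"], seg["end"], seg["text"])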
Models, by speed and accuracy (speed relative to large):
| Model | VRAM | Relative speed | Accuracy |
|---|---|---|---|
| tiny | ~1 GB | ~32x | Lowest |
| base | ~1 GB | ~16x | Good |
| small | ~2 GB | ~6x | Better |
| medium | ~5 GB | ~2x | Great |
| large | ~10 GB | 1x | Best |
Option B: Parakeet (NVIDIA)
Faster than Whisper on NVIDIA GPUs. Similar accuracy.
import nemo.collections.asr as nemo_asr

# Requires the NeMo toolkit: pip install "nemo_toolkit[asr]" (CUDA GPU recommended)
model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-1.1b")
transcript = model.transcribe(["audio.mp3"])  # takes a list of audio file paths
Option C: MacWhisper (macOS)
GUI app using Apple Silicon acceleration. Good for manual processing.
Output Format
{
  "text": "Full transcript here...",
  "segments": [
    {
      "start": 0.0,
      "end": 5.2,
      "text": "Welcome to the tutorial..."
    }
  ]
}
Stage 3: Structure (Optional)
LLM Extraction
Use local or API LLM to extract:
- Topic list
- Key points
- Timestamps for major sections
- Named entities (tools, people, concepts)
prompt = """
Extract from this transcript:
1. Main topics (list)
2. Key takeaways (3-5 bullets)
3. Tools/products mentioned
4. Timestamps for major sections
Transcript:
{transcript}
"""
Cost Consideration
- Local LLM: Free, slower
- API (GPT-4, Claude): ~$0.01-0.10 per transcript
- Batch processing: Queue and process overnight
Stage 4: Store
Schema
CREATE TABLE videos (
  id UUID PRIMARY KEY,
  url TEXT,
  title TEXT,
  channel TEXT,
  duration INTEGER,
  watched_at TIMESTAMP,
  transcript TEXT,
  topics TEXT[],
  summary TEXT
);

CREATE TABLE segments (
  id UUID PRIMARY KEY,
  video_id UUID REFERENCES videos(id),
  start_time FLOAT,
  end_time FLOAT,
  text TEXT,
  embedding VECTOR(384)  -- must match the embedding model (all-MiniLM-L6-v2 → 384)
);
Embeddings
Convert segments to vectors for semantic search:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')  # outputs 384-dimensional vectors
embedding = model.encode(segment_text)
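Putting Stages 2 and 4 together, a sketch that embeds Whisper segments in batch and inserts them into the segments table. It assumes Postgres with the pgvector extension and the psycopg driver; names like store_segments are illustrative:
import psycopg
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def store_segments(conn, video_id, segments):
    # Encode all segment texts in one batch, then insert each with its vector.
    vectors = model.encode([s["text"] for s in segments])
    with conn.cursor() as cur:
        for seg, vec in zip(segments, vectors):
            cur.execute(
                "INSERT INTO segments (id, video_id, start_time, end_time, text, embedding) "
                "VALUES (gen_random_uuid(), %s, %s, %s, %s, %s::vector)",
                (video_id, seg["start"], seg["end"], seg["text"],
                 "[" + ",".join(str(x) for x in vec) + "]"),  # pgvector literal
            )
    conn.commit()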
Stage 5: Retrieve
Semantic Search
-- Find segments nearest to the query embedding (<-> is pgvector's distance operator)
SELECT video_id, text, start_time
FROM segments
ORDER BY embedding <-> query_embedding
LIMIT 10;
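In practice the query is embedded with the same model and passed as a parameter; a sketch reusing the model and psycopg connection from the storage sketch above:
def semantic_search(conn, query, limit=10):
    # Embed the query with the same model used for the segments.
    qvec = "[" + ",".join(str(x) for x in model.encode(query)) + "]"
    with conn.cursor() as cur:
        cur.execute(
            "SELECT video_id, text, start_time FROM segments "
            "ORDER BY embedding <-> %s::vector LIMIT %s",
            (qvec, limit),
        )
        return cur.fetchall()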
Full-Text Search
-- Keyword search
SELECT * FROM videos
WHERE to_tsvector(transcript) @@ to_tsquery('kubernetes & deployment');
Combined Interface
- Natural language query → semantic search
- Keyword query → full-text search
- Filter by channel, date, topic
Performance
| Metric | Value |
|---|---|
| Videos processed | 9,996 |
| Total transcripts | 15,955 files |
| Channels tracked | 91 |
| Search latency | <500ms |
| API cost | $0 (local ML) |
Replication
Minimum Setup
- Install yt-dlp: pip install yt-dlp
- Install Whisper: pip install openai-whisper
- Download + transcribe in one script (see the sketch below)
- Store in SQLite or JSON files
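A sketch of the one-script minimum (yt-dlp + Whisper, results dumped to a JSON file; filenames are illustrative):
import json
import subprocess
import whisper

def capture_and_transcribe(url, stem="audio"):
    # Stage 1: download audio only with yt-dlp.
    subprocess.run(
        ["yt-dlp", "-x", "--audio-format", "mp3", "-o", f"{stem}.%(ext)s", url],
        check=True,
    )
    # Stage 2: transcribe the extracted mp3 locally with Whisper.
    result = whisper.load_model("base").transcribe(f"{stem}.mp3")
    # Minimum storage: transcript + segments as JSON next to the audio.
    with open(f"{stem}.json", "w") as f:
        json.dump({"url": url, "text": result["text"], "segments": result["segments"]}, f)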
Full Setup
- Supabase for database + vector search
- Batch processing with queue
- Web interface for search
- Automatic channel monitoring
Contribute improvements to the pipeline or share your own architecture.