Learning Pipeline

Passive consumption → Searchable knowledge

32K videos tracked. 15K transcribed. 4,142 hours watched.

Watch and forget → transcribe, embed, retrieve.

"My best teachers have often been YouTube tutorials, reviews and demos. I found myself spending more than five hours a day watching videos at 2-3x speed."

External memory for minds that can't hold it all.

The Real Numbers

31,832 videos tracked
15,456 transcribed
6,152 channels
4,142 hours watched

The Problem

Video is inefficient. A 1-hour video holds about 10 minutes of reading. But video also carries content that text doesn't.
Consumed ≠ retained. Watching at 2x speed helps throughput but not recall.
No cross-referencing. "What did that tutorial say about X?" The answer is lost in watch history.
API costs. Cloud transcription is expensive at scale (10K+ videos).

The Solution

Local-first learning infrastructure. Capture → Transcribe → Structure → Store → Retrieve. Zero API costs. Searchable knowledge base from video consumption.

CAPTURE: Download video/audio from any source (yt-dlp, browser extensions)

TRANSCRIBE: Convert audio to searchable text (Whisper, Parakeet; local ML)

STRUCTURE: Extract topics, timestamps, key points (LLM processing)

STORE: Searchable database with embeddings (Supabase, vector DB)

RETRIEVE: Query across all consumed content (semantic search)
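A minimal sketch of the CAPTURE and TRANSCRIBE stages, assuming the yt-dlp and openai-whisper command-line tools (plus ffmpeg) are installed and on PATH. The helper names `capture_cmd`, `transcribe_cmd` and `run_stage`, the `audio`/`transcripts` directories and the `VIDEO_ID` placeholder are all hypothetical, and Parakeet is not shown. Building the argument lists separately from running them keeps the commands inspectable without network access; `run_stage` only prints unless `dry_run=False`.

```python
import subprocess
from pathlib import Path


def capture_cmd(url: str, out_dir: Path) -> list[str]:
    """Hypothetical helper: yt-dlp arguments that keep only the audio, as MP3."""
    return [
        "yt-dlp",
        "--extract-audio",                            # drop the video stream
        "--audio-format", "mp3",                      # conversion needs ffmpeg
        "--output", str(out_dir / "%(id)s.%(ext)s"),  # name files by video ID
        url,
    ]


def transcribe_cmd(audio: Path, out_dir: Path, model: str = "small") -> list[str]:
    """Hypothetical helper: openai-whisper CLI arguments for a JSON transcript."""
    return [
        "whisper", str(audio),
        "--model", model,
        "--output_format", "json",  # JSON keeps per-segment start/end timestamps
        "--output_dir", str(out_dir),
    ]


def run_stage(cmd: list[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    audio_dir = Path("audio")
    run_stage(capture_cmd("https://www.youtube.com/watch?v=VIDEO_ID", audio_dir))
    run_stage(transcribe_cmd(audio_dir / "VIDEO_ID.mp3", Path("transcripts")))
```

Whisper names its output after the audio file (here `VIDEO_ID.json`), so naming the audio by video ID carries that ID through to the transcript stage.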

Research Questions

Retention: Does searchable transcription improve recall vs. passive watching?
Speed vs. depth: What's the optimal consumption speed for different content types?
Active retrieval: How often do people actually search their knowledge base?
Compression: Can AI summarization replace full consumption for some content?

Preliminary Data

Scale Achieved

31,832 videos tracked. 15,456 transcribed. 6,152 channels. 1,407 rewatched. Local ML transcription via Whisper/Parakeet—zero API costs.

Time Savings

Estimated 4+ hours/day saved by searching transcripts instead of re-watching. "What did that tutorial say about X?" answered in seconds instead of scrubbing.
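One way the "answered in seconds instead of scrubbing" lookup could work, sketched under assumptions: a plain keyword search over timestamped transcript segments that returns a link jumping straight to the matching moment. The `Segment` record, the `find` helper and the sample data are hypothetical; the links rely on YouTube's `t=` URL parameter for the start time.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    video_id: str
    start: float  # seconds from the beginning of the video
    text: str


def find(segments: list[Segment], query: str) -> list[str]:
    """Return a deep link for every segment that contains all query words.

    Plain substring matching: "rate" would also match "accurate".
    """
    words = query.lower().split()
    links = []
    for seg in segments:
        text = seg.text.lower()
        if all(word in text for word in words):
            links.append(
                f"https://www.youtube.com/watch?v={seg.video_id}&t={int(seg.start)}s"
            )
    return links


segments = [
    Segment("abc123", 12.0, "Intro and setup"),
    Segment("abc123", 754.2, "Here we configure the rate limiter"),
]
print(find(segments, "rate limiter"))
# → ['https://www.youtube.com/watch?v=abc123&t=754s']
```

Keyword matching only finds the words actually spoken; the semantic search stage is what covers paraphrases.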

Consumption Patterns

Primary content: tutorials, tech reviews, lectures. Average watch speed: 2-3x. Peak consumption: late evening.

Pipeline Architecture

YouTube/Podcast → yt-dlp → Audio file
                              ↓
                    Whisper/Parakeet (local)
                              ↓
                    Transcript + timestamps
                              ↓
                    LLM extraction (topics, summary)
                              ↓
                    Supabase + embeddings
                              ↓
                    Semantic search interface
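The transcript-to-search half of the diagram can be sketched with the standard library alone. This sketch assumes Whisper's JSON output, whose `segments` list carries `start`, `end` and `text` per entry. It groups segments into chunks of roughly 60 seconds so every hit maps back to a timestamp, precomputes one vector per chunk (the role Supabase plays above), and ranks chunks by cosine similarity. The `embed` function is a hashed bag-of-words stand-in, not a real embedding model, and the names `chunk_segments`, `build_index` and `search` are hypothetical.

```python
import hashlib
import math
import re

DIM = 256  # size of the toy embedding space


def chunk_segments(segments: list[dict], window: float = 60.0) -> list[dict]:
    """Merge consecutive Whisper segments into chunks spanning about `window` seconds."""
    chunks: list[dict] = []
    for seg in segments:
        # Start a new chunk if this segment would stretch the current one past `window`.
        if not chunks or seg["end"] - chunks[-1]["start"] > window:
            chunks.append({"start": seg["start"], "end": seg["end"], "parts": []})
        chunks[-1]["parts"].append(seg["text"].strip())
        chunks[-1]["end"] = seg["end"]
    return [
        {"start": c["start"], "end": c["end"], "text": " ".join(c["parts"])}
        for c in chunks
    ]


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # md5 rather than hash(): str hashing is randomised per Python process.
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(chunks: list[dict]) -> list[tuple[list[float], dict]]:
    """STORE step: precompute one vector per chunk (a row in a real vector DB)."""
    return [(embed(c["text"]), c) for c in chunks]


def search(index: list[tuple[list[float], dict]], query: str, k: int = 3) -> list[tuple[float, dict]]:
    """RETRIEVE step: rank chunks by cosine similarity (dot product of unit vectors)."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for vec, chunk in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


whisper_json = {"segments": [
    {"start": 0.0, "end": 30.0, "text": " Installing the package manager."},
    {"start": 30.0, "end": 70.0, "text": " Now we tune the vector index for recall."},
]}
index = build_index(chunk_segments(whisper_json["segments"]))
score, best = search(index, "vector index recall", k=1)[0]
print(f"{best['start']:.0f}s: {best['text']}")
```

Chunking by time rather than by sentence is a design choice: each result stays small enough to embed, and its `start` can become a deep link into the video.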

Roadmap

Build transcription pipeline
Process 10K+ videos
Semantic search interface
Retention study (before/after)
Speed optimization research
Open source pipeline


Contribute

Share your own learning infrastructure, consumption data, or retention studies by opening an issue.

Built with yt-dlp, Whisper, and local ML.