Podcast Transcription Pipeline

This notebook walks through fetching a podcast episode from a public RSS feed, transcribing it locally with faster-whisper on CUDA, and then formatting the raw long-form transcript into a structured Markdown file using a local LLM served by Ollama. The LLM processes the transcript in chunks, and the formatted chunks are reassembled into the final document, which keeps VRAM requirements low enough for personal computing environments.

This notebook assumes the requirements listed below. See the instructions to set up Ollama locally here.

Requirements

CUDA-capable GPU with PyTorch CUDA support
Ollama running locally (ollama serve)
A model pulled in Ollama, e.g. ollama pull qwen2.5:7b
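
Before running the later cells, it can help to confirm that the last two requirements are actually met. The following is a minimal sketch, assuming Ollama listens on its default port and exposes its model list at /api/tags; the helper names `fetch_tags` and `model_is_pulled` are illustrative and not part of this notebook.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local port.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def model_is_pulled(tags_response, model_name):
    """Return True if model_name appears in a parsed /api/tags response."""
    names = [m.get("name", "") for m in tags_response.get("models", [])]
    return model_name in names


def fetch_tags(url=OLLAMA_TAGS_URL, timeout=5):
    """Fetch and parse the list of locally available models from Ollama."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `model_is_pulled(fetch_tags(), "qwen2.5:7b")` should return True once the model is pulled. A connection error raised by `fetch_tags()` usually means `ollama serve` is not running.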

In [1]:

# pip install nvidia-cublas-cu12

Imports

In [2]:

from pathlib import Path
import json
import re
import time
import urllib.request

import feedparser # pip install feedparser
import requests
from faster_whisper import WhisperModel, BatchedInferencePipeline # pip install faster-whisper

Configuration

Set the RSS feed URL and adjust model/LLM settings here. All other paths are derived automatically.

In [3]:

FEED_URL    = "https://feeds.captivate.fm/gradient-dissent/"
EPISODE_INDEX = 0   # 0 = latest, 1 = second latest, etc.

OUTPUT_DIR     = Path.cwd() / "transcribe_outputs"
AUDIO_DIR      = OUTPUT_DIR / "audio"
TRANSCRIPTS_DIR = OUTPUT_DIR / "transcripts"
RESULTS_DIR    = OUTPUT_DIR / "results"

for d in [AUDIO_DIR, TRANSCRIPTS_DIR, RESULTS_DIR]:
    d.mkdir(parents=True, exist_ok=True)

# Whisper (faster-whisper, CUDA)
# Requires a CUDA-capable GPU with PyTorch CUDA support.
# Model sizes: tiny | base | small | medium | large-v2 | large-v3 | distil-large-v3
WHISPER_MODEL  = "distil-large-v3"
DEVICE         = "cuda"      # change to "cpu" if no GPU available (slower)
COMPUTE_TYPE   = "float16"   # float16 requires CUDA; use "int8" for CPU
BATCH_SIZE     = 8
BEAM_SIZE      = 5
LANGUAGE       = "en"
WORD_TIMESTAMPS = False

# Requires Ollama running: 'ollama serve'
# Pull model first:        'ollama pull qwen2.5:7b'
OLLAMA_URL            = "http://localhost:11434/api/generate"
OLLAMA_MODEL          = "qwen2.5:7b"
CHUNK_SIZE            = 3500   # characters per LLM call
SLEEP_BETWEEN_CALLS   = 0.5    # seconds between Ollama requests

# print(f"Output root : {OUTPUT_DIR}")
# print(f"Audio       : {AUDIO_DIR}")
# print(f"Transcripts : {TRANSCRIPTS_DIR}")
# print(f"Results     : {RESULTS_DIR}")

Helper functions

In [4]:

def safe_filename(value):
    value = re.sub(r'[\\/*?:"<>|]', "", value)
    value = re.sub(r"\s+", " ", value).strip()
    return value[:180]

def get_audio_url(entry):
    for link in entry.get("links", []):
        if link.get("rel") == "enclosure":
            return link.get("href", "")
    return ""

def seconds_to_srt_time(seconds):
    total_ms = int(round(seconds * 1000))
    hours, total_ms = divmod(total_ms, 3600000)
    minutes, total_ms = divmod(total_ms, 60000)
    secs, millis = divmod(total_ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def write_txt(path, segments):
    text = " ".join(s["text"].strip() for s in segments).strip()
    path.write_text(text + "\n", encoding="utf-8")

def write_srt(path, segments):
    lines = []
    for i, s in enumerate(segments, start=1):
        lines += [str(i), f"{seconds_to_srt_time(s['start'])} --> {seconds_to_srt_time(s['end'])}", s["text"].strip(), ""]
    path.write_text("\n".join(lines), encoding="utf-8")

def write_json(path, payload):
    path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")

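def split_text(text, chunk_size):
    # Split text into chunks of at most chunk_size characters, preferring to
    # cut at a sentence end and falling back to any space.
    text = re.sub(r"\s+", " ", text).strip()
    chunks, start = [], 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Try separators in priority order; the first kind found wins.
            for sep in [". ", "? ", "! ", " "]:
                sp = text.rfind(sep, start, end)
                if sp != -1 and sp > start:
                    end = sp + 1  # cut just after the separator's first character
                    break
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        start = end
    return chunks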

def call_ollama(prompt):
    payload = {
        "model": OLLAMA_MODEL,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0}
    }
    response = requests.post(OLLAMA_URL, json=payload, timeout=600)
    response.raise_for_status()
    return response.json()["response"].strip()

def build_prompt(chunk, chunk_index, total_chunks):
    return f"""You are cleaning up one chunk of a raw podcast transcript.

Task:
- Convert this transcript chunk into clean markdown
- Restore paragraph breaks
- Preserve meaning exactly
- Do not summarize or omit content
- Do not invent new content
- Keep output proportional to input
- Add a short section heading only if clearly justified by the content
- Return only the cleaned markdown for this chunk

This is chunk {chunk_index} of {total_chunks}.

Transcript chunk:
{chunk}""".strip()

Parse RSS feed and resolve episode

In [5]:

feed = feedparser.parse(FEED_URL)

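# feedparser sets bozo for any malformed feed, even when entries were still parsed,
# so only fail when nothing usable came back.
if feed.bozo and not feed.entries:
    raise RuntimeError(f"Feed parse error: {feed.bozo_exception}")
if EPISODE_INDEX >= len(feed.entries):
    raise RuntimeError(f"EPISODE_INDEX {EPISODE_INDEX} is out of range: feed has {len(feed.entries)} entries.")

latest = feed.entries[EPISODE_INDEX]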

episode_title     = latest.get("title", "Untitled")
episode_published = latest.get("published", "Unknown date")
episode_audio_url = get_audio_url(latest)
episode_slug      = safe_filename(episode_title)

if not episode_audio_url:
    raise RuntimeError("No audio enclosure URL found in this feed entry.")

# Derive all output paths from the single OUTPUT_DIR root
audio_path        = AUDIO_DIR / f"{episode_slug}.mp3"
episode_tx_dir    = TRANSCRIPTS_DIR / episode_slug
episode_tx_dir.mkdir(parents=True, exist_ok=True)
txt_path          = episode_tx_dir / f"{episode_slug}.txt"
srt_path          = episode_tx_dir / f"{episode_slug}.srt"
json_path         = episode_tx_dir / f"{episode_slug}.json"
episode_res_dir   = RESULTS_DIR / episode_slug
episode_res_dir.mkdir(parents=True, exist_ok=True)
final_md_path     = episode_res_dir / f"{episode_slug}.md"

print(f"Episode     : {episode_title}")
print(f"Published   : {episode_published}")
print(f"Audio URL   : {episode_audio_url}")
# print(f"Audio path  : {audio_path}")
# print(f"Transcript  : {episode_tx_dir}")
# print(f"Result      : {final_md_path}")

Episode     : Uber, Nissan, and Mercedes Chose This Self-Driving Startup | Alex Kendall, Wayve
Published   : Wed, 15 Apr 2026 06:52:00 -0400
Audio URL   : https://episodes.captivate.fm/episode/4eb5f283-6abf-49f3-abf2-39e5ef84e293.mp3

Download audio

In [6]:

print(f"Downloading from: {episode_audio_url}")
urllib.request.urlretrieve(episode_audio_url, audio_path)
size_mb = audio_path.stat().st_size / 1_048_576
# print(f"Downloaded: {audio_path} ({size_mb:.1f} MB)")

Downloading from: https://episodes.captivate.fm/episode/4eb5f283-6abf-49f3-abf2-39e5ef84e293.mp3
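
Re-running the download cell fetches the whole file again. One possible alternative is sketched here, assuming it is acceptable to reuse a file that already exists and is non-empty; `download_if_missing` and its `user_agent` default are hypothetical and not part of the notebook.

```python
import shutil
import urllib.request
from pathlib import Path


def download_if_missing(url, dest, user_agent="podcast-notebook/0.1"):
    """Download url to dest unless a non-empty file already exists there.

    Returns True if a download happened, False if the existing file was reused.
    """
    dest = Path(dest)
    if dest.exists() and dest.stat().st_size > 0:
        return False
    # Write to a temporary name first so an interrupted download
    # is never mistaken for a complete file on the next run.
    tmp = dest.with_name(dest.name + ".part")
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=60) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(dest)
    return True
```

In this notebook the call would be `download_if_missing(episode_audio_url, audio_path)`. The explicit User-Agent header is a precaution for hosts that reject the default urllib client.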

Load Whisper model

Uses faster-whisper with CUDA and float16. The first run downloads the model weights.
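
The configuration comments note that float16 requires CUDA and that "int8" is the suggested compute type on CPU. A minimal sketch of choosing between the two automatically follows. It assumes the CTranslate2 package (the inference backend that faster-whisper is built on) is importable; `pick_device` and `detect_cuda_devices` are illustrative names, not part of the notebook.

```python
def pick_device(cuda_device_count):
    """Map a CUDA device count to a (device, compute_type) pair."""
    if cuda_device_count > 0:
        return "cuda", "float16"
    return "cpu", "int8"


def detect_cuda_devices():
    """Count CUDA devices visible to CTranslate2; 0 if it is not installed."""
    try:
        import ctranslate2
    except ImportError:
        return 0
    return ctranslate2.get_cuda_device_count()
```

With these helpers, `DEVICE, COMPUTE_TYPE = pick_device(detect_cuda_devices())` could replace the hard-coded values in the configuration cell.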

In [7]:

model = WhisperModel(
    WHISPER_MODEL,
    device=DEVICE,
    compute_type=COMPUTE_TYPE
)

pipeline = BatchedInferencePipeline(model=model)

print(f"Whisper model loaded: {WHISPER_MODEL} on {DEVICE} ({COMPUTE_TYPE})")

Whisper model loaded: distil-large-v3 on cuda (float16)

Transcribe

Writes .txt, .srt, and .json to transcribe_outputs/transcripts/<episode-slug>/.

In [8]:

segments, info = pipeline.transcribe(
    str(audio_path),
    batch_size=BATCH_SIZE,
    language=LANGUAGE,
    beam_size=BEAM_SIZE,
    word_timestamps=WORD_TIMESTAMPS,
    condition_on_previous_text=False,
    vad_filter=True
)

collected = []
for segment in segments:
    item = {
        "id":    segment.id,
        "start": float(segment.start),
        "end":   float(segment.end),
        "text":  segment.text.strip()
    }
    if WORD_TIMESTAMPS and getattr(segment, "words", None):
        item["words"] = [
            {
                "start":       None if w.start is None else float(w.start),
                "end":         None if w.end is None else float(w.end),
                "word":        w.word,
                "probability": None if w.probability is None else float(w.probability)
            }
            for w in segment.words
        ]
    collected.append(item)

payload = {
    "source_file":          audio_path.name,
    "episode_title":        episode_title,
    "episode_published":    episode_published,
    "audio_url":            episode_audio_url,
    "model":                WHISPER_MODEL,
    "device":               DEVICE,
    "compute_type":         COMPUTE_TYPE,
    "language":             info.language,
    "language_probability": float(info.language_probability),
    "duration":             None if info.duration is None else float(info.duration),
    "duration_after_vad":   None if info.duration_after_vad is None else float(info.duration_after_vad),
    "segments":             collected
}

write_txt(txt_path, collected)
write_srt(srt_path, collected)
write_json(json_path, payload)

duration_min = (payload["duration"] or 0) / 60
print(f"Segments    : {len(collected)}")
print(f"Duration    : {duration_min:.1f} min")
print(f"Language    : {info.language} ({info.language_probability:.2%})")
# print(f"TXT         : {txt_path}")
# print(f"SRT         : {srt_path}")
# print(f"JSON        : {json_path}")

Segments    : 98
Duration    : 45.8 min
Language    : en (100.00%)

Preview raw transcript

In [9]:

full_text = " ".join(s["text"] for s in collected).strip()

print(f"Total characters : {len(full_text):,}")
print(f"Total segments   : {len(collected)}")
print()
print("─" * 60)
print(full_text[:800])
print("─" * 60)

Total characters : 48,149
Total segments   : 98

────────────────────────────────────────────────────────────
Autonomous driving is all about looking at the AV problem with an AI approach. There is an opportunity there to drive accidents down to near zero. So what we did is we raised one and a half million dollars, got some friends together, we rented a house and we put our car in the garage and started hacking away. Taking it from expensive retrofit vehicles, which relied on compute, HD maps, infrastructure, to mass market vehicles that you can buy or manufacture for $30,000, $50,000 each. It had inbuilt hardware that's in global supply chains. Doesn't need an HDMAP so it can drive anywhere. We're now the first company to have driven zero. shot in over 500 cities. Throughout Europe, Asia and North America, it's also driven in over 10 different cars, from electric vehicles, the vans, the SUV. We w
────────────────────────────────────────────────────────────

Format with local LLM (Ollama)

The raw transcript is split into chunks and each is cleaned and formatted by the local LLM. Output is written to transcribe_outputs/results/<episode-slug>/.

Requires Ollama running locally. Start it with:

ollama serve
ollama pull qwen2.5:7b

In [10]:

chunks = split_text(full_text, CHUNK_SIZE)

print(f"Total chunks     : {len(chunks)}")
print(f"Avg chunk length : {sum(len(c) for c in chunks) // len(chunks):,} chars")
# print(f"Output dir       : {episode_res_dir}")
# print(f"Final MD         : {final_md_path}")

Total chunks     : 15
Avg chunk length : 3,209 chars
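
Since the prompt asks the model not to omit content, it is worth confirming that the split itself loses nothing before any LLM call is made. The sketch below is one way to do that, under the assumption that comparing word sequences is a sufficient check; `check_chunks` is a hypothetical helper.

```python
import re


def check_chunks(text, chunks, chunk_size):
    """Return a list of problems found; an empty list means the split looks sound."""
    problems = []
    for i, chunk in enumerate(chunks, start=1):
        if len(chunk) > chunk_size:
            problems.append(f"chunk {i} has {len(chunk)} chars (limit {chunk_size})")
    # Compare word sequences so whitespace differences at chunk edges are ignored.
    original_words = re.sub(r"\s+", " ", text).split()
    chunk_words = " ".join(chunks).split()
    if original_words != chunk_words:
        problems.append("chunks do not reassemble into the original text")
    return problems
```

Running `check_chunks(full_text, chunks, CHUNK_SIZE)` should return an empty list. A reassembly problem would be reported if a cut ever landed inside a word, which can only happen when a single window of CHUNK_SIZE characters contains no space at all.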

In [11]:

formatted_chunks = []

for i, chunk in enumerate(chunks, start=1):
    print(f"Processing chunk {i}/{len(chunks)} ({len(chunk):,} chars)")

    prompt = build_prompt(chunk, i, len(chunks))
    result = call_ollama(prompt)

    chunk_file = episode_res_dir / f"{episode_slug}_chunk_{i:02d}.md"
    chunk_file.write_text(result, encoding="utf-8")

    formatted_chunks.append(f"<!-- chunk {i} -->\n\n{result}")
    time.sleep(SLEEP_BETWEEN_CALLS)

# Assemble final markdown with episode header
header = f"""# {episode_title}

**Published:** {episode_published}  
**Model:** {WHISPER_MODEL} · {OLLAMA_MODEL}  
**Duration:** {duration_min:.1f} min  

---

"""

final_text = header + "\n\n".join(formatted_chunks).strip() + "\n"
final_md_path.write_text(final_text, encoding="utf-8")

# print()
# print(f"Done. Final output: {final_md_path}")

Processing chunk 1/15 (3,484 chars)
Processing chunk 2/15 (3,405 chars)
Processing chunk 3/15 (3,360 chars)
Processing chunk 4/15 (3,344 chars)
Processing chunk 5/15 (3,205 chars)
Processing chunk 6/15 (3,421 chars)
Processing chunk 7/15 (3,311 chars)
Processing chunk 8/15 (3,419 chars)
Processing chunk 9/15 (3,375 chars)
Processing chunk 10/15 (3,448 chars)
Processing chunk 11/15 (3,498 chars)
Processing chunk 12/15 (3,366 chars)
Processing chunk 13/15 (3,489 chars)
Processing chunk 14/15 (3,395 chars)
Processing chunk 15/15 (615 chars)
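
Because each formatted chunk is written to its own file, an interrupted run can in principle resume without repeating finished Ollama calls. The following is a minimal sketch of that idea, not the notebook's actual behavior; `format_with_cache` is a hypothetical helper and `format_fn` stands in for whatever function produces the formatted text.

```python
from pathlib import Path


def format_with_cache(chunk_file, chunk, format_fn):
    """Return cached output from chunk_file if present, else call format_fn and save it."""
    chunk_file = Path(chunk_file)
    if chunk_file.exists():
        cached = chunk_file.read_text(encoding="utf-8")
        if cached.strip():
            return cached
    result = format_fn(chunk)
    chunk_file.write_text(result, encoding="utf-8")
    return result
```

Inside the loop this could be used as `result = format_with_cache(chunk_file, chunk, lambda c: call_ollama(build_prompt(c, i, len(chunks))))`. The cache does not notice changes to the prompt, model, or CHUNK_SIZE, so the chunk files should be deleted after changing any of those.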

Preview formatted output

In [14]:

preview = final_md_path.read_text(encoding="utf-8")

# print(f"Output file : {final_md_path}")
print(f"File size   : {final_md_path.stat().st_size / 1024:.1f} KB")
print()
print("─" * 60)
print(preview[:1200])
print("─" * 60)
print("...")
print(preview[-600:])

File size   : 45.3 KB

────────────────────────────────────────────────────────────
# Uber, Nissan, and Mercedes Chose This Self-Driving Startup | Alex Kendall, Wayve

**Published:** Wed, 15 Apr 2026 06:52:00 -0400
**Model:** distil-large-v3 · qwen2.5:7b
**Duration:** 45.8 min

---

# The Story of Wave: Autonomous Driving with AI

Autonomous driving is all about looking at the AV problem with an AI approach. There is an opportunity there to drive accidents down to near zero. So what we did is we raised one and a half million dollars, got some friends together, we rented a house and we put our car in the garage and started hacking away. Taking it from expensive retrofit vehicles, which relied on compute, HD maps, infrastructure, to mass market vehicles that you can buy or manufacture for $30,000, $50,000 each. It had inbuilt hardware that's in global supply chains. Doesn't need an HDMAP so it can drive anywhere.

We're now the first company to have driven zero-shot in over 500 cities. Throughout Europe, Asia and North America, it's also driven in over 10 different cars, from electric vehicles, vans, the SUVs. We went north of the Arctic Circle, tested driving in 22-hour darkness and snow. We were even in Tokyo during a typhoon, where local t
────────────────────────────────────────────────────────────
...
at scale really drives through where we are today, and I think it does require different skills to operate at the silver scale. But the impact you can generate is just tremendous. However, holding on to disruptive innovation and not falling for the innovator's dilemma—these kinds of principles are still really things we fight for and I think they still hold true today at Wave.

Well, that's another success. It's been a joy watching you guys grow. Thanks, Lucas, who said it great yesterday.

Thanks so much for listening to this episode of Grading Descent. Please stay tuned for future episodes.
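
The prompt asks the model to keep its output proportional to the input and not to summarize, but a small local model may not always comply. A rough way to spot chunks that were condensed is to compare lengths, as in the sketch below. The 0.8 threshold is an arbitrary assumption and `flag_shrunk_chunks` is a hypothetical helper.

```python
def flag_shrunk_chunks(raw_chunks, formatted_chunks, min_ratio=0.8):
    """Return (index, ratio) pairs for chunks whose output is much shorter than the input."""
    flagged = []
    for i, (raw, formatted) in enumerate(zip(raw_chunks, formatted_chunks), start=1):
        ratio = len(formatted) / max(len(raw), 1)
        if ratio < min_ratio:
            flagged.append((i, round(ratio, 2)))
    return flagged
```

Calling `flag_shrunk_chunks(chunks, formatted_chunks)` lists the chunks worth re-reading against the raw transcript. In this notebook each entry of `formatted_chunks` starts with a short HTML comment marker, so the ratios are slightly inflated. A length check only catches omissions, not altered wording.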