Building a RAG Application in Python
Published 6/2026
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 KHz, 2 Ch
Language: English | Duration: 9h 50m | Size: 7 GB

Build a streaming web chat with hybrid retrieval, multi-turn memory, and image support - from scratch
What you'll learn
Build a complete Retrieval-Augmented Generation pipeline in Python, from document ingestion to streaming chat output
Run Postgres with the pgvector extension via Docker Compose, including HNSW indexing for fast approximate-nearest-neighbour vector search
Chunk documents with paragraph-aware splitting and overlap, and explain why each chunking choice affects retrieval quality
Implement idempotent, atomic document ingestion using SHA-256 content hashes and transactional upserts
Use the OpenAI SDK to call local Ollama models and OpenAI's hosted API through the same code path
Implement hybrid retrieval that combines dense vector search with Postgres full-text BM25, fused with Reciprocal Rank Fusion
Build a query rewriter that turns follow-up questions like "what does it eat?" into standalone search queries that actually retrieve useful chunks
Build a directory watcher with watchdog, including per-path debouncing so editor saves never trigger reads of half-written files
Apply the Strategy/Adapter pattern to swap a Postgres backend for Weaviate via a single environment variable, with zero changes to the rest of the code
Build a streaming chat web UI with FastAPI, Server-Sent Events, and vanilla JavaScript - no React, no build step
Ingest images using a "describe-then-embed" vision-model pipeline, including format normalization for vision backends
Render LLM markdown output safely in the browser with marked + DOMPurify, including inline images
Apply standard software-engineering patterns - Dependency Injection, Factory, Strategy/Adapter, context managers, lazy imports, etc.
Diagnose RAG failures empirically (cosine scores, full-text ranks, fused output) instead of guessing at prompts
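To make the idempotent-ingestion item above concrete, here is a minimal sketch of the content-hash check. It is not the course's code: the in-memory dict is an assumed stand-in for a Postgres table, and `content_hash` and `ingest` are hypothetical names.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of a document's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def ingest(store: dict[str, str], path: str, data: bytes) -> bool:
    """Record `data` under `path`; return True only if work was done.

    `store` maps path -> content hash. It is an assumption standing in for
    a database table that a real pipeline would update in one transaction.
    """
    digest = content_hash(data)
    if store.get(path) == digest:
        return False  # unchanged content: re-running ingestion is a no-op
    # A real implementation would replace the document's chunks and the
    # stored hash atomically here, so a crash never leaves a half-update.
    store[path] = digest
    return True
```

Calling `ingest` twice with the same bytes does work only the first time, which is what makes repeated ingestion runs safe.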
Requirements
Basic Python skills, basic SQL, comfort with the command line and Docker. No prior LLM or vector-database experience needed.
Description
Build a working Retrieval-Augmented Generation (RAG) application in Python - from an empty directory to a streaming web chat with multi-turn memory, hybrid retrieval, image ingestion, and two interchangeable vector-store backends. No LangChain, no LlamaIndex, no magic. You write every line yourself, and by the end you understand exactly what each one does.
Most RAG tutorials wrap everything in a single high-level library and stop at "it works." This course goes the other way. You'll build the pipeline from scratch - chunking, embeddings, idempotent ingestion, hybrid semantic-plus-lexical retrieval with Reciprocal Rank Fusion, a query rewriter for follow-up questions, server-sent token streaming, a vision-model branch for images - on top of plain Postgres (with pgvector) and a local Ollama server. No API bills while you learn. No black boxes. When you later reach for a framework like LangChain, you'll actually understand what it's doing under the hood.
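As an illustration of the chunking step mentioned above, the following is a rough sketch of paragraph-aware splitting with overlap. The function name `chunk_paragraphs`, the character budget, and the choice to overlap by whole paragraphs are all assumptions made for this example, not the course's implementation.

```python
def chunk_paragraphs(text: str, max_chars: int = 500, overlap: int = 1) -> list[str]:
    """Group paragraphs into chunks of roughly `max_chars` characters.

    Paragraphs are never split. The last `overlap` paragraphs of each chunk
    are repeated at the start of the next, so context that straddles a chunk
    boundary stays retrievable from either side.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for para in paragraphs:
        candidate = "\n\n".join(current + [para])
        if current and len(candidate) > max_chars:
            # Budget exceeded: emit the chunk and carry the overlap forward.
            chunks.append("\n\n".join(current))
            current = current[-overlap:] if overlap > 0 else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A single paragraph longer than `max_chars` is kept whole in this sketch; a fuller version would need a fallback such as sentence-level splitting.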
What you'll build, in one project
- Runs entirely locally against Ollama, or transparently against the OpenAI API by changing one environment variable
- Stores embeddings in Postgres + pgvector with HNSW indexing, or in Weaviate - backends swappable via a single config setting
- Hybrid retrieval: dense vector search and Postgres full-text BM25, fused with Reciprocal Rank Fusion - fixing the cases where pure semantic search silently fails on rare terms, names, and identifiers
- A directory watcher that ingests new files automatically, with editor-save debouncing so it never reads a half-written file
- A streaming web chat UI built on FastAPI + Server-Sent Events + vanilla JavaScript - no React, no build step - with multi-turn memory, query rewriting for follow-ups, source citations, and inline image rendering
- Image ingestion through a vision model with a "describe-then-embed" pipeline - multimodal in the same chunks table, no schema change required
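The fusion step in the hybrid-retrieval item above can be sketched in a few lines. This is a generic Reciprocal Rank Fusion example, not the course's code: the function name is hypothetical, and `k = 60` is the commonly cited default rather than a value confirmed by the course.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk ids into one ranking.

    Each list contributes 1 / (k + rank) per id, with ranks starting at 1.
    Only rank positions are used, so the dense and lexical retrievers do not
    need comparable score scales.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

A chunk that appears in both lists outranks one that is near the top of only one, which is how a rare identifier found by the lexical search can surface even when the vector search ranks it poorly.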
Along the way you'll work through real software-design patterns in real code: Dependency Injection, Strategy/Adapter, Factory, lifespans, context managers, thread-safety boundaries, atomic transactions, and defensive coding against external services that quietly don't work the way their docs claim. The course's recurring theme is the payoff of good abstractions: the vector-store interface designed early lets you bolt on a second backend in one file; the same retrieval pipeline serves both the CLI and the web app; the chunk-metadata field that seemed academic early in the course is what makes image support a simple change later on.
You'll finish with a codebase you can extend - add a reranker, try a different embedder, swap the chat model, point it at a corpus of your own docs - and the engineering vocabulary to talk about RAG as production software, not a notebook demo.
Who this course is for
Python developers who want to integrate LLMs into their projects and add RAG functionality.
