Fine-Tune & Deploy LLMs With QLoRA On SageMaker + Streamlit
Published 7/2025

MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 kHz

Language: English | Size: 3.56 GB | Duration: 7h 12m

Master QLoRA Math, Mixed Precision Training, Double Quantization, Lambda functions, API Gateway & Streamlit deployment

What you'll learn

- Train/fine-tune LLMs in AWS SageMaker using QLoRA and advanced 4-bit quantization on your own dataset
- Create an interactive Streamlit app to deploy your fine-tuned LLM with SageMaker, Lambda functions, and API Gateway
- Master QLoRA fine-tuning - including adapter injection, memory optimization, parameter freezing, and the mathematics behind it
- Leverage bfloat16 compute types for faster and more efficient training on modern GPUs
- Understand mixed precision training with QLoRA in SageMaker
- Use Parameter-Efficient Fine-Tuning (PEFT) to dynamically find and inject LoRA layers
- Understand the entire low-level fine-tuning pipeline - from raw dataset to trained model
- Use double quantization and nf4 precision to compress models without losing performance
- Discover how gradient checkpointing drastically reduces VRAM usage during training
- Fine-tune large models like Mixtral on Amazon SageMaker using state-of-the-art GPU acceleration
- Understand custom chunking code for LLMs
- Merge LoRA weights and unload adapters for final model export - ready for deployment
- Deploy your trained model to SageMaker Endpoints using Amazon's production infrastructure
- Build real-time LLM APIs using Lambda functions and API Gateway
- Securely set up training jobs with IAM roles
- AWS budgeting, server management, and pricing
- Learn how to use AWS Quotas to access powerful GPUs

Requirements

- Familiarity with Python
- Basic linear algebra (matrix multiplication)

Description

Large Language Models (LLMs) are redefining what's possible with AI - from chatbots to code generation - but the barrier to training and deploying them is still high.
Expensive hardware, massive memory requirements, and complex toolchains often block individual practitioners and small teams. This course is built to change that.

In this hands-on, code-first training, you'll learn how to fine-tune models like Mixtral-8x7B using QLoRA - a state-of-the-art method that enables efficient training by combining 4-bit quantization, LoRA adapters, and double quantization. You'll also gain a deep understanding of quantized arithmetic, numeric formats (like the floating-point bfloat16 and the integer INT8), and how they impact model size, memory bandwidth, and matrix multiplication operations.

You'll write advanced Python code to preprocess datasets with custom token-aware chunking strategies, dynamically identify quantizable layers, and inject adapter modules using the PEFT (Parameter-Efficient Fine-Tuning) library. You'll configure and launch distributed fine-tuning jobs on AWS SageMaker, leveraging powerful multi-GPU instances and optimizing them using gradient checkpointing, mixed-precision training, and bitsandbytes quantization.

After training, you'll go all the way to deployment: merging adapter weights, saving your model for inference, and deploying it via SageMaker Endpoints. You'll then expose your model through an AWS Lambda function and an API Gateway, and finally, build a Streamlit application to create a clean, responsive frontend interface.

Whether you're a machine learning engineer, backend developer, or AI practitioner aiming to level up - this course will teach you how to move from academic toy models to real-world, scalable, production-ready LLMs using tools that today's top companies rely on.

Who this course is for

- Machine Learning Engineers
- Backend and MLOps Engineers
- AI Researchers and Students
- Anyone who wants to go beyond "prompt engineering" and start building, training, and deploying their own production-ready LLMs
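
The description's point about how numeric formats affect model size can be illustrated with a little arithmetic. The following is a minimal sketch, not material from the course: the parameter count (about 46.7 billion for Mixtral-8x7B), the block sizes (64 and 256), and the helper names are assumptions chosen for illustration. It estimates weight storage only and ignores activations, gradients, optimizer state, and the LoRA adapters themselves.

```python
# Rough estimate of weight-storage memory under different numeric formats.
# All numbers below are illustrative assumptions, not measurements.

def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory needed to store the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def quant_constant_overhead_bits(
    block_size: int = 64,
    double_quant: bool = False,
    second_block_size: int = 256,
) -> float:
    """Extra bits per parameter spent on quantization constants.

    Without double quantization: one 32-bit constant per block of weights.
    With double quantization: one 8-bit constant per block, plus one 32-bit
    second-level constant per `second_block_size` first-level constants.
    """
    if not double_quant:
        return 32 / block_size
    return 8 / block_size + 32 / (block_size * second_block_size)


# Assumption: approximate total parameter count of Mixtral-8x7B.
NUM_PARAMS = 46.7e9

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)
nf4_gb = weight_memory_gb(NUM_PARAMS, 4 + quant_constant_overhead_bits())
nf4_dq_gb = weight_memory_gb(
    NUM_PARAMS, 4 + quant_constant_overhead_bits(double_quant=True)
)

print(f"bfloat16 weights:           {bf16_gb:6.1f} GB")
print(f"4-bit weights:              {nf4_gb:6.1f} GB")
print(f"4-bit + double quantization:{nf4_dq_gb:6.1f} GB")
```

Under these assumptions, double quantization cuts the per-parameter overhead of the quantization constants from 0.5 bits to roughly 0.127 bits, which adds up to a couple of gigabytes on a model of this size.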