Multimodal AI Systems Engineering: Building Production Vision-Language Models, Document AI, and Cross-Modal Retrieval Pipelines (Production AI Engineering Series) - Softcover

Book 9 of 11: Production AI Engineering Series

ChatVariety Team

 
ISBN 9798180602398: Multimodal AI Systems Engineering: Building Production Vision-Language Models, Document AI, and Cross-Modal Retrieval Pipelines (Production AI Engineering Series)

Synopsis

Master the Production Lifecycle of Vision-Language Models

The gap between a simple VLM demo and a highly reliable, cost-effective production system is enormous. Multimodal AI Systems Engineering bridges this gap, providing ML engineers, AI platform architects, and computer vision specialists with the definitive blueprint for deploying multimodal AI at enterprise scale.

This comprehensive, hands-on guide skips the high-level hype and dives straight into the concrete architectures, optimization pipelines, and serving infrastructure required to run models like LLaVA, SigLIP, and Qwen-VL in production environments.

What you will master inside this book:
  • Core Architectures: Take a deep dive into CLIP, ViT, SigLIP, and modern vision-language models (VLMs).
  • Multimodal RAG Pipelines: Design cross-modal embedding spaces, joint vector stores, and advanced retrieval pipelines.
  • Inference Optimization: Implement quantization, ONNX, TensorRT, and continuous batching to slash latency and costs.
  • Document AI & Vision: Build robust extraction pipelines for OCR, layout detection, form processing, and temporal video modeling.
  • Fine-Tuning & Serving: Scale training with LoRA, QLoRA, and DPO, and serve models with NVIDIA Triton Inference Server.
  • Enterprise Evaluation: Rigorously evaluate and monitor VLMs using standardized benchmarks and automated CI/CD evaluation loops.

Whether you are building next-generation Document AI pipelines, complex cross-modal search engines, or deploying fine-tuned VLMs onto edge devices, this book delivers the battle-tested engineering patterns you need to succeed in the real world.

The synopsis may refer to a different edition of this title.