We found 3 articles tagged with "litellm" | T Cloud Public Architecture Center

Build a Unified LLM Gateway with LiteLLM on CCE

An LLM gateway acts as a centralized entry point for all interactions between applications and large language models. Rather than binding applications to specific model APIs or providers, the gateway introduces a stable interface that abstracts the underlying inference layer. This separation allows platform teams to control how requests are routed, which models are used, and where inference is executed, without requiring changes at the application level. In environments where multiple model backends coexist, such as locally hosted models on GPU infrastructure and external inference services, the gateway becomes the control plane for traffic management, policy enforcement, operational consistency and costs management.

Deploy LiteLLM on CCE

LiteLLM is a lightweight gateway that provides a unified interface for interacting with multiple large language model providers. It exposes an OpenAI-compatible API, allowing applications and tools to integrate once while abstracting the differences between various backends. In this role, LiteLLM sits between clients and the underlying inference layer and becomes the central control point for how models are consumed. It can route requests to different backends, such as local runtimes or external providers, without requiring changes on the client side. This enables flexibility in choosing where inference runs based on cost, performance, or data residency requirements.

Deploy vLLM Production Stack on CCE

vLLM is an open-source inference and serving engine for large language models. It is designed to improve serving throughput and GPU memory efficiency, mainly through PagedAttention, continuous batching, prefix caching, and an OpenAI-compatible serving interface.