We found 5 articles tagged with "llm" | T Cloud Public Architecture Center

Build a Unified LLM Gateway with LiteLLM on CCE

An LLM gateway acts as a centralized entry point for all interactions between applications and large language models. Rather than binding applications to specific model APIs or providers, the gateway introduces a stable interface that abstracts the underlying inference layer. This separation allows platform teams to control how requests are routed, which models are used, and where inference is executed, without requiring changes at the application level. In environments where multiple model backends coexist, such as locally hosted models on GPU infrastructure and external inference services, the gateway becomes the control plane for traffic management, policy enforcement, operational consistency and costs management.

Deploy LiteLLM on CCE

LiteLLM is a lightweight gateway that provides a unified interface for interacting with multiple large language model providers. It exposes an OpenAI-compatible API, allowing applications and tools to integrate once while abstracting the differences between various backends. In this role, LiteLLM sits between clients and the underlying inference layer and becomes the central control point for how models are consumed. It can route requests to different backends, such as local runtimes or external providers, without requiring changes on the client side. This enables flexibility in choosing where inference runs based on cost, performance, or data residency requirements.

Deploy Ollama on CCE

Ollama is a lightweight runtime for running large language models locally. It provides a simple way to download, manage, and serve models through a REST API, without requiring complex setup or deep knowledge of model serving frameworks. This makes it well suited for environments where ease of deployment and fast iteration are important.

Deploy Open WebUI on CCE

Open WebUI is a self-hosted web interface for interacting with large language models. It provides a chat-based UI that connects to OpenAI-compatible APIs, making it easy to test and use different models without building custom frontends. In this blueprint, Open WebUI acts as the user-facing layer on top of the LLM gateway. It allows users and teams to interact with the models exposed through LiteLLM, without needing to know where those models are running. This makes it a practical tool for internal adoption, enabling non-developers and developers alike to access LLM capabilities through a browser.

Deploy vLLM Production Stack on CCE

vLLM is an open-source inference and serving engine for large language models. It is designed to improve serving throughput and GPU memory efficiency, mainly through PagedAttention, continuous batching, prefix caching, and an OpenAI-compatible serving interface.