We found 2 articles tagged with "sfs-turbo" | T Cloud Public Architecture Center

Build a Unified LLM Gateway with LiteLLM on CCE

An LLM gateway acts as a centralized entry point for all interactions between applications and large language models. Rather than binding applications to specific model APIs or providers, the gateway introduces a stable interface that abstracts the underlying inference layer. This separation allows platform teams to control how requests are routed, which models are used, and where inference is executed, without requiring changes at the application level. In environments where multiple model backends coexist, such as locally hosted models on GPU infrastructure and external inference services, the gateway becomes the control plane for traffic management, policy enforcement, operational consistency, and cost management.
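
The routing idea can be illustrated with a small, library-free sketch. It is not LiteLLM's API; the `Backend` and `Gateway` classes, the model aliases, and both URLs are hypothetical names chosen for illustration. Applications ask for a stable alias, and the gateway decides which backend serves it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """One inference backend behind the gateway (hypothetical type)."""
    name: str
    base_url: str


class Gateway:
    """Maps stable model aliases to backends so callers never see providers."""

    def __init__(self, routes, default_alias):
        if default_alias not in routes:
            raise ValueError(f"unknown default alias: {default_alias}")
        self._routes = dict(routes)
        self._default = default_alias

    def resolve(self, alias):
        # Unknown aliases fall back to the default backend.
        return self._routes.get(alias, self._routes[self._default])


# Assumed example routes: one local GPU-hosted backend, one external service.
ROUTES = {
    "chat-default": Backend("local-ollama", "http://ollama.example.internal:11434"),
    "chat-large": Backend("external-provider", "https://inference.example.com/v1"),
}

gateway = Gateway(ROUTES, default_alias="chat-default")
```

Calling `gateway.resolve("chat-large")` returns the external backend, while any alias the gateway does not know resolves to the local one. Changing a route in `ROUTES` moves traffic without touching application code, which is the separation the summary describes.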

Deploy Ollama on CCE

Ollama is a lightweight runtime for running large language models locally. It provides a simple way to download, manage, and serve models through a REST API, without requiring complex setup or deep knowledge of model serving frameworks. This makes it well suited for environments where ease of deployment and fast iteration are important.
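
As a rough sketch of what serving models through a REST API looks like from a client, the snippet below builds a request for Ollama's `/api/generate` endpoint using only the Python standard library. The base URL, model name, and helper function names are assumptions for illustration; on CCE the address would be whatever Service or Ingress exposes the Ollama pod.

```python
import json
import urllib.request


def build_generate_request(base_url, model, prompt):
    """Build (but do not send) a non-streaming POST to /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, model, prompt, timeout=60.0):
    """Send the request and return the generated text from the JSON reply."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With an Ollama server reachable at its default port and a model already pulled, `generate("http://localhost:11434", "llama3", "Say hello")` would return the completion text; the model name here is only an example.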