We found 3 articles tagged with "ollama" | T Cloud Public Architecture Center

Build a Unified LLM Gateway with LiteLLM on CCE

An LLM gateway acts as a centralized entry point for all interactions between applications and large language models. Rather than binding applications to specific model APIs or providers, the gateway introduces a stable interface that abstracts the underlying inference layer. This separation allows platform teams to control how requests are routed, which models are used, and where inference is executed, without requiring changes at the application level. In environments where multiple model backends coexist, such as locally hosted models on GPU infrastructure and external inference services, the gateway becomes the control plane for traffic management, policy enforcement, operational consistency, and cost management.
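
The routing idea can be sketched in a few lines of Python. This is only an illustration of the abstraction, not LiteLLM's API or configuration format: the `Backend` type, the `ROUTES` table, the alias names, and both URLs are hypothetical. Applications ask for a stable alias, and the gateway decides which backend actually serves it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """One inference backend the gateway can forward requests to."""
    name: str
    base_url: str
    model: str


# Hypothetical routing table: aliases are what applications see,
# backends are what the platform team controls.
ROUTES = {
    "chat-default": Backend("ollama-on-gpu", "http://ollama.internal:11434", "llama3"),
    "chat-large": Backend("external-service", "https://inference.example.com/v1", "large-model"),
}


def resolve(alias, routes=ROUTES):
    """Map an application-facing alias to the backend that should serve it."""
    try:
        return routes[alias]
    except KeyError:
        raise ValueError(f"unknown model alias: {alias!r}") from None
```

Repointing `chat-default` at a different backend changes where inference runs while the alias used by applications stays the same, which is the separation described above.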

Deploy Ollama on CCE

Ollama is a lightweight runtime for running large language models locally. It provides a simple way to download, manage, and serve models through a REST API, without requiring complex setup or deep knowledge of model serving frameworks. This makes it well suited for environments where ease of deployment and fast iteration are important.
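
As a minimal sketch of what calling that REST API can look like, the Python below posts a prompt to Ollama's `/api/generate` endpoint using only the standard library. It assumes Ollama is reachable at its default address `http://localhost:11434` and that the model named by the caller has already been pulled; the helper names `build_generate_request` and `generate` are illustrative, not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama listens on its default port on this machine.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model, prompt, base_url=OLLAMA_URL):
    """Build a POST request for Ollama's /api/generate endpoint."""
    # "stream": False asks for a single JSON object instead of a chunk stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, base_url=OLLAMA_URL, timeout=120):
    """Send the prompt and return the generated text."""
    request = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With a model available, `generate("llama3", "Say hello")` would return the completion as a string; the model name here is only an example.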

Securely Expose Remote Ollama Endpoints to your Development Machine

Exposing Ollama endpoints from your cloud environment to your local development machine lets you run inference on expensive cloud resources such as GPUs while doing day-to-day development on cost-effective local hardware.