The Raspberry Pi cluster is great for orchestration, but it has no GPU. Running large language models on ARM Cortex cores isn't practical. What we do have is a Windows PC with an RTX 4090 sitting on the same network, already running Ollama. The question is how to connect the cluster to it cleanly: without complicated driver passthrough, without hacks, and in a way that looks native to anything running inside Kubernetes.
This is the companion article to Episode 7 of the Kubernetes on Raspberry Pi series. We expose Ollama as a native Kubernetes service using EndpointSlices, then deploy Open WebUI to put a polished chat interface on top, accessible at https://ai.spatacoli.xyz.
All configs are in the `kubernetes-series` GitHub repo under `video-07-gpu-inference-ollama/`.
## The Setup
| Component | Details |
|---|---|
| GPU machine | Windows PC, RTX 4090, IP 10.51.50.13 |
| Ollama | Running on Windows, listening on 0.0.0.0:11434 |
| Open WebUI | Deployed in the cluster, connects to Ollama |
| Access URL | https://ai.spatacoli.xyz |
Before anything else, Ollama must be listening on all interfaces, not just localhost. Set the environment variable in Ollama's settings:
```
OLLAMA_HOST=0.0.0.0
```
Also ensure Windows Firewall allows inbound traffic on port 11434. Talos doesn't allow SSH onto nodes, so we test from inside the cluster using a temporary pod:
```bash
kubectl run -it --rm curl-test \
  --image=curlimages/curl \
  --restart=Never \
  -n default \
  -- http://10.51.50.13:11434/api/tags
```
You should get a JSON response listing available models. If it times out, check the Windows Firewall before proceeding.
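If you want to sanity-check that response programmatically, Ollama's `/api/tags` endpoint returns a JSON object with a `models` array. A minimal sketch of parsing it, using an illustrative payload (the actual field values and model names on your machine will differ):

```python
import json

# Illustrative /api/tags response body; real model names and sizes will differ.
sample = """
{
  "models": [
    {"name": "llama3:8b", "size": 4661224676},
    {"name": "mistral:7b", "size": 4109865159}
  ]
}
"""

def model_names(tags_json: str) -> list[str]:
    """Extract the model names from an Ollama /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

print(model_names(sample))  # ['llama3:8b', 'mistral:7b']
```

An empty `models` array means Ollama is reachable but no models have been pulled yet, which is worth distinguishing from a connection failure.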
## External Services and EndpointSlices
This episode introduces a pattern that makes Kubernetes much more flexible: External Services. Instead of pointing a Service at pods inside the cluster, we point it at an IP address outside it. From the perspective of any pod in the cluster, Ollama looks identical to an internal service. It's reachable by DNS name, not raw IP.
The mechanism is an EndpointSlice with a manual address. Create the namespace, Service, and EndpointSlice together:
```yaml
# ollama-external.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ai
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: ai
spec:
  ports:
    - port: 11434
      targetPort: 11434
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: ollama
  namespace: ai
  labels:
    kubernetes.io/service-name: ollama
addressType: IPv4
ports:
  - port: 11434
    protocol: TCP
endpoints:
  - addresses:
      - 10.51.50.13
```
```bash
kubectl apply -f ollama-external.yaml
```
Any pod in the cluster can now reach Ollama at `http://ollama.ai.svc.cluster.local:11434`, as if it were running inside Kubernetes.
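To make that concrete, here is a sketch of how an in-cluster application might call Ollama's `/api/generate` endpoint through the Service DNS name, using only the standard library. The model name `llama3:8b` is a placeholder, and the request will only resolve from inside the cluster:

```python
import json
import urllib.request

# The Service DNS name created by the External Service pattern above.
OLLAMA_URL = "http://ollama.ai.svc.cluster.local:11434"

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# From a pod inside the cluster, this would hit the RTX 4090 directly:
# resp = urllib.request.urlopen(build_generate_request("llama3:8b", "Hello"))
# print(json.loads(resp.read())["response"])
```

The point is that the application code carries no knowledge of `10.51.50.13`; if the GPU box ever changes IP, only the EndpointSlice needs updating.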
## Deploying Open WebUI
Open WebUI is a polished chat interface that connects to Ollama (or any OpenAI-compatible API). The key environment variable `OLLAMA_BASE_URL` points it at our external Service's DNS name:
```yaml
# open-webui.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui
  namespace: ai
spec:
  replicas: 1
  selector:
    matchLabels:
      app: open-webui
  template:
    metadata:
      labels:
        app: open-webui
    spec:
      containers:
        - name: open-webui
          image: ghcr.io/open-webui/open-webui:latest
          env:
            - name: OLLAMA_BASE_URL
              value: http://ollama.ai.svc.cluster.local:11434
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: webui-data
              mountPath: /app/backend/data
      volumes:
        - name: webui-data
          persistentVolumeClaim:
            claimName: open-webui-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: open-webui
  namespace: ai
spec:
  selector:
    app: open-webui
  ports:
    - port: 8080
      targetPort: 8080
  type: ClusterIP
```
Open WebUI stores conversations, settings, and user accounts in `/app/backend/data`, so it needs its own PVC:
```yaml
# open-webui-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: open-webui-pvc
  namespace: ai
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: nfs
  resources:
    requests:
      storage: 5Gi
```
## Adding an Ingress
The Ingress follows the same pattern as Episodes 5 and 6. We reference `letsencrypt-prod` in the annotation so cert-manager handles the TLS certificate automatically; no separate Certificate resource is needed:
```yaml
# open-webui-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: open-webui
  namespace: ai
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - ai.spatacoli.xyz
      secretName: open-webui-tls
  rules:
    - host: ai.spatacoli.xyz
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: open-webui
                port:
                  number: 8080
```
```bash
kubectl apply -f open-webui-pvc.yaml
kubectl apply -f open-webui.yaml
kubectl apply -f open-webui-ingress.yaml
```
Open https://ai.spatacoli.xyz, create your account, and start chatting with models running on your own hardware. The cluster handles routing and TLS; the RTX 4090 handles inference.
## What's Next
In Episode 8 we add centralized logging with Loki and Promtail. When something breaks, you need to know where to look.