Local Ollama API on Linux

In this setup, Ollama runs as a local HTTP API at http://localhost:11434. On Linux it integrates with systemd for automatic startup. Local API access requires no authentication. Models run on the GPU when Ollama detects a supported one; otherwise they fall back to the CPU.

Quick reference

# Install
curl -fsSL https://ollama.com/install.sh | sh

# Service
systemctl status ollama
sudo systemctl start ollama
sudo systemctl enable ollama

# Models
ollama pull qwen2.5:7b-instruct-q4_K_M
ollama pull llama3.1:8b-instruct-q4_K_M
ollama list

# API checks
curl http://localhost:11434/api/tags
curl http://localhost:11434/api/ps
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:7b-instruct-q4_K_M",
  "prompt": "Say hello world",
  "stream": false
}'

Install

curl -fsSL https://ollama.com/install.sh | sh

Service management

On Linux, Ollama runs as a systemd service. Once the service is enabled, the API starts automatically on boot, so there is no need to run ollama serve manually.

systemctl status ollama          # check current status
sudo systemctl start ollama      # start now
sudo systemctl enable ollama     # enable on boot
journalctl -u ollama --no-pager -n 50  # view logs

If ollama serve reports that the port is already in use, the systemd service is most likely already running and holding port 11434. Use the running service rather than starting a second server manually.
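
Before starting a second server, it can help to probe the port from a script. The sketch below is a minimal illustration using only Python's standard library. The helper name is_port_open is hypothetical, and the host and port are the defaults used in this guide. It only shows that some process accepts connections on the port, not that the process is Ollama.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 11434, timeout: float = 1.0) -> bool:
    """Return True if some process accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Port 11434 is in use: a server is already running, use it as-is.")
    else:
        print("Nothing is listening on 11434: try 'sudo systemctl start ollama'.")
```

If the port is open, systemctl status ollama confirms whether the listener is the systemd service.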

Pull models

ollama pull qwen2.5:7b-instruct-q4_K_M
ollama pull llama3.1:8b-instruct-q4_K_M
ollama list

Run a model interactively from the terminal:

ollama run qwen2.5:7b-instruct-q4_K_M
ollama run llama3.1:8b-instruct-q4_K_M

Check the API from curl

Once Ollama is running, the API is available immediately. Key endpoints: /api/generate for text generation, /api/tags for installed models, and /api/ps for models currently loaded in memory.

curl http://localhost:11434/api/tags    # installed models
curl http://localhost:11434/api/ps      # running models

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:7b-instruct-q4_K_M",
  "prompt": "Say hello world",
  "stream": false
}'

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b-instruct-q4_K_M",
  "prompt": "Say hello world",
  "stream": false
}'

Pretty-print the JSON response:

curl -s http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:7b-instruct-q4_K_M",
  "prompt": "Say hello world",
  "stream": false
}' | python3 -m json.tool
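
The /api/ps output can also hint at whether a loaded model sits on the GPU. The sketch below assumes each entry in the response carries size and size_vram fields in bytes, which is how current Ollama releases appear to report them; verify against your own /api/ps output. The helper names placement and running_models are hypothetical, and the code uses urllib from the standard library so it runs without extra packages.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address used in this guide


def placement(model: dict) -> str:
    """Classify one /api/ps entry as 'GPU', 'CPU', or 'CPU/GPU split'.

    Assumption: 'size' is the total loaded size and 'size_vram' is the
    part held in GPU memory, both in bytes.
    """
    size = model.get("size", 0)
    vram = model.get("size_vram", 0)
    if vram == 0:
        return "CPU"
    if vram >= size:
        return "GPU"
    return "CPU/GPU split"


def running_models(base_url: str = OLLAMA_URL) -> list[dict]:
    """Fetch the list of currently loaded models from /api/ps."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=10) as resp:
        return json.load(resp).get("models", [])
```

With a model loaded, looping over running_models() and printing m.get("name") with placement(m) gives one line per model. An empty list means no model is loaded yet; send a generate request first.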

Check from Python

Calling the models needs no dedicated Ollama Python package. Because Ollama exposes a local HTTP API, a plain HTTP request is enough. Install requests in your venv:

pip install requests

Then send a generate request and print the model's reply:

import requests

url = "http://localhost:11434/api/generate"
payload = {
    "model": "qwen2.5:7b-instruct-q4_K_M",
    "prompt": "Say hello world",
    "stream": False
}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()

print(response.json()["response"])

Switch to Llama by changing only the model name:

payload = {
    "model": "llama3.1:8b-instruct-q4_K_M",
    "prompt": "Say hello world",
    "stream": False
}
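
Since only the model name changes between the two payloads, the request can be wrapped in a small helper. This is a sketch, not an official client: build_payload and generate are hypothetical names, and this variant uses urllib from the standard library instead of requests, so it has no dependencies.

```python
import json
import urllib.request

GENERATE_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str) -> dict:
    """Build a non-streaming request body with the same shape as the payloads above."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str, timeout: float = 60.0) -> str:
    """POST the payload to /api/generate and return the generated text."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        GENERATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

Calling generate("llama3.1:8b-instruct-q4_K_M", "Say hello world") and generate("qwen2.5:7b-instruct-q4_K_M", "Say hello world") then compares both models with the same prompt.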

Daily checklist

systemctl status ollama
ollama list
curl http://localhost:11434/api/tags
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:7b-instruct-q4_K_M",
  "prompt": "Say hello world",
  "stream": false
}'
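
Part of this checklist can be scripted. The sketch below is one possible approach, assuming the /api/tags response is a JSON object with a models list whose entries have a name field. The REQUIRED list simply mirrors the two models pulled in this guide, and the function names are hypothetical.

```python
import json
import urllib.request

TAGS_URL = "http://localhost:11434/api/tags"

# Assumption: the two models this guide pulls are the ones needed day to day.
REQUIRED = ["qwen2.5:7b-instruct-q4_K_M", "llama3.1:8b-instruct-q4_K_M"]


def installed_names(tags: dict) -> set[str]:
    """Extract model names from a decoded /api/tags response."""
    return {m["name"] for m in tags.get("models", []) if "name" in m}


def missing_models(tags: dict, required: list[str] = REQUIRED) -> list[str]:
    """Return the required models that are not installed, in the given order."""
    have = installed_names(tags)
    return [name for name in required if name not in have]


def fetch_tags(url: str = TAGS_URL, timeout: float = 10.0) -> dict:
    """Fetch and decode the /api/tags response from the local server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Running missing_models(fetch_tags()) returns an empty list when both models are installed. Any name it returns can be fetched with ollama pull.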