In this setup, Ollama runs as a local HTTP API at http://localhost:11434. On Linux it integrates with systemd for automatic startup. Local API access requires no authentication, and models run on the GPU.
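Whether a loaded model really sits on the GPU can be checked against the /api/ps endpoint used later in this guide. The following is a minimal sketch, not an official client: it assumes each entry in the /api/ps response carries "name", "size" and "size_vram" (both in bytes), and the helper names gpu_fraction, loaded_models and report are made up for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # address used throughout this guide


def gpu_fraction(model_entry):
    """Return the share of a loaded model held in GPU memory (0.0 to 1.0).

    Assumes the entry has "size" and "size_vram" in bytes; a missing
    "size_vram" is treated as 0 (nothing on the GPU).
    """
    size = model_entry.get("size", 0)
    if size <= 0:
        return 0.0
    return min(model_entry.get("size_vram", 0) / size, 1.0)


def loaded_models(base_url=OLLAMA_URL, timeout=5):
    """Fetch /api/ps and return the list of currently loaded models."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=timeout) as resp:
        return json.load(resp).get("models", [])


def report(models):
    """Format one line per loaded model, for example 'name: 100% on GPU'."""
    return [f"{m.get('name', '?')}: {gpu_fraction(m):.0%} on GPU" for m in models]
```

With a model loaded (for example after ollama run), print("\n".join(report(loaded_models()))) prints one line per model. An empty result only means that no model is currently loaded.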
Quick reference
# Install
curl -fsSL https://ollama.com/install.sh | sh
# Service
systemctl status ollama
sudo systemctl start ollama
sudo systemctl enable ollama
# Models
ollama pull qwen2.5:7b-instruct-q4_K_M
ollama pull llama3.1:8b-instruct-q4_K_M
ollama list
# API checks
curl http://localhost:11434/api/tags
curl http://localhost:11434/api/ps
curl http://localhost:11434/api/generate -d '{
"model": "qwen2.5:7b-instruct-q4_K_M",
"prompt": "Say hello world",
"stream": false
}'
Install
curl -fsSL https://ollama.com/install.sh | sh
Service management
On Linux, Ollama runs as a systemd service. The API starts automatically on boot without needing to run ollama serve manually.
systemctl status ollama # check current status
sudo systemctl start ollama # start now
sudo systemctl enable ollama # enable on boot
journalctl -u ollama --no-pager -n 50 # view logs
If ollama serve says the port is already in use, the systemd service is already running. Use the running service rather than starting a second server manually.
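The "port already in use" situation can be detected up front. This is a rough sketch that only tests whether something accepts TCP connections on port 11434; it cannot tell whether that something is Ollama. The function names is_listening and advice are hypothetical.

```python
import socket


def is_listening(host="localhost", port=11434, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def advice(listening):
    """Turn the probe result into the recommendation given in the text."""
    if listening:
        return "Port in use: use the running service, do not start ollama serve."
    return "Nothing listening: start the service with sudo systemctl start ollama."
```

Calling print(advice(is_listening())) before starting anything by hand gives the same guidance as the paragraph above.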
Pull models
ollama pull qwen2.5:7b-instruct-q4_K_M
ollama pull llama3.1:8b-instruct-q4_K_M
ollama list
Run a model interactively from the terminal:
ollama run qwen2.5:7b-instruct-q4_K_M
ollama run llama3.1:8b-instruct-q4_K_M
Check the API from curl
Once Ollama is running, the API is available immediately. Key endpoints: /api/generate for text generation, /api/tags for installed models, and /api/ps for currently loaded models.
curl http://localhost:11434/api/tags # installed models
curl http://localhost:11434/api/ps # running models
curl http://localhost:11434/api/generate -d '{
"model": "qwen2.5:7b-instruct-q4_K_M",
"prompt": "Say hello world",
"stream": false
}'
curl http://localhost:11434/api/generate -d '{
"model": "llama3.1:8b-instruct-q4_K_M",
"prompt": "Say hello world",
"stream": false
}'
Pretty-print the JSON response:
curl -s http://localhost:11434/api/generate -d '{
"model": "qwen2.5:7b-instruct-q4_K_M",
"prompt": "Say hello world",
"stream": false
}' | python -m json.tool
Check from Python
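A first check from Python can mirror the /api/tags call: confirm that the models you intend to use are installed before sending a prompt. This is a sketch using only the standard library. It assumes the response is a JSON object with a "models" list whose entries have a "name" field. The helpers installed_names, fetch_tags and missing_models are illustrative names, not part of any Ollama package.

```python
import json
import urllib.request


def installed_names(tags):
    """Extract model names from a parsed /api/tags response."""
    return [m["name"] for m in tags.get("models", []) if "name" in m]


def fetch_tags(base_url="http://localhost:11434", timeout=5):
    """GET /api/tags and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def missing_models(wanted, tags):
    """Return the entries of `wanted` that are not installed, in order."""
    have = set(installed_names(tags))
    return [name for name in wanted if name not in have]
```

For example, missing_models(["qwen2.5:7b-instruct-q4_K_M", "llama3.1:8b-instruct-q4_K_M"], fetch_tags()) returns an empty list when both models from this guide have been pulled.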
To call the models, no special Ollama Python package is needed. A plain HTTP request is enough since Ollama exposes a local HTTP API. Install requests in your venv:
pip install requests
import requests
url = "http://localhost:11434/api/generate"
payload = {
"model": "qwen2.5:7b-instruct-q4_K_M",
"prompt": "Say hello world",
"stream": False
}
response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["response"])
Switch to Llama by changing only the model name:
payload = {
"model": "llama3.1:8b-instruct-q4_K_M",
"prompt": "Say hello world",
"stream": False
}
Daily checklist
systemctl status ollama
ollama list
curl http://localhost:11434/api/tags
curl http://localhost:11434/api/generate -d '{
"model": "qwen2.5:7b-instruct-q4_K_M",
"prompt": "Say hello world",
"stream": false
}'
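The API part of this checklist can also be scripted. The sketch below covers only the two HTTP steps (/api/tags, then /api/generate) and leaves systemctl status and ollama list to the shell. The names http_json and daily_check are hypothetical, and the `call` parameter exists only so the logic can be exercised without a running server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434"
MODEL = "qwen2.5:7b-instruct-q4_K_M"


def http_json(url, payload=None, timeout=60):
    """GET url, or POST payload as JSON when one is given; return parsed JSON."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def daily_check(model=MODEL, base_url=BASE_URL, call=http_json):
    """Check that `model` is installed and, if so, that it answers a prompt.

    Connection errors are not caught: if the server is down, the
    exception from `call` propagates to the caller.
    """
    tags = call(f"{base_url}/api/tags")
    names = [m.get("name") for m in tags.get("models", [])]
    result = {"model_installed": model in names, "reply": None}
    if result["model_installed"]:
        body = {"model": model, "prompt": "Say hello world", "stream": False}
        result["reply"] = call(f"{base_url}/api/generate", body).get("response")
    return result
```

Running print(daily_check()) against a live server shows whether the model is installed and what it replied. Passing model="llama3.1:8b-instruct-q4_K_M" checks the second model instead.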