
Local CLI tool for running and managing large language models with support for models like Llama, Gemma, and Phi.
Ollama is a command-line tool for running large language models locally. It provides a simple interface for downloading, managing, and interacting with a wide range of models, including Llama 4, Gemma 3, DeepSeek-R1, and dozens of others spanning roughly 1B to 671B parameters. Users can start a model with a command like `ollama run llama3.2` and chat with it directly in the terminal.
The tool supports model customization through Modelfiles, which let users set system prompts, adjust parameters such as temperature, and import models from GGUF or Safetensors formats. It handles image inputs for vision models, accepts multiline prompts, and can generate embeddings. Ollama also exposes a REST API on `localhost:11434` for programmatic access and integration with other applications.
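As an illustration of Modelfile-based customization, the sketch below sets a base model, a sampling parameter, and a system prompt using the standard FROM, PARAMETER, and SYSTEM instructions (the assistant persona and the model name `my-assistant` below are purely illustrative):

```
FROM llama3.2
PARAMETER temperature 0.7
SYSTEM """You are a concise technical assistant."""
```

A model built from this file with `ollama create my-assistant -f Modelfile` can then be run like any other with `ollama run my-assistant`.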
Ollama serves as both a model runtime and a management system, with commands to pull, remove, copy, and monitor running models. It supports Docker deployment and provides Python and JavaScript client libraries. The tool is designed for developers, researchers, and anyone who needs to run LLMs locally without relying on cloud services; as a rough guide, 7B models need about 8GB of RAM, 13B models 16GB, and 33B models 32GB.
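For programmatic access, the REST API accepts JSON and, when streaming, returns newline-delimited JSON objects whose `response` fragments concatenate into the full answer, with the final object carrying `done: true`. The sketch below builds a request for the `/api/generate` endpoint and assembles a streamed reply; the sample stream lines are illustrative stand-ins, not output captured from a live server:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint

def build_request(model: str, prompt: str, stream: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": stream}).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )

def collect_stream(lines):
    """Assemble a streamed reply: each line is a JSON object with a
    'response' fragment; the final object has 'done': true."""
    parts = []
    for line in lines:
        obj = json.loads(line)
        parts.append(obj.get("response", ""))
        if obj.get("done"):
            break
    return "".join(parts)

# Illustrative stream (the shape matches the API; the content is made up):
sample = [
    b'{"response": "The sky is ", "done": false}',
    b'{"response": "blue.", "done": true}',
]
print(collect_stream(sample))  # -> The sky is blue.
```

Sending the request (e.g. with `urllib.request.urlopen`) requires a running Ollama server; the helpers above only construct and parse the documented payload shapes.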
# Install on Linux
curl -fsSL https://ollama.com/install.sh | sh
# Or pull the Docker image
docker pull ollama/ollama
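Pulling the image alone does not start the server. The container form Ollama documents maps the API port and persists downloaded models in a named volume (this is the CPU-only invocation and requires a working Docker daemon):

```shell
# Start the server in a container, persisting models in the "ollama" volume
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Run a model inside the running container
docker exec -it ollama ollama run llama3.2
```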