technical tutorial ollama private-ai small-business

How to Set Up Ollama on a Mac Mini for Your Small Office in 2026

Sasa Abe | | 13 min read

A Mac Mini is one of the most underrated pieces of small-business AI infrastructure available in 2026. With Ollama running on a Mac Mini M4 Pro, a small office of 5-20 users gets a capable private AI system — running locally, no data leaving the building, no cloud subscriptions — for under $5,000 of hardware. This guide walks through exactly how to set it up, what to expect, and where the trade-offs are.

Why Mac Mini Is the Sweet Spot for Small Offices

The Mac Mini has emerged as a quiet star in private AI deployment for small offices. The reasons:

Factor Mac Mini Equivalent PC Server
Price (24GB / 48GB) $2,799 - $4,499 AUD $5,000 - $8,000 AUD
Size Fits on a shelf (12.7 x 12.7 x 5 cm) Half-rack to full tower
Noise Effectively silent Fans audible
Power draw 5-65W typical 200-400W typical
Heat output Minimal Significant
Unified memory Yes (CPU/GPU share) No (separate VRAM)
Suitability for LLMs Excellent Requires dedicated GPU
Office aesthetics Inconspicuous Server-grade

The unified memory architecture in Apple Silicon (M-series chips) is particularly well-suited to LLM workloads. The CPU and GPU share the same physical memory pool, which means a model can use the full memory allocation without the data-shuffling overhead of separate GPU memory.

For a Melbourne law firm with 12 lawyers, a Cremorne accounting practice with 25 staff, or a Richmond medical clinic with 8 practitioners — the Mac Mini sits behind the front desk or in the IT cupboard, runs silently, and serves the team's AI needs without anyone noticing it's there.

Recommended Hardware Configuration

For 5-10 Users: Mac Mini M4 Pro 24GB

  • Mac Mini M4 Pro, 12-core CPU, 16-core GPU, 24GB unified memory
  • 512GB SSD (sufficient for models + small document corpus)
  • AUD $2,799

Comfortable for 5-10 concurrent users running Gemma 4 9B or Llama 3 8B. Good for small office basic AI use cases.

For 10-20 Users: Mac Mini M4 Pro 48GB (Recommended)

  • Mac Mini M4 Pro, 14-core CPU, 20-core GPU, 48GB unified memory
  • 1TB SSD (room for larger models + document index)
  • AUD $4,499

The sweet spot for most small office deployments. Runs Gemma 4 27B or Llama 3 13B comfortably. Handles concurrent users with low latency.

For 20+ Users: Mac Studio M4 Max 64GB

  • Mac Studio M4 Max, 14-core CPU, 32-core GPU, 64GB unified memory
  • 1TB SSD
  • AUD $7,599

When you outgrow Mac Mini territory. Runs larger models (Llama 3 70B quantised) and handles 20+ concurrent users. Still effectively silent.

For larger deployments, see our hardware guide and model comparison.

What You Need Before You Start

Before plugging in the Mac Mini:

  • A static IP or hostname on your office network so users can reach the system
  • Power in a location with good airflow (Mac Mini runs cool but should not be enclosed in a sealed cabinet)
  • Network connection — wired Ethernet preferred for stability
  • Optional: UPS for protection against power interruptions
  • Optional: Backup destination for the document index and configuration

You do not need:

  • A monitor (headless setup is fine)
  • A keyboard (after initial setup)
  • A specialised server room

The Mac Mini can run on a shelf in the IT room, behind reception, or in any reasonably ventilated space.

Step 1: Install Ollama

After receiving the Mac Mini, do the initial macOS setup, then install Ollama.

Method 1 — Direct download:

Visit ollama.com/download and download the macOS installer. Run it and follow the prompts. Ollama installs as a background service that starts automatically.

Method 2 — Homebrew:

If you use Homebrew:

brew install ollama

This installs Ollama and the supporting tools.

Verify installation:

ollama --version

Should print the version number.

Step 2: Download Your First Model

Ollama maintains a library of pre-built models ready to download. For small office deployments, the most useful starting models are:

Model Command Size Use Case
Gemma 4 9B ollama pull gemma2:9b ~5GB General office tasks
Llama 3 8B ollama pull llama3.1:8b ~5GB Strong reasoning
Mistral 7B ollama pull mistral:7b ~4GB Multilingual or general
Gemma 4 27B ollama pull gemma2:27b ~16GB Stronger tasks (48GB Mac Mini)
Llama 3 70B (Q4) ollama pull llama3.1:70b ~40GB Largest model that fits

Start with Gemma 4 9B or Llama 3 8B for testing. They are smaller, faster, and comfortable to experiment with.

ollama pull gemma2:9b

This downloads the model (5GB or so). The first download takes 5-15 minutes depending on your internet connection.

Step 3: First Test

Once the model is downloaded:

ollama run gemma2:9b

This starts an interactive chat with the model. Try a question:

> What's the difference between Llama 3 and Mistral?

You should get a response in 2-10 seconds depending on hardware. The model is now running entirely locally — no internet required after this initial download.

Exit with /bye or Ctrl+C.

Step 4: Configure for Network Access

By default, Ollama only accepts connections from localhost. For office use, you need to allow connections from other machines on the network.

Edit the launchd plist (on Mac):

Create or edit ~/Library/LaunchAgents/com.ollama.server.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.ollama.server</string>
  <key>EnvironmentVariables</key>
  <dict>
    <key>OLLAMA_HOST</key>
    <string>0.0.0.0:11434</string>
  </dict>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/local/bin/ollama</string>
    <string>serve</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
</dict>
</plist>

Load it:

launchctl load ~/Library/LaunchAgents/com.ollama.server.plist

Now Ollama listens on the Mac Mini's IP address, port 11434. Other machines on the office network can reach it.

Test from another machine on the network:

curl http://[mac-mini-ip]:11434/api/generate -d '{
  "model": "gemma2:9b",
  "prompt": "Hello"
}'

You should receive a streaming response. Ollama is now serving the office.

Step 5: Set Up a User Interface

The raw Ollama API works for developers but office users want a friendly interface. Options:

Option A: Open WebUI

Open WebUI is a self-hosted chat interface that works with Ollama. It runs on the same Mac Mini or another machine.

docker run -d --name open-webui \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://[mac-mini-ip]:11434 \
  -v open-webui:/app/backend/data \
  --restart always \
  ghcr.io/open-webui/open-webui:main

(Requires Docker; install via brew install docker --cask.)

Users access http://[mac-mini-ip]:3000 and get a ChatGPT-like interface.

Option B: Custom Web Interface

For more polished deployments, AIRGAP LLM typically deploys a custom web interface with:

  • Single sign-on integration (via your existing AD/Google Workspace)
  • Branded UI matching the firm
  • Audit logging of all queries
  • Document upload for ad-hoc RAG
  • Permission-controlled document libraries

Option C: Direct API Integration

For integration into existing apps (your firm's intranet, document management system, etc.), use the Ollama API directly. It's OpenAI-compatible, so existing integrations using the OpenAI SDK can point at the Mac Mini with a single URL change.

Step 6: Add RAG Over Your Documents

For most small offices, the real value comes from RAG over your existing documents — see our detailed RAG architecture guide.

A minimal RAG setup on the Mac Mini:

Install ChromaDB

pip3 install chromadb

Ingest Documents

Create a simple ingestion script that:

  1. Reads files from a designated folder (or your document management system)
  2. Extracts text (PyPDF2 for PDFs, python-docx for Word)
  3. Splits into chunks (~500 words with 50-word overlap)
  4. Generates embeddings using a local model (Ollama can serve embedding models too)
  5. Stores in ChromaDB

For 1,000-10,000 documents, ingestion takes a few hours on the Mac Mini. Subsequent updates are incremental and fast.

Query Flow

When a user asks a question:

  1. The question is embedded
  2. ChromaDB returns the top 5-10 most relevant chunks
  3. Those chunks are added to the prompt as context
  4. Ollama generates an answer grounded in the chunks
  5. The answer includes citations to the source documents

This is the same pattern as enterprise RAG deployments, just running on hardware that fits on a desk.

Step 7: Set Up Backup and Monitoring

Production deployment, even on a small scale, needs:

Backup

The Mac Mini's content to back up:

  • Ollama models (/Users/[user]/.ollama/)
  • ChromaDB vector database
  • Configuration files
  • Any custom integration code

Time Machine to an external drive is sufficient for small offices. For larger deployments, consider an automated network backup.

Monitoring

At minimum:

  • Check Ollama service is running daily
  • Monitor disk space (models + index can grow)
  • Track memory usage during peak hours
  • Log query counts and response times

For small offices, a simple status page or weekly check-in is usually sufficient.

Updates

  • macOS updates: standard schedule (deferred a week or two from release for stability)
  • Ollama updates: every 1-3 months as new versions add features and performance improvements
  • Model updates: when significantly better models become available (typically 2-4 times per year for the major families)

Performance Expectations

What to expect from a Mac Mini M4 Pro 48GB running Gemma 4 27B:

Metric Typical Value
First token latency 1-3 seconds
Tokens per second 25-40
Concurrent users 5-15 (depending on query complexity)
Power consumption 30-60W average
Noise Effectively silent
Heat output Warm to touch under load

For most small office workloads, users perceive response times as "fast enough" — comparable to or better than waiting for ChatGPT to respond.

What Can Go Wrong (And How to Handle It)

Issue: Slow responses

Likely causes: Model too large for the hardware, multiple users querying simultaneously, document context too long.

Fixes: Try a smaller model variant; reduce the number of retrieved chunks in RAG; check that the Mac Mini isn't running other heavy processes.

Issue: Inconsistent answers

Likely causes: Poor document quality, RAG retrieval returning irrelevant chunks, prompt not specific enough.

Fixes: Improve document quality; tune retrieval (more chunks, re-ranking); refine prompts.

Issue: Network access not working

Likely causes: OLLAMA_HOST not set, firewall blocking port 11434, Mac Mini sleeping.

Fixes: Verify environment variable; check firewall; configure Energy Saver to prevent sleep.

Issue: Model produces wrong information

Likely causes: Question doesn't match RAG content, model hallucinating beyond context, ambiguous query.

Fixes: Require citations for every claim; improve prompts; train users to phrase queries more specifically.

When to Move Beyond Mac Mini

The Mac Mini setup works well for small offices but has limits. Signs you need more:

  • Consistent slowness during peak hours (suggests insufficient compute)
  • Need for larger models (70B+ requires more memory)
  • Growing concurrent user count (15-20+ simultaneous queries)
  • Mission-critical reliability (single Mac Mini = single point of failure)

For these scenarios, options include:

  • Mac Studio (64-192GB unified memory, more compute, still desk-friendly)
  • Add a second Mac Mini behind a load balancer
  • Dedicated GPU server (RTX A5000/A6000 in a 1U or workstation form factor)

The migration path is gradual — your models, document corpus, and integration code carry over.

A Realistic Small Office Deployment

A typical Cremorne accounting practice (22 staff) deployment:

  • Hardware: Mac Mini M4 Pro 48GB + 2TB external SSD for backups: $4,800
  • Setup: 3-day deployment engagement (install, RAG over policy + precedent corpus, train staff): $12,000
  • Year 1 support: Monthly check-ins, updates, troubleshooting: $1,500/month = $18,000
  • Year 1 total: $34,800

By month 4, the firm reports staff are saving an average of 30 minutes per person per day on document search, drafting, and policy queries. For 22 staff, that's about 165 hours per month — equivalent to one full-time-equivalent in recovered productivity.

The Mac Mini sits in the IT cupboard, runs silently, and powers a meaningful AI capability for less than the cost of a single ChatGPT Enterprise tier covering the same headcount.

The AIRGAP LLM Perspective

AIRGAP LLM deploys Mac Mini-based private AI for small offices across Melbourne. Our typical small-office deployment includes:

  • Hardware procurement and setup (Mac Mini specified for the firm's use case)
  • Ollama installation and network configuration
  • Document ingestion and RAG setup
  • Custom interface (or Open WebUI for simpler deployments)
  • Staff training (usually a 1-hour session)
  • Ongoing support arrangement

For Melbourne-based small offices considering private AI deployment, contact AIRGAP LLM for a free assessment — including a recommended hardware configuration and itemised pricing for your specific use case.

Frequently Asked Questions

Can a Mac Mini really run a useful local LLM?

Yes. A Mac Mini M4 Pro with 24GB unified memory comfortably runs Gemma 4 9B or Llama 3 8B — modern open-source models that handle document summarisation, search, drafting, and Q&A very well. A Mac Mini M4 Pro with 48GB can run larger models (Gemma 4 27B, Llama 3 13B). For a small office of 5-20 users, this is genuinely sufficient hardware. The Mac Mini's unified memory architecture is particularly well-suited to LLM workloads.

Why Ollama rather than other tools?

Ollama is the most mature tool for running LLMs locally in 2026. It handles model downloading, format conversion, quantisation, and serving — all behind a simple command-line interface. It works on Mac, Linux, and Windows. It exposes a standard HTTP API that any application can call. For small offices, Ollama dramatically reduces the complexity of running local AI compared to managing models manually.

How long does it take to set up Ollama on a Mac Mini?

For a basic single-user setup, you can have Ollama running with a model in 15-20 minutes after the hardware arrives. For a small office deployment serving multiple users with RAG over your documents, plan 1-2 days of setup work — most of it document ingestion and configuration rather than Ollama itself. For a production-quality small office deployment with backup, monitoring, and integration with your team's workflows, 1-2 weeks is typical.

Does Ollama on a Mac Mini work without an internet connection?

Yes, once models are downloaded. Ollama needs internet during initial model download (typically 5-50GB depending on model). After that, inference runs entirely offline. For air-gapped deployments, models can also be transferred to the Mac Mini via external storage and loaded manually. The system can operate indefinitely with zero internet connectivity.

What if my small office needs to grow beyond 20 users?

The Mac Mini setup scales gracefully. Up to about 20 concurrent users on a single Mac Mini M4 Pro 48GB, depending on usage intensity. Beyond that, you can either: (1) upgrade to a Mac Studio M-series with 64-192GB unified memory; (2) add additional Mac Mini units behind a load balancer; or (3) move to a dedicated GPU server. The investment in Ollama-based deployment carries over — the models, document corpus, and integration code remain the same.

SA

Sasa Abe

Co-Founder, AIRGAP LLM

Software engineer specialising in privacy-focused AI architecture, RAG systems, and local LLM deployment for data-sensitive organisations.

About our team →

Want to See How This Works for Your Firm?

We'll walk you through a deployment that fits your setup — your documents, your infrastructure, your compliance requirements. No sales pitch.

Request a Consultation

Or email us directly at hello@airgapllm.com.au