    Guide · 6 min read · April 15, 2026

    Run OpenClaw with a Local Model Using Ollama on BlueStacks AI


    If you would rather use Ollama and run a local model on your own system than rely on a cloud API key, you can. On Windows, the setup involves running Ollama locally, creating an SSH tunnel into the OpenClaw VM, and pointing OpenClaw at that model in its config.

    Follow the steps below.

    1. Install Ollama

    There are two ways to install Ollama: from the command line, or with the installer package from the official website.

    To install from the command line, run the following in PowerShell:

    powershell
    irm https://ollama.com/install.ps1 | iex
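After installation, the Ollama service listens on 127.0.0.1:11434 unless the OLLAMA_HOST environment variable overrides it. The tunnel step later in this guide assumes that default, so it is worth a quick sanity check:

```shell
# Ollama serves its API on 127.0.0.1:11434 unless OLLAMA_HOST overrides it.
# The SSH tunnel in step 3 assumes this default address and port.
OLLAMA_URL="http://${OLLAMA_HOST:-127.0.0.1:11434}"
echo "$OLLAMA_URL"
```

If this prints anything other than http://127.0.0.1:11434, adjust the tunnel command accordingly.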

    2. Fetch and Run the Model

    Taking the Google Gemma 4 model as an example, run the command below. Be sure to pick a model that supports tool calling. We deployed this on a machine with 32 GB of GPU memory; choose a model that matches the actual specifications of your device.

    bash
    ollama run gemma4:e4b

    Gemma 4 Model Variants

    | Model Version | Total Params | Active Params | VRAM (BF16/Full) | VRAM (4-bit Quantized) | Recommended GPU (Quantized) |
    | --- | --- | --- | --- | --- | --- |
    | Gemma 4 E2B | ~5B | 2.3B | ~11 GB | ~3–4 GB | RTX 3060 (8G) / Mobile Devices |
    | Gemma 4 E4B | ~8B | 4.5B | ~17 GB | ~5–6 GB | RTX 4060 (8G) / RTX 3060 |
    | Gemma 4 26B MoE | 26B | 3.8B | ~52 GB | ~14–16 GB | RTX 3090 / 4090 (24G) |
    | Gemma 4 31B Dense | 31B | 31B | ~64 GB | ~18–20 GB | RTX 4090 (24G) / RTX 5090 |
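The VRAM figures in the table follow from a simple back-of-the-envelope rule: weight memory is roughly total parameters (in billions) times bits per weight divided by 8, in GB, plus a margin for the KV cache and runtime overhead. The ~1.5 GB margin below is our assumption, not an official number:

```shell
# Rough VRAM estimate: params_in_billions * bits / 8 gives weight GB;
# add ~1.5 GB (assumed) for KV cache and runtime overhead.
awk 'BEGIN { print "E4B 4-bit: " (8*4/8  + 1.5) " GB" }'
awk 'BEGIN { print "E4B BF16:  " (8*16/8 + 1.5) " GB" }'
```

This lines up with the ~5–6 GB (quantized) and ~17 GB (BF16) columns for the E4B row above.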

    Once it starts running, Ollama will connect to the server and begin fetching the model's layers. After the download is complete, it will automatically verify the model and load it onto the GPU. Once deployment is complete, the terminal will automatically switch to the >>> prompt, and you can start chatting.

    Terminal output showing Ollama fetching and running Gemma 4

    If the response is particularly slow, try switching to a smaller model and verify that the CUDA driver is installed correctly. If the driver is not installed properly, the graphics card will not be recognized and the model will run on the CPU, which will be slow.
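One quick way to confirm the driver is visible (assuming an NVIDIA GPU; the query flags below are standard nvidia-smi options):

```shell
# If nvidia-smi is missing or errors out, Ollama will silently fall
# back to CPU inference, which is the usual cause of extreme slowness.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.total --format=csv
else
  echo "nvidia-smi not found - check the CUDA driver installation"
fi
```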

    3. Configure OpenClaw to Use the Local Model

    3.1 Tunnel Connection

    Ollama listens on 127.0.0.1:11434 by default, so we need a reverse SSH tunnel so that the OpenClaw virtual machine can reach it. Run this command on Windows:

    bash
    ssh -i ~\.ssh\id_openclaw -p 2222 -N -R 11434:127.0.0.1:11434 openclaw@localhost
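With the tunnel running, you can verify from inside the OpenClaw VM that port 11434 is actually reachable. This sketch uses bash's built-in /dev/tcp probe, so it needs no extra tools:

```shell
# Probe 127.0.0.1:11434 from inside the VM; this succeeds only if the
# reverse tunnel (or a local Ollama) is listening on that port.
if (exec 3<>/dev/tcp/127.0.0.1/11434) 2>/dev/null; then
  echo "port 11434 reachable - tunnel is up"
else
  echo "port 11434 closed - check the ssh -R command"
fi
```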

    3.2 Update the Configuration

    Modify the openclaw.json configuration. The keys below all belong to a single top-level object; the main changes are as follows:

    json
    {
      "agents": {
        "defaults": {
          "model": {
            "primary": "host_ollama/gemma4:e4b",
            "fallbacks": []
          },
          "models": {}
        }
      },
      "workspace": "/home/openclaw/.openclaw/workspace",
      "compaction": {
        "mode": "safeguard",
        "reserveTokensFloor": 20000
      },
      "maxConcurrent": 4,
      "subagents": {
        "maxConcurrent": 8
      },
      "models": {
        "mode": "merge",
        "providers": {
          "host_ollama": {
            "baseUrl": "http://127.0.0.1:11434/v1",
            "apiKey": "ollama",
            "api": "openai-responses",
            "models": [
              {
                "id": "gemma4:e4b",
                "name": "Gemma 4 e4B (Host)",
                "api": "openai-responses",
                "input": [
                  "text"
                ],
                "reasoning": false,
                "cost": {
                  "input": 0,
                  "output": 0,
                  "cacheRead": 0,
                  "cacheWrite": 0
                },
                "contextWindow": 32768,
                "maxTokens": 8192
              }
            ]
          }
        }
      }
    }
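Before restarting, it is worth validating the edited file's JSON syntax; a stray comma is the most common reason the gateway ignores the config. A minimal sketch using Python's stdlib json.tool (the echo stands in for the real file; the ~/.openclaw/openclaw.json path is an assumption, so substitute wherever your config actually lives):

```shell
# json.tool exits nonzero and reports the error position on malformed
# JSON. Against the real file, run instead:
#   python3 -m json.tool ~/.openclaw/openclaw.json
echo '{"models": {"mode": "merge"}}' | python3 -m json.tool
```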

    After making changes, restart the OpenClaw gateway for the configuration to take effect:

    bash
    openclaw gateway restart

    Once the gateway is back up, the local model should be ready for normal use in OpenClaw.

    OpenClaw dashboard showing local Ollama model in use

    4. Some Issues You May Encounter

    When sending messages via OpenClaw, the model may remain unresponsive for an extended period. This is usually caused by insufficient GPU memory: OpenClaw sends a large amount of context by default, which is expensive to process. A simple request will tell you whether the model is responding at all:

    bash
    curl http://127.0.0.1:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gemma4:e4b",
        "messages": [
          {
            "role": "user",
            "content": "System Test: If you can see this message, please reply with the number 1."
          }
        ],
        "stream": false
      }'

    If everything is working, you will get back a block of JSON. If it contains "content": "1", the tunnel has been established successfully and the model is functioning normally.
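Rather than scanning the raw JSON by eye, you can pull out just the reply field. The echo below is a stand-in for the real response; in practice, pipe the curl command's output into the python3 one-liner:

```shell
# Extract choices[0].message.content from an OpenAI-style chat response.
# The echo simulates the curl output for illustration.
echo '{"choices": [{"message": {"content": "1"}}]}' \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```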

    Successful curl test showing JSON response from Ollama

    If there is no response for an extended period, you can use the ollama ps command to check the current status of the model.

    Ollama ps showing memory allocation errors

    Normally, you should see the currently running model and whether it is on the GPU or CPU. Ollama occasionally fails to detect the GPU, leaving the model on the CPU and causing significant lag. If that happens, restart Ollama, or reboot the machine if necessary.

    Ollama ps showing model running on 100% GPU

    FAQs

    Can I use Ollama with OpenClaw on BlueStacks AI?

    Yes. BlueStacks AI allows you to use a local Ollama model with OpenClaw.

    Which model can I start with?

    A good starting point is Gemma 4 E4B, provided your hardware can support it.

    Does the model need to support tool calling?

    Yes. You should use a model that supports tool calling.

    Why is my Ollama model slow inside OpenClaw?

    The most common reasons are insufficient GPU memory or the model falling back to the CPU instead of the GPU.

    What should I check if the model is unresponsive?

    Test the local endpoint with curl, then use ollama ps to confirm whether the model is running and whether it is on GPU or CPU.