localLLM provides
an easy-to-use interface to run large language models (LLMs) directly in
R. It uses the performant llama.cpp library as the backend
and allows you to generate text and analyze data with LLMs. Everything
runs locally on your own machine, completely free, with reproducibility
by default.
Getting started requires two simple steps: installing the R package and downloading the backend C++ library.
The install_localLLM() function automatically detects
your platform and downloads the appropriate pre-compiled library. GPU
acceleration is selected automatically when a compatible GPU driver is
detected:
| Platform | GPU backend | Detection method |
|---|---|---|
| macOS (Apple Silicon) | Metal | always enabled |
| macOS (Intel) | Metal | always enabled |
| Windows (x86-64) | Vulkan | vulkan-1.dll present in System32 |
| Linux (x86-64) | Vulkan | Vulkan loader + hardware ICD file present |
On Windows and Linux, if no GPU driver is found, the CPU build is installed automatically.
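As a rough illustration of the rules in the table, the base-R sketch below maps a platform and a driver check to a backend label. It is not the logic install_localLLM() actually runs: the helper name `guess_backend()`, its arguments, and the `SystemRoot` lookup are assumptions made for this example.

```r
# Hypothetical helper: mirrors the table above, not the package's real code.
guess_backend <- function(os = Sys.info()[["sysname"]], has_vulkan = NULL) {
  if (identical(os, "Darwin")) {
    return("Metal")  # macOS: always enabled according to the table
  }
  if (is.null(has_vulkan) && identical(os, "Windows")) {
    # Assumed location of System32; the table only names the folder.
    sys_root <- Sys.getenv("SystemRoot", unset = "C:/Windows")
    has_vulkan <- file.exists(file.path(sys_root, "System32", "vulkan-1.dll"))
  }
  # The Linux check (loader + ICD file) is not reproduced here; pass
  # has_vulkan explicitly, otherwise the sketch falls back to CPU.
  if (isTRUE(has_vulkan)) "Vulkan" else "CPU"
}

guess_backend("Darwin")
guess_backend("Linux", has_vulkan = TRUE)
guess_backend("Linux")
```

Passing `has_vulkan` explicitly keeps the sketch usable on any machine, at the cost of skipping the real driver probe.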
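The simplest way to get started is with
quick_llama(), which takes a prompt and returns the generated text. Asked for the capital of France, for example, it prints:
#> The capital of France is Paris.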
quick_llama() is a high-level wrapper designed for
convenience. On first run, it automatically downloads and caches the
default model (Llama-3.2-3B-Instruct-Q5_K_M.gguf).
A common use case is classifying text. Here’s a sentiment analysis example:
response <- quick_llama(
'Classify the sentiment of the following tweet into one of two
categories: Positive or Negative.
Tweet: "This paper is amazing! I really like it."'
)
cat(response)
#> The sentiment of this tweet is Positive.
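The model answers in a full sentence, so downstream analysis usually needs the bare label. The sketch below shows one way to pull it out with base R. `extract_label()` is a hypothetical helper, not part of localLLM, and it assumes the reply mentions one of the expected labels verbatim.

```r
# Hypothetical helper: return the label that appears first in a single reply.
extract_label <- function(response, labels = c("Positive", "Negative")) {
  # Position of each label in the reply (-1 when absent), case-insensitive.
  pos <- vapply(
    labels,
    function(lbl) {
      as.integer(regexpr(tolower(lbl), tolower(response), fixed = TRUE))
    },
    integer(1)
  )
  found <- pos[pos > 0]
  if (length(found) == 0) {
    return(NA_character_)  # no known label in the reply
  }
  names(found)[which.min(found)]
}

extract_label("The sentiment of this tweet is Positive.")
```

Returning `NA` for replies without a recognizable label makes unparseable answers easy to count and review by hand.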
quick_llama() can handle different types of input:
# Process multiple prompts at once
prompts <- c(
"What is 2 + 2?",
"Name one planet in our solar system.",
"What color is the sky?"
)
responses <- quick_llama(prompts)
print(responses)
#> [1] "2 + 2 equals 4."
#> [2] "One planet in our solar system is Mars."
#> [3] "The sky is typically blue during the day."
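Because quick_llama() accepts a character vector, a batch of classification prompts can be built from raw texts before the call. The sketch below reuses the sentiment instruction from the earlier example. `build_prompts()` is a hypothetical helper and the second tweet is made up for illustration.

```r
# Hypothetical helper: wrap each text in the same classification instruction.
build_prompts <- function(texts) {
  sprintf(
    paste0(
      "Classify the sentiment of the following tweet into one of two ",
      "categories: Positive or Negative.\nTweet: \"%s\""
    ),
    texts
  )
}

tweets <- c(
  "This paper is amazing! I really like it.",
  "The results were disappointing."  # invented example text
)
prompts <- build_prompts(tweets)
cat(prompts[1])

# With localLLM installed and a model available, the batch would be run as:
# responses <- quick_llama(prompts)
```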
The localLLM backend only supports models in the GGUF
format. You can find thousands of GGUF models on Hugging Face.
Point model_path at a .gguf file by URL, by local path, or by the name of a cached model:
# From Hugging Face URL
response <- quick_llama(
"Explain quantum physics simply",
model_path = "https://huggingface.co/unsloth/gemma-3-4b-it-qat-GGUF/resolve/main/gemma-3-4b-it-qat-Q5_K_M.gguf"
)
# From local file
response <- quick_llama(
"Explain quantum physics simply",
model_path = "/path/to/your/model.gguf"
)
# From cache (name fragment)
response <- quick_llama(
"Explain quantum physics simply",
model_path = "Llama-3.2"
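)
Downloaded models are kept in a local cache. The cache listing shows each file's name, size in bytes, and modification time:
#>                                name size_bytes            modified
#> 1 Llama-3.2-3B-Instruct-Q5_K_M.gguf 2322153920 2025-12-05 20:01:18
#> 2     gemma-3-4b-it-qat-Q5_K_M.gguf 2829698176 2025-12-14 19:21:11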
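To make the "name fragment" idea concrete, the sketch below matches a fragment against the two cached file names listed above. It assumes a plain, case-sensitive substring match. The rule localLLM actually applies is not described here, and `match_cached()` is a hypothetical helper.

```r
# Cache listing copied from the output above (name and size only).
cached <- data.frame(
  name = c(
    "Llama-3.2-3B-Instruct-Q5_K_M.gguf",
    "gemma-3-4b-it-qat-Q5_K_M.gguf"
  ),
  size_bytes = c(2322153920, 2829698176),
  stringsAsFactors = FALSE
)

# Hypothetical helper: plain substring match, assumed for illustration only.
match_cached <- function(fragment, cache = cached) {
  cache$name[grepl(fragment, cache$name, fixed = TRUE)]
}

match_cached("Llama-3.2")
match_cached("Q5_K_M")  # matches both files, so this fragment is ambiguous
```

A fragment that matches more than one file is ambiguous, so a more specific fragment is the safer choice.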
Control the output with various parameters: