Package: localLLM 1.3.1

Yaosheng Xu

localLLM: Running Local LLMs with 'llama.cpp' Backend

Provides R bindings to the 'llama.cpp' library for running large language models. The package uses a lightweight architecture where the C++ backend library is downloaded at runtime rather than bundled with the package. Package features include text generation, reproducible generation, and parallel inference.

Authors: Eddie Yang [aut], Yaosheng Xu [aut, cre]

localLLM_1.3.1.tar.gz
localLLM_1.3.1.zip (r-4.7), localLLM_1.3.1.zip (r-4.6), localLLM_1.3.1.zip (r-4.5)
localLLM_1.3.1.tgz (r-4.6-x86_64), localLLM_1.3.1.tgz (r-4.6-arm64), localLLM_1.3.1.tgz (r-4.5-x86_64), localLLM_1.3.1.tgz (r-4.5-arm64)
localLLM_1.3.1.tar.gz (r-4.7-arm64), localLLM_1.3.1.tar.gz (r-4.7-x86_64), localLLM_1.3.1.tar.gz (r-4.6-arm64), localLLM_1.3.1.tar.gz (r-4.6-x86_64)
localLLM_1.3.1.tgz (r-4.6-emscripten)

# Install 'localLLM' in R:
install.packages('localLLM', repos = c('https://eddieyang211.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/eddieyang211/localllm/issues

Uses libs:
  • c++ - GNU Standard C++ Library v3

7.59 score | 9 stars | 18 scripts | 216 downloads | 30 exports | 7 dependencies

Last updated from: a017b26b66. Checks: 13 OK. Indexed: yes.

Target | Result | Time
linux-devel-arm64 | OK | 163
linux-devel-x86_64 | OK | 132
source / vignettes | OK | 207
linux-release-arm64 | OK | 159
linux-release-x86_64 | OK | 132
macos-release-arm64 | OK | 186
macos-release-x86_64 | OK | 287
macos-oldrel-arm64 | OK | 211
macos-oldrel-x86_64 | OK | 364
windows-devel | OK | 110
windows-release | OK | 94
windows-oldrel | OK | 96
wasm-release | OK | 125

Exports: annotation_sink_csv, apply_chat_template, apply_gemma_chat_template, backend_free, backend_init, compute_confusion_matrices, context_create, detokenize, document_end, document_start, download_model, explore, generate, generate_parallel, get_lib_path, get_model_cache_dir, hardware_profile, install_localLLM, intercoder_reliability, lib_is_installed, list_cached_models, list_ollama_models, model_load, quick_llama, quick_llama_reset, set_hf_token, smart_chat_template, tokenize, tokenize_test, validate

Dependencies: curl, digest, jsonlite, R.methodsS3, R.oo, R.utils, Rcpp

Reproducible Output
Deterministic Generation by Default | Seed Control for Stochastic Generation | Input/Output Hash Verification | Hashes with explore() | Automatic Documentation | Best Practices for Reproducible Research | 1. Always Set Seeds | 2. Log Your Environment | 3. Use Document Functions for Audit Trails | 4. Share Hashes for Verification | 5. Version Control Your Models | Summary

Last update: 2026-05-05
Started: 2025-12-12

Frequently Asked Questions
Installation Issues | "Backend library is not loaded" error | Installation fails on my platform | "Library already installed" but functions don't work | Model Download Issues | "Download lock" or "Another download in progress" error | Download times out or fails | "Model not found" when using cached model | Private Hugging Face model fails | Memory Issues | R crashes when loading a model | "Memory check failed" warning | Context creation fails with large n_ctx | GPU Issues | GPU not being used | GPU runs out of memory | Generation Issues | Backend prints too many log messages | Output is garbled or nonsensical | Output contains strange tokens like <|eot_id|> | Generation stops too early | Same prompt gives different results | Performance Issues | Generation is very slow | Parallel processing isn't faster | Compatibility Issues | "GGUF format required" error | Model works in Ollama but not localLLM | Common Error Messages | Getting Help | Quick Reference

Last update: 2026-04-26
Started: 2025-12-12

Get Started with localLLM
Installation | Step 1: Install the R package | Step 2: Install the backend library | Your First LLM Query | Text Classification Example | Processing Multiple Prompts | Finding and Using Models | GGUF Format | Loading Different Models | Managing Cached Models | Customizing Generation | Next Steps

Last update: 2026-04-26
Started: 2025-12-12

Parallel Processing
Why Parallel Processing? | Using generate_parallel() | Basic Usage | Progress Tracking | Text Classification Example | Sequential vs Parallel Comparison | Sequential (For Loop) | Parallel | Benchmark: Multiple Models | Using quick_llama() for Batches | Performance Considerations | Context Size and n_seq_max | Memory Usage | Batch Size Recommendations | Error Handling | Complete Workflow | Summary | Tips | Next Steps

Last update: 2026-04-20
Started: 2025-12-12

Basic Text Generation
The Core Workflow | Step 1: Loading a Model | Model Loading Options | Step 2: Creating a Context | Context Parameters | Step 3: Formatting Prompts with Chat Templates | Multi-Turn Conversations | Step 4: Generating Text | Generation Parameters | Complete Example | Tokenization | Tips and Best Practices | 1. Reuse Models and Contexts | 2. Size Your Context Appropriately | 3. Controlling Log Output (verbosity) | 4. Use GPU When Available | Next Steps

Last update: 2026-04-13
Started: 2025-12-12

Model Comparison & Validation
The explore() Function | Creating Structured Prompts | Template Builder Format | Running the Comparison | Viewing Results | Validation Against Ground Truth | Confusion Matrices | Reliability Metrics | Alternative Prompt Formats | Character Vector | Custom Function | Model-Specific Prompts | Computing Metrics Separately | Intercoder Reliability | Complete Example | Summary | Next Steps

Last update: 2026-04-06
Started: 2025-12-12

Ollama Integration
Discovering Ollama Models | Loading Ollama Models | By Model Name | By Tag | By SHA256 Prefix | Interactive Selection | Using with quick_llama() | Ollama Reference Trigger Rules | Common Workflows | Check Available Models First | Load Specific Model | Model Comparison with Ollama | Ollama Directory Structure | Troubleshooting | Model Not Found | Ollama Not Installed | Multiple Matches | Benefits of Ollama Integration | Complete Example | Summary | Next Steps

Last update: 2026-02-24
Started: 2025-12-12

Readme and manuals

Help Manual

Help page | Topics
R Interface to llama.cpp with Runtime Library Loading | localLLM-package, localLLM
AG News classification sample | ag_news_sample
Create a CSV sink for streaming annotation chunks | annotation_sink_csv
Apply Chat Template to Format Conversations | apply_chat_template
Apply Gemma-Compatible Chat Template | apply_gemma_chat_template
Free localLLM backend | backend_free
Initialize localLLM backend | backend_init
Compute confusion matrices from multi-model annotations | compute_confusion_matrices
Create Inference Context for Text Generation | context_create
Convert Token IDs Back to Text | detokenize
Finish automatic run documentation | document_end
Start automatic run documentation | document_start
Download a model manually | download_model
Compare multiple LLMs over a shared set of prompts | explore
Generate Text Using Language Model Context | generate
Generate Text in Parallel for Multiple Prompts | generate_parallel
Get Backend Library Path | get_lib_path
Get the model cache directory | get_model_cache_dir
Inspect detected hardware resources | hardware_profile
Install localLLM Backend Library | install_localLLM
Intercoder reliability for LLM annotations | intercoder_reliability
Check if Backend Library is Installed | lib_is_installed
List cached models on disk | list_cached_models
List GGUF models managed by Ollama | list_ollama_models
Load Language Model with Automatic Download Support | model_load
Get All GGUF Metadata from a Loaded Model | model_metadata
Quick LLaMA Inference | quick_llama
Reset quick_llama state | quick_llama_reset
Configure Hugging Face access token | set_hf_token
Smart Chat Template Application | smart_chat_template
Convert Text to Token IDs | tokenize
Test tokenize function (debugging) | tokenize_test
Validate model predictions against gold labels and peer agreement | validate