About
This post documents how to build an agent that is specialized in answering queries from man pages. It is a good introduction for anyone new to AI development, or for anyone who wants to learn more about how to use llm.rb.
For this example we will implement one agent and two tools: an interface to man(1) for reading man pages, and an interface to apropos(1) for searching man pages. The agent will persist to a database with the built-in ActiveRecord support provided by llm.rb.
Background
What is a tool?
A tool has a name, a description, and an optional set of parameters. It also has an implementation: a method that runs when the model decides the user's query is best served by calling the tool. The tool returns a value, and that value is given back to the model after the tool has run. The tool does not produce the final answer -- the model reads the tool's output and decides how to incorporate it.
The following example is a simple tool that enables a model to read a file:
require "llm"

class ReadFile < LLM::Tool
  name "read-file"
  description "Read a file from disk"
  parameter :path, String, "The file path"
  required %i[path]

  def call(path:)
    {contents: File.read(path)}
  end
end

llm = LLM.openai(key: ENV["OPENAI_SECRET"])
agent = LLM::Agent.new(llm, tools: [ReadFile])
puts agent.talk("What are the contents of README.md?").content
Explanation
name "read-file" and description "Read a file from disk"
The name and description tell the model what the tool does and when to use it.
parameter :path, String, "The file path"
Declares a string parameter named path. The model fills this in when it calls the tool.
required %i[path]
Makes the path parameter required. If the model does not provide it, llm.rb raises an error.
def call(path:)
The tool implementation. This is what runs when the model decides to call the tool. The return value is a hash, which the model receives as the tool result.
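The request and response cycle described above can be sketched in plain Ruby without any library. This is only an illustration of the idea, not how llm.rb implements it: FakeBasename, TOOLS, and dispatch are hypothetical names, and the request hash stands in for whatever structure the model actually sends.

```ruby
# A plain-Ruby stand-in for a tool (hypothetical; not LLM::Tool).
class FakeBasename
  NAME = "basename"

  # The "implementation": receives the arguments chosen by the model
  # and returns a hash that would be handed back to the model.
  def call(path:)
    {basename: File.basename(path)}
  end
end

# Hypothetical registry that maps tool names to tool instances.
TOOLS = {FakeBasename::NAME => FakeBasename.new}.freeze

# Hypothetical dispatcher: look the tool up by name, call it with the
# requested arguments, and return its result.
def dispatch(request)
  tool = TOOLS.fetch(request[:name]) do
    raise ArgumentError, "unknown tool: #{request[:name]}"
  end
  tool.call(**request[:arguments])
end

# A made-up request, shaped like something a model might ask for.
request = {name: "basename", arguments: {path: "/usr/share/man/man1/tar.1"}}
puts dispatch(request)[:basename]
```

The value returned by dispatch is not the answer shown to the user. It is the tool result that goes back to the model, which then writes the final reply.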
Why a man page agent?
A language model knows a lot about Unix commands, but it does not know
the exact version installed on your machine or the specific flags
available in your OS distribution. FreeBSD's pfctl has
options that do not exist on Linux's iptables. The
tar on OpenBSD may differ from the tar on
macOS. By giving the model access to man and
apropos, you ground its answers in the actual documentation
on your system rather than its training data.
Tools
Apropos
The Apropos tool provides an interface to the apropos(1) command. It is useful when the user asks a question like "how do I search files?" and the model needs to find which man page covers that topic.
require "shellwords"

class Apropos < LLM::Tool
  name "apropos"
  description "Search the man page index"
  parameter :query, String, "Query to search"
  required %i[query]

  def call(query:)
    output = `apropos #{query.shellescape}`
    matches = output.lines.map(&:chomp).reject(&:empty?).first(10)
    {query:, matches:}
  end
end
Explanation
parameter :query, String, ...
Defines the query input for the tool.
required %i[query]
Makes the query required.
`apropos #{query.shellescape}`
Runs apropos and escapes the query before it reaches the shell.
first(10)
Returns the first few matches. Man page indexes can be large, and a bounded result keeps the model's context from filling up with noise.
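The escaping and bounding steps can be tried in isolation, without running apropos at all. The sketch below uses two hypothetical helpers, apropos_command and bounded_matches, that mirror the two lines inside the tool. The hostile query and the sample index are made up for illustration.

```ruby
require "shellwords"

# Hypothetical helper: the command string a tool like Apropos would run.
def apropos_command(query)
  "apropos #{query.shellescape}"
end

# Hypothetical helper: keep at most `limit` non-empty lines of output.
def bounded_matches(output, limit: 10)
  output.lines.map(&:chomp).reject(&:empty?).first(limit)
end

# A query that tries to smuggle in a second command.
hostile = "tar; echo pwned"
puts apropos_command(hostile)

# Splitting the command the way a shell would shows that the whole
# query arrives as a single argument.
p Shellwords.split(apropos_command(hostile))

# 25 made-up index lines are cut down to the first ten.
sample = (1..25).map { |i| "page#{i}(1) - sample entry\n" }.join
puts bounded_matches(sample).size
```

Because the query is written by a model, it should be treated like any other untrusted input. Escaping keeps it from being interpreted as shell syntax.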
Man
The Man tool provides an interface to the man(1) command. Once the model knows which man page it needs, it calls this tool to read the content.
require "shellwords"

class Man < LLM::Tool
  name "man"
  description "Read a man page"
  parameter :page, String, "The man page to read, such as ls or printf"
  parameter :section, String, "The man page section, such as 1 or 5"
  required %i[page]

  def call(page:, section: nil)
    args = [section, page].compact.map(&:shellescape).join(" ")
    output = `MANPAGER=cat PAGER=cat MANWIDTH=80 man #{args}`
    # Drop each overstruck character and the backspace after it, keeping
    # the final character, so underlined text (_\bx) becomes x and not _.
    text = output.gsub(/.\x08/, "").strip
    {page:, section:, content: text[0, 12_000]}
  end
end
Explanation
parameter :page and parameter :section
Defines the page input and the optional section input. A section lets the model be precise, e.g. reading section 5 of a page instead of section 1.
required %i[page]
Makes the page required.
`MANPAGER=cat PAGER=cat MANWIDTH=80 man #{args}`
Forces plain text output instead of opening a pager. Without these environment variables, man would run an interactive pager like less or emit escape codes.
output.gsub(/.\x08/, "").strip
Man pages encode bold and underline with backspace overstrikes: bold is the character, a backspace, then the character again, and underline is an underscore, a backspace, then the character. The \x08 is the backspace byte. The regex removes each backspace together with the character before it, so the final character survives in both cases and the model receives clean plain text.
text[0, 12_000]
Returns a bounded slice of the page. Some man pages are very long, and a bounded slice prevents the tool output from overwhelming the model's context window.
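Backspace overstrikes are easy to get subtly wrong, so it helps to test the stripping step on small samples. The sketch below uses a hypothetical helper, strip_overstrike, and two hand-written strings. It removes the character before each backspace. Removing the character after the backspace instead would work for bold text but would turn underlined text into a row of underscores.

```ruby
# Hypothetical helper: drop each overstruck character together with
# the backspace that follows it, keeping the final character.
def strip_overstrike(text)
  text.gsub(/.\x08/, "")
end

# In a double-quoted Ruby string, "\b" is the backspace byte (0x08).
bold      = "N\bNA\bAM\bME\bE"   # NAME in bold: each letter struck twice
underline = "_\bf_\bi_\bl_\be"   # file underlined: underscore, then letter

puts strip_overstrike(bold)
puts strip_overstrike(underline)
```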
Agent
The agent is implemented as an ActiveRecord model. It does not have to
be -- you can use LLM::Agent directly without a database --
but ActiveRecord provides persistence so the agent's conversation history
survives across restarts.
The agent has instructions (system prompt), a model, a set of tools, and a concurrency setting that decides how tools are executed. Our example executes tools on their own thread. Other concurrency options include async-task, fibers, ractors, and fork:
require "llm"
require "active_record"
require "llm/active_record"

class Agent < ApplicationRecord
  acts_as_agent provider: :set_provider
  model "gpt-5.4-mini"
  instructions "Answer questions from local UNIX man pages."
  tools Apropos, Man
  concurrency :thread

  private

  def set_provider
    LLM.openai(key: ENV["OPENAI_SECRET"], persistent: true)
  end
end
Explanation
acts_as_agent provider: :set_provider
Persists LLM::Agent state on the model and lets the record resolve its own provider. The data column stores the serialized runtime.
model "gpt-5.4-mini" and instructions "..."
Define the default model and system instructions for the agent.
tools Apropos, Man
The two local tools the agent can call. They must be defined before the agent class so the constants are available at load time.
concurrency :thread
Runs tool work with threads. When the model calls Apropos then Man in sequence, each call runs on its own thread without blocking the agent loop.
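To get a feel for what running tool work on threads means, here is a plain-Ruby sketch. It is not llm.rb's scheduler. It assumes two tool calls are pending at the same time, and the lambdas and their return values are stand-ins for real tool work.

```ruby
# Stand-ins for two pending tool calls. The sleeps simulate waiting
# on a subprocess such as apropos(1) or man(1).
calls = {
  apropos: -> { sleep 0.1; {matches: ["tar(1)"]} },
  man:     -> { sleep 0.1; {content: "..."} }
}

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# Start one thread per call, then collect each thread's return value.
# Thread#value waits for the thread to finish.
threads = calls.transform_values { |work| Thread.new(&work) }
results = threads.transform_values(&:value)

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts results.keys.inspect
puts format("elapsed: %.2fs", elapsed)
```

Both sleeps overlap, so the elapsed time should be close to the duration of one call rather than the sum of both. Threads suit this workload because the tools spend most of their time waiting on external commands.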
Migration
For the ActiveRecord-backed agent, we need a table to store the
serialized state. The only requirement is a single data
column. It could be jsonb where supported, but for
simplicity and portability we use a text column:
create_table :agents do |t|
  t.text :data
  t.timestamps
end
Explanation
:data
Stores the serialized agent runtime. The acts_as_agent wrapper automatically saves and restores the agent state here.
:timestamps
Gives us the usual ActiveRecord created and updated timestamps.
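A text column is enough because serialized state is just a string. The sketch below shows the round trip using JSON from the standard library. The shape of the state hash is invented for illustration, and the format llm.rb actually writes to the data column is not covered in this post.

```ruby
require "json"

# Hypothetical state shape, for illustration only.
state = {
  "model" => "gpt-5.4-mini",
  "messages" => [
    {"role" => "user", "content" => "How do I extract a tar archive?"},
    {"role" => "assistant", "content" => "Use tar -xf archive.tar"}
  ]
}

# Serialize to a string, which is what a text column would hold.
column_value = JSON.generate(state)

# Restore the state from that string, as a later process might do.
restored = JSON.parse(column_value)

puts column_value.class
puts restored == state
```

A jsonb column would store the same information but lets the database query inside it, which is the trade-off mentioned above.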
Usage
The following example creates an agent and asks three questions. Each
talk call persists automatically.
##
# Create our agent
agent = Agent.create!
##
# First question - persists automatically
# The agent searches for tar(1), reads the man page, and answers
# based on the local system's documentation.
puts agent.talk("How do I extract a tar archive?").content
##
# Second question - persists automatically
# The agent may use the same man page from its context or search
# for a new one if the previous content has been evicted.
puts agent.talk("What about gzipped tar archives?").content
##
# Third question - persists automatically
# pf.conf configures pf, the packet filter found on the BSDs. The
# model's training data may cover it, but the agent reads the local
# man page to be sure.
puts agent.talk("How do I block incoming traffic with pf.conf?").content
Explanation
agent.talk("How do I extract a tar archive?")
The agent receives the question. It calls Apropos("tar") to find the relevant man page, then calls Man("tar", "1") to read it, then answers based on the local system's documentation.
agent.talk("What about gzipped tar archives?")
The agent may use the same man page from its context or search for a new one if the previous content has been evicted.
agent.talk("How do I block incoming traffic with pf.conf?")
The agent calls Apropos("pf.conf"), gets back pf.conf(5), calls Man("pf.conf", "5"), and answers from the local man page content.
agent.talk() persists automatically
The acts_as_agent wrapper saves the agent state to the data column after each turn, so conversation history survives restarts.
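The step from an apropos match such as pf.conf(5) to the arguments of a Man call is done by the model itself, but the mapping is simple enough to sketch. The helper below, parse_reference, is hypothetical and only illustrates how a page and section can be read out of a match line. The exact layout of apropos output differs between systems, so the sample lines are assumptions.

```ruby
# Hypothetical helper: split a reference such as "pf.conf(5)" into the
# page and section a Man call would take. Returns nil when the text
# does not start with a page name followed by a section in parentheses.
def parse_reference(ref)
  match = ref.match(/\A(?<page>[^\s(]+)\s*\((?<section>[^)]+)\)/)
  return nil unless match

  {page: match[:page], section: match[:section]}
end

# Sample lines, written by hand for illustration.
p parse_reference("pf.conf(5) - packet filter configuration file")
p parse_reference("tar (1) - an archiving utility")
p parse_reference("no section here")
```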
Robert
The agent we built runs on CRuby with ActiveRecord and a database, but the same pattern powers standalone applications built with mruby-llm. Robert is a FreeBSD documentation assistant that compiles into a ~2MB standalone binary. No Ruby installation, no database, no Rails -- just a statically linked mruby program.
Robert uses the same building blocks: LLM::Agent, a
ManPage tool for reading man pages, and a
ManSearch tool for apropos. It adds a terminal
UI built on termbox2, tool confirmation for sensitive operations like
reading arbitrary files, and a DeepSeek backend. The binary is built from
an mruby build configuration and distributed as a single executable. See
the robert repository and
website for more.
Conclusion
The same approach described here can be applied to other things like
internal documentation, log files, etc. Further topics worth exploring
include LLM::Context for manual tool loops,
LLM::Skill for packaging reusable instructions, and
LLM::MCP for connecting to remote tool servers.