How to build a man page agent

19 Apr 2026

About

This post documents how to build an agent that is specialized in answering queries from man pages. It is a good introduction for anyone new to AI development, or for anyone who wants to learn more about how to use llm.rb.

For this example we will implement one agent and two tools: an interface to man(1) for reading man pages, and an interface to apropos(1) for searching man pages. The agent will persist to a database with the built-in ActiveRecord support provided by llm.rb.

Background

What is a tool?

A tool has a name, a description, and an optional set of parameters. It also has an implementation: a method that runs when the model decides the user's query is best served by calling the tool. The model does not execute anything itself; it requests the call, and the runtime invokes the method. The value the tool returns is given back to the model after the tool has run. The tool does not produce the final answer -- the model reads the tool's output and decides how to incorporate it.

The following example is a simple tool that enables a model to read a file:

require "llm"

class ReadFile < LLM::Tool
  name "read-file"
  description "Read a file from disk"
  parameter :path, String, "The file path"
  required %i[path]

  def call(path:)
    {contents: File.read(path)}
  end
end

llm = LLM.openai(key: ENV["OPENAI_SECRET"])
agent = LLM::Agent.new(llm, tools: [ReadFile])
puts agent.talk("What are the contents of README.md?").content

Explanation

name "read-file" and description "Read a file from disk"
The name and description tell the model what the tool does and when to use it.
parameter :path, String, "The file path"
Declares a string parameter named path. The model fills this in when it calls the tool.
required %i[path]
Makes the path parameter required. If the model does not provide it, llm.rb raises an error.
def call(path:)
The tool implementation. This is what runs when the model decides to call the tool. The return value is a hash, which the model receives as the tool result.
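
To make the name, description, and parameter declarations more concrete, here is a rough sketch of the kind of function description a provider such as OpenAI receives for a tool. The tool_schema helper is hypothetical and is not part of llm.rb; llm.rb builds the real payload for you, and its exact shape may differ.

```ruby
require "json"

# Hypothetical helper (not part of llm.rb): build an OpenAI-style
# function description from a name, a description, and parameters.
def tool_schema(name:, description:, parameters:, required: [])
  properties = parameters.map { |param, (type, text)|
    [param.to_s, {"type" => type, "description" => text}]
  }.to_h
  {
    "type" => "function",
    "function" => {
      "name" => name,
      "description" => description,
      "parameters" => {
        "type" => "object",
        "properties" => properties,
        "required" => required.map(&:to_s)
      }
    }
  }
end

# The ReadFile tool from above, expressed as a schema.
READ_FILE_SCHEMA = tool_schema(
  name: "read-file",
  description: "Read a file from disk",
  parameters: {path: ["string", "The file path"]},
  required: %i[path]
)

puts JSON.pretty_generate(READ_FILE_SCHEMA)
```

The model only ever sees this description. That is why a clear name and description matter: they are the model's whole basis for deciding when to call the tool.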

Why a man page agent?

A language model knows a lot about Unix commands, but it does not know the exact version installed on your machine or the specific flags available in your OS distribution. FreeBSD filters packets with pfctl and Linux with iptables, and their options have little in common. The tar on OpenBSD may differ from the tar on macOS. By giving the model access to man and apropos, you ground its answers in the actual documentation on your system rather than its training data.

Tools

Apropos

The Apropos tool provides an interface to the apropos(1) command. It is useful when the user asks a question like "how do I search files?" and the model needs to find which man page covers that topic.

require "llm"
require "shellwords"

class Apropos < LLM::Tool
  name "apropos"
  description "Search the man page index"
  parameter :query, String, "Query to search"
  required %i[query]

  def call(query:)
    output = `apropos #{query.shellescape}`
    matches = output.lines.map(&:chomp).reject(&:empty?).first(10)
    {query:, matches:}
  end
end

Explanation

parameter :query, String, ...
Defines the query input for the tool.
required %i[query]
Makes the query required.
`apropos #{query.shellescape}`
Runs apropos and escapes the query before it reaches the shell.
first(10)
Keeps at most the first 10 matches. Man page indexes can be large, and a bounded result keeps the model's context from filling up with noise.
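
The tool above hands raw apropos lines to the model. If you would rather return structured matches, a sketch along these lines could work. The parse_apropos helper and the sample output are hypothetical, and the pattern assumes the common "name(section) - summary" layout, which varies a little between systems.

```ruby
# Matches both "tar(1) - ..." and "tar (1)      - ..." layouts.
APROPOS_LINE = /\A(?<names>.+?)\s*\((?<section>[^)]+)\)\s+-\s+(?<summary>.*)\z/

# Hypothetical helper: keep at most `limit` non-empty lines, then turn
# each line that matches the pattern into a hash. Lines that do not
# match are dropped.
def parse_apropos(output, limit: 10)
  lines = output.lines.map(&:chomp).reject(&:empty?).first(limit)
  lines.filter_map do |line|
    m = APROPOS_LINE.match(line)
    next unless m
    {names: m[:names].split(/,\s*/), section: m[:section], summary: m[:summary]}
  end
end

# Made-up sample output mixing two common layouts.
SAMPLE_APROPOS = <<~OUT
  tar (1)              - an archive utility
  bsdtar, tar(1) - manipulate tape archives
  tar(5) - format of tape archive files
OUT

parse_apropos(SAMPLE_APROPOS).each { |match| p match }
```

Structured matches make it easier for the model to pass the right page and section to the Man tool, at the cost of a parser that has to keep up with each system's output format.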

Man

The Man tool provides an interface to the man(1) command. Once the model knows which man page it needs, it calls this tool to read the content.

require "llm"
require "shellwords"

class Man < LLM::Tool
  name "man"
  description "Read a man page"
  parameter :page, String, "The man page to read, such as ls or printf"
  parameter :section, String, "The man page section, such as 1 or 5"
  required %i[page]

  def call(page:, section: nil)
    args = [section, page].compact.map(&:shellescape).join(" ")
    output = `MANPAGER=cat PAGER=cat MANWIDTH=80 man #{args}`
    text = output.gsub(/.\x08/, "").strip
    {page:, section:, content: text[0, 12_000]}
  end
end

Explanation

parameter :page and parameter :section
Defines the page input and the optional section input. A section lets the model be precise, e.g. reading section 5 of a page instead of section 1.
required %i[page]
Makes the page required.
`MANPAGER=cat PAGER=cat MANWIDTH=80 man #{args}`
Forces plain text output at a fixed width of 80 columns instead of opening a pager. Without these environment variables, man could start an interactive pager like less.
output.gsub(/.\x08/, "").strip
Man pages encode bold and underline with backspace overstrikes: bold is a character, a backspace, and the same character again; underline is an underscore, a backspace, and the character. The \x08 is the backspace byte. This regex removes each backspace together with the character before it, which keeps the real character in both cases, so the model receives clean plain text.
text[0, 12_000]
Returns a bounded slice of the page. Some man pages are very long, and a bounded slice prevents the tool output from overwhelming the model's context window.
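
The cleanup and truncation steps are easy to try on sample strings without running man at all. The following is a minimal sketch: strip_overstrike and clean_page are hypothetical helper names, and the approach of dropping every "character, backspace" pair is the same idea col -b uses.

```ruby
# Remove overstrike sequences: every "character, backspace" pair is
# dropped, which keeps the last character printed in each position.
def strip_overstrike(text)
  text.gsub(/.\x08/, "")
end

# Hypothetical helper: clean the text, trim surrounding whitespace,
# and cap the result at `limit` characters.
def clean_page(raw, limit: 12_000)
  strip_overstrike(raw).strip[0, limit]
end

BOLD = "N\bNA\bAM\bME\bE"        # bold: each letter printed twice
UNDERLINE = "_\bf_\bi_\bl_\be"  # underline: underscore, backspace, letter

puts strip_overstrike(BOLD)       # prints "NAME"
puts strip_overstrike(UNDERLINE)  # prints "file"
```

Note that "\b" inside a double-quoted Ruby string is the backspace byte, while \b inside a regex is a word boundary, which is why the pattern spells it as \x08.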

Agent

The agent is implemented as an ActiveRecord model. It does not have to be -- you can use LLM::Agent directly without a database -- but ActiveRecord provides persistence so the agent's conversation history survives across restarts.

The agent has instructions (system prompt), a model, a set of tools, and a concurrency setting that decides how tools are executed. Our example runs each tool call on its own thread. Other concurrency options include async-task, fibers, ractors, and fork:

require "llm"
require "active_record"
require "llm/active_record"

class Agent < ApplicationRecord
  acts_as_agent provider: :set_provider

  model "gpt-5.4-mini"
  instructions "Answer questions from local UNIX man pages."
  tools Apropos, Man
  concurrency :thread

  private

  def set_provider
    LLM.openai(key: ENV["OPENAI_SECRET"], persistent: true)
  end
end

Explanation

acts_as_agent provider: :set_provider
Persists LLM::Agent state on the model and lets the record resolve its own provider. The data column stores the serialized runtime.
model "gpt-5.4-mini" and instructions "..."
Define the default model and system instructions for the agent.
tools Apropos, Man
The two local tools the agent can call. They must be defined before the agent class so the constants are available at load time.
concurrency :thread
Runs tool work with threads. Each tool call the model requests runs on its own thread, which pays off when the model asks for several calls in the same turn, such as reading two man pages at once.
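
As a general illustration of the thread-per-call idea, and not a description of llm.rb's internals, the sketch below runs two stand-in tool calls concurrently and collects their results in request order. The CALLS list and run_concurrently are hypothetical.

```ruby
# Hypothetical stand-ins for two tool calls requested in a single turn.
# The sleep simulates a slow lookup.
CALLS = [
  {name: "apropos", run: -> { sleep 0.1; "tar(1) - manipulate tape archives" }},
  {name: "man",     run: -> { sleep 0.1; "TAR(1) ..." }}
]

# Start one thread per call, then wait for all of them.
# Thread#value joins the thread and returns the block's result, so the
# results come back in the same order as the calls were given.
def run_concurrently(calls)
  threads = calls.map { |call| Thread.new { [call[:name], call[:run].call] } }
  threads.map(&:value)
end

run_concurrently(CALLS).each do |name, result|
  puts "#{name}: #{result}"
end
```

Because both calls spend their time waiting, running them side by side takes roughly as long as the slowest one rather than the sum of both.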

Migration

For the ActiveRecord-backed agent, we need a table to store the serialized state. The only requirement is a single data column. It could be jsonb where supported, but for simplicity and portability we use a text column:

create_table :agents do |t|
  t.text :data
  t.timestamps
end

Explanation

:data
Stores the serialized agent runtime. The acts_as_agent wrapper automatically saves and restores the agent state here.
:timestamps
Gives us the usual ActiveRecord created and updated timestamps.
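
The data column only has to hold a string that can be turned back into the agent's state. As a rough illustration, a round trip through JSON might look like the following. The state layout and the dump_state and load_state helpers are hypothetical; the format llm.rb actually writes is its own.

```ruby
require "json"

# Hypothetical conversation state. The real serialized format used by
# llm.rb is an implementation detail and will look different.
STATE = {
  "model" => "gpt-5.4-mini",
  "messages" => [
    {"role" => "user", "content" => "How do I extract a tar archive?"},
    {"role" => "assistant", "content" => "Use tar -xf archive.tar"}
  ]
}

# What a text column would hold: the state encoded as a JSON string.
def dump_state(state)
  JSON.generate(state)
end

# Rebuild the state from the stored string.
def load_state(data)
  JSON.parse(data)
end

DATA = dump_state(STATE)
puts DATA.class                   # prints "String"
puts load_state(DATA) == STATE    # prints "true"
```

A text column stores this string as-is on any database. A jsonb column would additionally let you query inside the document, which this example does not need.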

Usage

The following example creates an agent and asks three questions. Each talk call persists automatically.

##
# Create our agent
agent = Agent.create!

##
# First question - persists automatically
# The agent searches for tar(1), reads the man page, and answers
# based on the local system's documentation.
puts agent.talk("How do I extract a tar archive?").content

##
# Second question - persists automatically
# The agent may use the same man page from its context or search
# for a new one if the previous content has been evicted.
puts agent.talk("What about gzipped tar archives?").content

##
# Third question - persists automatically
# pf.conf configures pf, the packet filter found on the BSDs. The
# model's training data may cover it, but the agent reads the local
# man page to be sure.
puts agent.talk("How do I block incoming traffic with pf.conf?").content

Explanation

agent.talk("How do I extract a tar archive?")
The agent receives the question. In a typical run it calls Apropos("tar") to find the relevant man page, then calls Man("tar", "1") to read it, then answers based on the local system's documentation. The model chooses the calls, so the exact sequence can vary.
agent.talk("What about gzipped tar archives?")
The agent may use the same man page from its context or search for a new one if the previous content has been evicted.
agent.talk("How do I block incoming traffic with pf.conf?")
The agent typically calls Apropos("pf.conf"), gets back pf.conf(5), calls Man("pf.conf", "5"), and answers from the local man page content.
agent.talk() persists automatically
The acts_as_agent wrapper saves the agent state to the data column after each turn, so conversation history survives restarts.
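
The back-and-forth described above can be sketched as a small loop. Everything here is hypothetical: ScriptedModel replays fixed replies instead of calling a real model, the tools return canned data, and the talk method is a simplified stand-in for what an agent runtime does, not llm.rb's implementation.

```ruby
# A scripted stand-in for a model. A real model decides at runtime
# whether to call a tool; this one replays fixed replies in order.
class ScriptedModel
  def initialize(replies)
    @replies = replies.dup
  end

  def reply(_messages)
    @replies.shift
  end
end

# Hypothetical tools keyed by name; each returns canned data.
TOOLS = {
  "apropos" => ->(_args) { {matches: ["tar(1) - manipulate tape archives"]} },
  "man"     => ->(_args) { {content: "tar -x extracts files from an archive"} }
}

# Keep asking the model for a reply. While it requests a tool, run the
# tool and append the result; stop at the first plain-text answer.
def talk(model, tools, question, max_steps: 5)
  messages = [{role: "user", content: question}]
  max_steps.times do
    reply = model.reply(messages)
    return reply[:content] unless reply[:tool]
    result = tools.fetch(reply[:tool]).call(reply[:args])
    messages << {role: "tool", name: reply[:tool], content: result}
  end
  raise "no answer after #{max_steps} steps"
end

SCRIPT = [
  {tool: "apropos", args: {query: "tar"}},
  {tool: "man", args: {page: "tar", section: "1"}},
  {content: "Run tar -xf archive.tar to extract it."}
]

puts talk(ScriptedModel.new(SCRIPT), TOOLS, "How do I extract a tar archive?")
```

The max_steps limit is a design choice worth keeping in any real loop: it stops a model that keeps requesting tools from running forever.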

Robert

The agent we built runs on CRuby with ActiveRecord and a database, but the same pattern powers standalone applications built with mruby-llm. Robert is a FreeBSD documentation assistant that compiles into a ~2MB standalone binary. No Ruby installation, no database, no Rails -- just a statically linked mruby program.

Robert uses the same building blocks: LLM::Agent, a ManPage tool for reading man pages, and a ManSearch tool for apropos. It adds a terminal UI built on termbox2, tool confirmation for sensitive operations like reading arbitrary files, and a DeepSeek backend. The binary is built from an mruby build configuration and distributed as a single executable. See the robert repository and website for more.

Conclusion

The same approach applies to other local sources of truth, such as internal documentation or log files: wrap the source in a tool, bound its output, and let the model decide when to read it. Further topics worth exploring include LLM::Context for manual tool loops, LLM::Skill for packaging reusable instructions, and LLM::MCP for connecting to remote tool servers.