About
This post documents how to build an agent that specializes in
answering queries from man pages. It shows how any Sequel or
ActiveRecord model can become an agent, and how to extend an
agent's capabilities with local tools. We will implement one agent and
two tools: a man page reader, and a man page searcher (an interface to
apropos). The agent will then be equipped to answer queries
by using our tools to find the answer.
This post is made possible by llm.rb - the most capable Ruby AI runtime that exists today.
Migration
The first step is a boring but necessary one for this example: we
have to define the table where an agent can be stored. The agent is
serialized into the data column. This could be changed to
jsonb for better performance where supported, but for
simplicity we'll use a text column:
create_table :agents do
primary_key :id
String :provider, null: false
String :model
String :data, text: true
Integer :input_tokens
Integer :output_tokens
Integer :total_tokens
end
Explanation
:provider, :model, and :data
These are the main columns used by llm.rb's Sequel persistence layer.
:input_tokens, :output_tokens, and :total_tokens
Keep token usage on the record.
Agent
Our agent will be equipped with two tools, instructions, a model, and a concurrency model for tool execution. This example will execute each tool in a separate thread but it could also be executed sequentially, with fibers, with async-task, or with ractors (experimental):
require "llm"
require "net/http/persistent"
require "sequel"
require "sequel/plugins/llm"
class Agent < Sequel::Model
plugin :agent, provider: :set_provider
model "gpt-5.4-mini"
instructions "Answer questions by searching and reading local UNIX man pages."
tools Apropos, Man
concurrency :thread
private
def set_provider
{key: ENV["#{provider.upcase}_SECRET"], persistent: true}
end
end
Explanation
plugin :agent, provider: :set_provider
Persists LLM::Agent state on the model.
tools Apropos, Man
These are the two tools the model can call.
concurrency :thread
Lets the agent run tool work with threads.
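The internals of llm.rb are out of scope here, but the idea behind thread-based tool execution can be sketched with plain Ruby: when the model requests several tool calls in one turn, each call runs on its own thread and the results are joined before replying. The tool lambdas below are illustrative stand-ins, not llm.rb API:

```ruby
# A minimal sketch of thread-per-tool-call execution (illustrative only,
# not llm.rb's actual implementation). Each requested tool call runs on
# its own thread, and all threads are joined before the results are used.
tool_calls = [
  {name: "apropos", args: {query: "tar"}},
  {name: "man",     args: {page: "ls"}}
]

# Stand-in tools: the real ones would shell out to apropos(1) and man(1).
tools = {
  "apropos" => ->(query:) { "matches for #{query}" },
  "man"     => ->(page:)  { "contents of #{page}" }
}

# Spawn one thread per call, then block on Thread#value to collect results.
results = tool_calls.map do |call|
  Thread.new { [call[:name], tools[call[:name]].call(**call[:args])] }
end.map(&:value)

results.each { |name, output| puts "#{name}: #{output}" }
```

Sequential execution would produce the same results; threads simply let slow tools (such as a long man page read) overlap instead of queueing.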
Tools
Apropos
require "open3"
class Apropos < LLM::Tool
name "apropos"
description "Search the local man page database for commands related to a topic"
param :query, String, "Search terms for apropos", required: true
def call(query:)
output, = Open3.capture2("apropos", query)
matches = output.lines.map(&:chomp).reject(&:empty?).first(10)
{query:, matches:}
end
end
Explanation
param :query, String, ...
Defines the input for the tool.Open3.capture2("apropos", query)
Runsaproposwithout going through a shell.first(10)
Returns the first few matches.
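The post-processing step can be seen in isolation. The sample text below is illustrative, not real apropos output:

```ruby
# Illustrative apropos-style output (hypothetical entries, including a
# blank line that the cleanup should drop).
output = <<~OUT
  tar (1)              - an archiving utility

  gzip (1)             - compress or expand files
OUT

# The same cleanup the tool applies: split into lines, strip trailing
# newlines, drop blanks, and cap the result at ten matches.
matches = output.lines.map(&:chomp).reject(&:empty?).first(10)
matches.each { |m| puts m }
```

Capping the list keeps the tool's response small, which matters because everything it returns is fed back into the model's context.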
Man
require "open3"
class Man < LLM::Tool
name "man"
description "Read a local man page and return its plain text contents"
param :page, String, "The man page to read, such as ls or printf", required: true
param :section, String, "Optional man page section, such as 1 or 5", optional: true
def call(page:, section: nil)
env = {"MANPAGER" => "cat", "PAGER" => "cat", "MANWIDTH" => "80"}
output, = Open3.capture2(env, "man", *[section, page].compact)
text = output.gsub(/\x08./, "").strip
{page:, section:, content: text[0, 12_000]}
end
end
Explanation
param :page and param :section
Let the model ask for a page directly, or narrow it to a section.
env = {"MANPAGER" => "cat", ...}
Forces plain text output instead of opening a pager.
*[section, page].compact
Skips section when it is not given.
text[0, 12_000]
Returns a bounded slice of the page.
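One detail worth unpacking is the gsub. Historically man renders bold text by overstriking: it emits each bold character twice with a backspace in between ("N\bN" for a bold N). Stripping every backspace-plus-character pair recovers plain text. A small self-contained demonstration:

```ruby
# man(1) emits bold as overstruck characters, e.g. "N\bN" for a bold N.
# This is the same cleanup the Man tool performs: remove each backspace
# (\x08) together with the character that follows it, then trim.
raw = "N\bNA\bAM\bME\bE\n\n       ls - list directory contents\n"
text = raw.gsub(/\x08./, "").strip
puts text
```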
Conclusion
With everything in place we can rejoice in our hard work and reap the benefits. About this example: these questions might seem ordinary, but the important piece to realize is that the answer is sourced from our local tools, and not from the model's training data.
The other important piece to recognize is that the agent in this case is a source of knowledge, but it could also be a source of action. Tools can do anything your imagination can come up with, and with llm.rb you can focus on building cool stuff and let the runtime handle the complexity of LLM interactions, tool execution, and agent state management.
Last but not least, I used Sequel in this post because I support ecosystem diversity, but llm.rb also has built-in support for ActiveRecord, and the equivalent can be achieved with almost the same code:
##
# Create our agent
agent = Agent.create(provider: "openai", model: "gpt-5.4-mini")
##
# First question - persists automatically
puts agent.talk("How do I extract a tar archive?").content
##
# Second question - persists automatically
puts agent.talk("What about gzipped tar archives?").content
##
# Third question - persists automatically
puts agent.talk("How do I block incoming traffic with pf.conf?").content