About
llm.rb is a zero-dependency Ruby toolkit for interacting with Large Language Models that tries to be as robust and useful as libraries with feature-rich dependencies. Sometimes that can be challenging, and sometimes it can require compromise.
This post looks at one of those scenarios where we provide a powerful performance optimization as an opt-in feature that requires the user to install the net-http-persistent gem separately.
Background
By default, every request made by llm.rb sets up and tears down its own socket. No attempt is made to reuse the socket or keep it alive. This is often not a problem, but when it becomes a bottleneck llm.rb can use a persistent connection pool via net-http-persistent.
The net-http-persistent gem implements a connection pool: it maintains active connections that can be reused across multiple requests and across multiple threads, and it is fully thread-safe.
The gem should be installed manually:
gem install net-http-persistent
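To make the cost of the default behavior concrete without touching a real API, here is a minimal sketch that uses only Ruby's standard library. It is not how llm.rb or net-http-persistent is implemented: serve and count_connections are hypothetical helpers written for this post. The sketch starts a tiny local server, sends the same number of requests with and without socket reuse, and reports how many TCP connections the server accepted.

```ruby
require "net/http"
require "socket"

# Hypothetical helper: answer every request that arrives on one socket until
# the client disconnects, which is what allows a client to keep it alive.
def serve(socket)
  while socket.gets # request line; nil once the client closes the socket
    while (header = socket.gets) && header != "\r\n"; end # skip the headers
    socket.write "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
  end
rescue IOError, SystemCallError
  # The client went away mid-request; the ensure clause closes the socket
ensure
  socket.close
end

# Hypothetical helper: send `requests` GET requests to a throwaway local
# server and return the number of TCP connections the server accepted.
def count_connections(requests:, reuse:)
  server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
  accepted = 0
  acceptor = Thread.new do
    loop do
      client = server.accept
      accepted += 1
      Thread.new(client) { |socket| serve(socket) }
    end
  end
  port = server.addr[1]
  if reuse
    # One started session: every request travels over the same socket
    http = Net::HTTP.new("127.0.0.1", port, nil) # nil: ignore proxy settings
    http.start { |conn| requests.times { conn.get("/") } }
  else
    # A new session per request: a socket is set up and torn down each time
    requests.times do
      Net::HTTP.new("127.0.0.1", port, nil).start { |conn| conn.get("/") }
    end
  end
  accepted
ensure
  acceptor&.kill
  server&.close
end

puts count_connections(requests: 3, reuse: false) # one connection per request
puts count_connections(requests: 3, reuse: true)  # a single shared connection
```

Against a remote HTTPS API each extra connection also pays for a DNS lookup and a TLS handshake, which this local sketch does not measure.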
Reuse
The next step configures an instance of LLM::OpenAI to use a connection pool, so that each request in the example reuses the socket that was created for the first request. This approach can significantly improve performance because the cost of setting up a connection (DNS lookup, TCP and SSL/TLS handshakes) is paid once, for a single socket, instead of on every request.
The optimization applies to all requests for a single provider. That means a socket can be reused across API endpoints, and even across multiple instances of LLM::OpenAI that exist in different threads (e.g. in a Sidekiq environment). The optimization is applied automatically as long as an object opts in via the persistent option:
#!/usr/bin/env ruby
require "llm"
# persistent: true opts in to the connection pool
llm = LLM.openai(key: ENV["OPENAI_SECRET"], persistent: true)
# Each request below reuses the socket that the first request opened
res1 = llm.responses.create "message 1"
res2 = llm.responses.create "message 2", previous_response_id: res1.response_id
res3 = llm.responses.create "message 3", previous_response_id: res2.response_id
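The threaded case mentioned above could look something like the following sketch. fan_out is a hypothetical helper, not part of llm.rb; it only uses the LLM.openai and responses.create calls shown in this post. Each thread builds its own provider instance with persistent: true, and per the description above those instances should draw from the same pool, although the sketch itself does not verify that.

```ruby
# Hypothetical helper: run each prompt in its own thread, using a provider
# instance built by the given block.
def fan_out(prompts, &build_llm)
  threads = prompts.map do |prompt|
    Thread.new { build_llm.call.responses.create(prompt) }
  end
  threads.map(&:value) # waits for every thread and collects its response
end

# Only runs when a key is present, since it talks to the real API
if ENV["OPENAI_SECRET"]
  require "llm"
  responses = fan_out(["message 1", "message 2", "message 3"]) do
    LLM.openai(key: ENV["OPENAI_SECRET"], persistent: true)
  end
  puts responses.size
end
```

Passing the provider in as a block keeps the helper independent of any one provider, and makes it easy to exercise with a stand-in object during tests.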
Conclusion
Using persistent HTTP connections in llm.rb is a simple way to boost performance for applications that make repeated requests. By opting into this feature we can reduce connection overhead and improve throughput, especially in threaded or high-volume environments.