Performance gains with persistent connections

About

llm.rb is a zero-dependency Ruby toolkit for interacting with Large Language Models that tries to be as robust and useful as libraries with feature-rich dependencies. Sometimes that can be challenging, and sometimes it can require compromise.

This post looks at one of those scenarios where we provide a powerful performance optimization as an opt-in feature that requires the user to install the net-http-persistent gem separately.

Background

By default, every request made by llm.rb sets up and tears down its own socket. No attempt is made to reuse the socket or to keep it alive. This is not always a problem, but in case it becomes a bottleneck, llm.rb provides a persistent connection pool via net-http-persistent.
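To see the difference between the two behaviours without calling a real API, here is a small sketch that uses only Ruby's standard library. It does not use llm.rb or net-http-persistent. CountingServer, count_per_request and count_kept_alive are hypothetical names made up for this illustration, and the local server is a stand-in for a provider endpoint:

```ruby
require "net/http"
require "socket"

# Hypothetical local HTTP server that counts the TCP connections it accepts.
class CountingServer
  attr_reader :port, :connections

  def initialize
    @server = TCPServer.new("127.0.0.1", 0) # port 0: the OS picks a free port
    @port = @server.addr[1]
    @connections = 0
    @acceptor = Thread.new { accept_loop }
  end

  def stop
    @acceptor.kill
    @server.close
  end

  private

  def accept_loop
    loop do
      client = @server.accept
      @connections += 1 # only the acceptor thread writes this counter
      Thread.new(client) { |socket| serve(socket) }
    end
  end

  # Answer every request on the socket until the client closes it.
  def serve(socket)
    while socket.gets # request line
      loop do # skip the request headers up to the blank line
        header = socket.gets
        break if header.nil? || header == "\r\n"
      end
      socket.write "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n" \
                   "Connection: keep-alive\r\n\r\nok"
    end
  rescue IOError, SystemCallError
    # The client went away; nothing left to do.
  ensure
    socket.close
  end
end

# Runs the block against a fresh server and returns the connection count.
def with_server
  server = CountingServer.new
  yield server.port
  server.connections
ensure
  server&.stop
end

# One new socket per request: roughly what happens without a pool.
def count_per_request(n)
  with_server do |port|
    n.times { Net::HTTP.get_response(URI("http://127.0.0.1:#{port}/")) }
  end
end

# One socket kept alive and reused for every request.
def count_kept_alive(n)
  with_server do |port|
    Net::HTTP.start("127.0.0.1", port) { |http| n.times { http.get("/") } }
  end
end

puts "new socket per request: #{count_per_request(3)} connection(s)"
puts "one kept-alive socket:  #{count_kept_alive(3)} connection(s)"
```

Against a real provider, each of those extra connections would also pay for a DNS lookup and a TLS handshake, which is where the savings come from.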

The net-http-persistent gem implements a connection pool that keeps connections open so they can be reused across multiple requests and across multiple threads. The good news is that it is fully thread-safe.
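To make the idea of a thread-safe pool a little more concrete, here is a deliberately simplified sketch in plain Ruby. It is not how net-http-persistent is implemented. TinyPool, FakeConnection and demo_pool are hypothetical names, and the "connection" is a plain object rather than a socket:

```ruby
# A deliberately small thread-safe pool (hypothetical, not the gem's code).
class TinyPool
  def initialize(size, &factory)
    @size = size
    @factory = factory
    @idle = Queue.new # thread-safe queue of connections ready for reuse
    @created = 0
    @lock = Mutex.new
  end

  # Check out a connection, yield it, and always hand it back afterwards.
  def with
    conn = checkout
    yield conn
  ensure
    @idle << conn if conn
  end

  private

  def checkout
    @lock.synchronize do
      # Create a new connection only while the pool is below its limit.
      if @idle.empty? && @created < @size
        @created += 1
        return @factory.call
      end
    end
    @idle.pop # blocks until another thread returns a connection
  end
end

# Stand-in for a socket: it only counts how often it was used.
FakeConnection = Struct.new(:uses)

# Runs many "requests" from several threads through a small pool and
# returns every connection the pool had to create.
def demo_pool(threads: 8, requests: 5, size: 2)
  made = []
  pool = TinyPool.new(size) { FakeConnection.new(0).tap { |c| made << c } }
  workers = Array.new(threads) do
    Thread.new { requests.times { pool.with { |conn| conn.uses += 1 } } }
  end
  workers.each(&:join)
  made
end

connections = demo_pool
puts "requests served:     #{connections.sum(&:uses)}"
puts "connections created: #{connections.size}"
```

Forty requests from eight threads are served by at most two connections. A connection is only ever held by one thread at a time, which is why the counter can be updated without extra locking.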

The gem should be installed manually:

gem install net-http-persistent

Reuse

The next step configures an instance of LLM::OpenAI to use a connection pool, so that each request in the example reuses the socket that was created for the first request. This approach can significantly improve performance because the cost of setting up and tearing down a connection (DNS lookup, TCP and TLS handshakes) is paid once rather than once per request.

The optimization applies to all requests for a single provider. That means a socket can be reused across API endpoints, and even across multiple instances of LLM::OpenAI that exist in different threads (e.g. in a Sidekiq environment). The optimization is applied automatically as long as an object opts in via the persistent option:

#!/usr/bin/env ruby
require "llm"

# persistent: true opts this provider into the connection pool
llm = LLM.openai(key: ENV["OPENAI_SECRET"], persistent: true)
# Each request below reuses the socket opened for the first one
res1 = llm.responses.create "message 1"
res2 = llm.responses.create "message 2", previous_response_id: res1.response_id
res3 = llm.responses.create "message 3", previous_response_id: res2.response_id

Conclusion

Using persistent HTTP connections in llm.rb is a simple way to boost performance for applications that make repeated requests. Opting into this feature reduces connection overhead and improves throughput, especially in threaded or high-volume environments.