Changelog

Unreleased

Changes since v5.4.0.

v5.4.0

Changes since v5.3.0.

This release expands tracer support around agentic execution. It lets LLM::Agent define scoped tracers through the agent DSL and fixes concurrent tool execution so those scoped tracers stay attached when work crosses thread, task, fiber, and skill boundaries.

Change

  • Add agent-scoped tracers
    Let LLM::Agent classes define tracer ... or tracer { ... } so an agent can carry its own tracer without replacing the provider's default tracer. The resolved tracer is scoped to that agent's turns, tool loops, and pending tool access. The same tracer DSL is also available through acts_as_agent and the Sequel agent plugin.
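
A minimal sketch of the two tracer DSL forms above; SupportAgent, BillingAgent, and MyTracer are illustrative stand-ins rather than library classes.

    class SupportAgent < LLM::Agent
      tracer MyTracer.new        # direct form; scoped to this agent's turns and tool loops
    end

    class BillingAgent < LLM::Agent
      tracer { MyTracer.new }    # block form
    end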

Fix

  • Preserve scoped tracers across concurrent tool work
    Keep agent- and request-scoped tracers attached when tool execution crosses :thread, :task, or :fiber boundaries, including skill execution, so spawned work does not fall back to the provider default tracer.

v5.3.0

Changes since v5.2.1.

This release deepens llm.rb's request-rewriting and tool-definition surface. It adds transformer lifecycle hooks to LLM::Stream so UIs can surface work like PII scrubbing before a request is sent. It also adds a more explicit OmniAI-style tool DSL form, with parameter declarations plus a separate required list, while keeping the older param ... required: true style working.

Change

  • Add transformer stream lifecycle hooks
    Add on_transform and on_transform_finish to LLM::Stream so UIs can surface request rewriting work such as PII scrubbing before a request is sent to the model.

  • Add a separate required tool DSL form
    Add parameter as an alias of param and support required %i[...] as a separate declaration, inspired by OmniAI-style tools, while keeping the existing param ... required: true form working too.
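
A minimal sketch of the two tool DSL forms named in the second item above; the tool classes and parameter names are illustrative, and anything beyond the param / parameter / required keywords is an assumption.

    class Weather < LLM::Tool
      # Existing form: requiredness declared inline per parameter.
      param :city, required: true
    end

    class Forecast < LLM::Tool
      # OmniAI-style form: declare parameters, then list the required ones separately.
      parameter :city
      parameter :days
      required %i[city]
    end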

v5.2.1

Changes since v5.2.0.

This release tightens the streamed queue fix from v5.2.0 for concurrent workloads. Request-local streams now stay bound long enough for wait to drain queued work and then clear cleanly so later waits fall back to the context's configured stream.

Fix

  • Reset request-local streams after wait drains queued work
    Keep per-call stream: bindings alive through LLM::Context#wait so queued streamed tool work still resolves correctly, then clear the request-local stream after the wait completes to avoid leaking it into later turns.

v5.2.0

Changes since v5.1.0.

This release adds current DeepSeek V4 support through refreshed provider metadata, including deepseek-v4-flash and deepseek-v4-pro, while fixing request-local queue handling for concurrent streamed workloads so wait and interruption use the active per-call stream correctly.

Change

  • Add LLM::MCP#run for scoped MCP client lifecycle
    Add LLM::MCP#run so MCP clients can be started for the duration of a block and then stopped automatically, which simplifies the usual start/stop pattern in examples and application code (sketched below).

  • Refresh provider model metadata
    Add current DeepSeek and OpenAI model metadata to data/ and update the Google Gemma model entry to match the current provider naming.
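
A minimal sketch of the LLM::MCP#run lifecycle from the first item above; the stdio: value and what happens inside the block are assumptions.

    mcp = LLM.mcp(stdio: "my-mcp-server")   # transport setup; argument shape assumed
    mcp.run do
      # use the client while it runs, e.g. discover tools for a context or agent
    end
    # the client is stopped automatically once the block returns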

Fix

  • Reject unsupported DeepSeek multimodal prompt objects early
    Raise LLM::PromptError for image_url, local_file, and remote_file in DeepSeek chat requests instead of sending invalid OpenAI-compatible payloads that the provider rejects at runtime.

  • Preserve DeepSeek reasoning content across tool turns
    Replay reasoning_content when serializing prior assistant messages for DeepSeek chat completions, so thinking-mode tool calls can continue into follow-up requests without triggering invalid request errors.

  • Default DeepSeek to deepseek-v4-flash
    Change LLM::DeepSeek#default_model to deepseek-v4-flash so new contexts and default provider usage align with the current preferred chat model.

  • Use per-call streams when waiting on streamed tool work
    Track request-local streams bound through talk(..., stream:) and respond(..., stream:) so LLM::Context#wait and interruption-aware queue handling use the active stream instead of falling back to pending function spawning.
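
A minimal sketch of the per-call stream binding and wait behavior from the last item above; the context, stream object, and prompt are illustrative.

    ctx.talk("Summarize the report", stream: my_stream)  # request-local stream for this call
    ctx.wait   # drains queued streamed tool work through the active per-call stream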

v5.1.0

Changes since v5.0.0.

This release tightens streamed tool execution around the actual request-local runtime state. It fixes streamed resolution of per-request tools and makes that streamed path work cleanly with LLM.function(...), MCP tools, bound tool instances, and normal tool classes.

Fix

  • Resolve request-local tools during streaming
    Resolve streamed tool calls through LLM::Stream request-local tools before falling back to the global registry, so per-request tools and bound tool instances work correctly during streaming.

  • Support LLM.function(...) and MCP tools in streamed tool resolution
    Let streamed tool resolution use the current request tool set, so LLM.function(...), MCP tools, bound tool instances, and normal LLM::Tool classes all work through the same streamed tool path.
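
A minimal sketch of a mixed per-request tool set resolved through the streamed path described above; every name here is an illustrative stand-in, and whether tools: is passed per call or on the context is an assumption.

    # my_function built with LLM.function(...), mcp_tool discovered from an MCP client,
    # MyTool an ordinary LLM::Tool class, MyTool.new(...) a bound instance of it.
    tools = [my_function, mcp_tool, MyTool, MyTool.new(region: "eu")]
    ctx.talk("Check status", tools: tools, stream: my_stream)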

v5.0.0

Changes since v4.23.0.

This release expands llm.rb from an execution runtime into a more explicit supervision and transformation runtime. It adds context-level guards, transformers, and loop supervision through LLM::LoopGuard, while deepening long-lived context behavior through compaction, interruption hooks, and streamed ctx.spawn(...) tool execution.

Change

  • Make compactor thresholds explicit
    Require message_threshold: and token_threshold: to be opted into explicitly, so LLM::Compactor only compacts automatically when one of those thresholds is configured. Context-window-derived token limits can be computed by the caller when needed.

  • Allow assigning a compactor through LLM::Context
    Let LLM::Context accept ctx.compactor = ... in addition to the constructor compactor: option, so compactor config can be assigned or replaced after context initialization (sketched below).

  • Mark compaction summaries in message metadata
    Mark compaction summaries with extra[:compaction] and LLM::Message#compaction?, so applications can detect or hide synthetic summary messages in conversation history.

  • Add cooperative tool interruption hooks
    Let ctx.interrupt! notify queued tool work through on_interrupt, so running tools can clean up cooperatively when a context is cancelled.

  • Add LLM::Context guards
    Add a new guard capability to LLM::Context so execution can be supervised at the runtime level. The built-in LLM::LoopGuard detects repeated tool-call patterns and stops stuck agentic loops through in-band LLM::GuardError returns. LLM::Agent enables this guard by default.

  • Add LLM::Context transformers
    Add a new transformer capability to LLM::Context so prompts and params can be rewritten before provider requests are sent. This makes it possible to apply context-wide behaviors such as PII scrubbing or request-level param injection without rewriting every talk and respond call site.
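
A minimal sketch touching the compactor and interruption items above; the threshold value, the way the compactor is constructed, and the ctx.messages collection are assumptions.

    ctx.compactor = LLM::Compactor.new(message_threshold: 50)  # compaction only runs once a threshold is configured
    ctx.messages.reject(&:compaction?)                         # hide synthetic summary messages
    ctx.interrupt!                                             # queued tool work is notified through on_interrupt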

v4.23.0

Changes since v4.22.0.

This release expands llm.rb's runtime surface for long-lived contexts and stateful tools. It adds built-in context compaction through LLM::Compactor, lets explicit tools: arrays accept bound LLM::Tool instances, and fixes OpenAI-compatible no-arg tool schemas for stricter providers such as xAI.

Change

  • Add LLM::Compactor for long-lived contexts
    Add built-in context compaction through LLM::Compactor, so older history can be summarized, retained windows can stay bounded, compaction can run on its own model:, thresholds can be configured explicitly, and LLM::Stream can observe the lifecycle through on_compaction and on_compaction_finish.

  • Allow bound tool instances in explicit tool lists
    Let explicit tools: arrays accept LLM::Tool instances such as MyTool.new(foo: 1), so tools can carry bound state without changing the global tool registry model.
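
A minimal sketch of a bound tool instance in an explicit tools: array, as described in the last item above; MyTool and where the tools: array is passed are illustrative.

    tool = MyTool.new(foo: 1)                # instance state travels with the tool
    ctx.talk("Run the check", tools: [tool])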

Fix

  • Fix xAI/OpenAI-compatible no-arg tool schemas
    Send an empty object schema for tools without declared parameters instead of null, so stricter providers such as xAI accept mixed tool sets that include no-arg tools.

v4.22.0

Changes since v4.21.0.

This release deepens the runtime shape of llm.rb. It reduces helper-method surface on persisted ORM models, expands real ORM coverage, and makes skills behave more like bounded sub-agents with inherited recent context and proper instruction injection.

Change

  • Reduce ActiveRecord wrapper model surface
    Move helper methods such as option resolution, column mapping, serialization, and persistence into Utils for the ActiveRecord wrappers so wrapped models include fewer internal helper methods.

  • Reduce Sequel wrapper model surface
    Move helper methods such as option resolution, column mapping, serialization, and persistence into Utils for the Sequel wrappers so wrapped models include fewer internal helper methods.

  • Expand ORM integration coverage
    Add broader ActiveRecord and Sequel coverage for persisted context and agent wrappers, including real SQLite-backed records and cassette-backed OpenAI persistence paths.

  • Make skills inherit recent parent context
    Run LLM::Skill with a curated slice of recent parent user and assistant messages, prefixed with "Recent context:", so skills behave more like task-scoped sub-agents instead of instruction-only helpers.

Fix

  • Fix Sequel plugin :agent load order
    Require the shared Sequel plugin support from LLM::Sequel::Agent so plugin :agent can load independently without raising uninitialized constant LLM::Sequel::Plugin.

  • Make skill execution inherit parent context request settings
    Run LLM::Skill through a parent LLM::Context instead of a bare provider so nested skill agents inherit context-level settings such as mode: :responses, store: false, streaming, and other request defaults, while still keeping skill-local tools and avoiding parent schemas.

  • Keep agent instructions when history is preseeded
    Inject LLM::Agent instructions once unless a system message is already present, so agents and nested skills still get their instructions when they start with inherited non-system context.

v4.21.0

Changes since v4.20.2.

This release expands higher-level composition in llm.rb. It adds Sequel agent persistence through plugin :agent and introduces directory-backed skills that load from SKILL.md, resolve named tools, and plug directly into LLM::Context and LLM::Agent.

Change

  • Add plugin :agent for Sequel models
    Add Sequel support for plugin :agent, similar to ActiveRecord's acts_as_agent, so models can wrap LLM::Agent with built-in persistence.

  • Load directory-backed skills through LLM::Context and LLM::Agent
    Add skills: to LLM::Context and skills ... to LLM::Agent so directories with SKILL.md can be loaded, resolved into tools, and run through the normal llm.rb tool path.
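
A minimal sketch of directory-backed skills from the last item above; the agent class and directory path are illustrative, and the exact argument shape of the skills DSL is an assumption.

    class ResearchAgent < LLM::Agent
      skills "./skills/research"   # directory containing SKILL.md, resolved into tools
    end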

v4.20.2

Changes since v4.20.1.

This patch release improves runtime behavior around interruption and mixed concurrency waits. It also rounds out response API uniformity for Google completion responses.

Fix

  • Expose Google completion response IDs through .id
    Add LLM::Response#id support to Google completion responses so tracer and caller code can rely on the same API used by other providers.

  • Track interrupt ownership on the active request
    Bind LLM::Context interruption to the fiber running talk or respond so interrupt! works correctly when requests are started outside the context's initialization fiber.

Change

  • Allow mixed concurrency strategies in wait(...)
    Let LLM::Context#wait, LLM::Stream#wait, and LLM::Agent.concurrency accept arrays such as [:thread, :ractor] so mixed tool sets can wait on more than one concurrency strategy.
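
A minimal sketch of the mixed-strategy wait described above; the context and stream objects are illustrative.

    ctx.wait([:thread, :ractor])        # context-level wait across both strategies
    my_stream.wait([:thread, :ractor])  # LLM::Stream#wait accepts the same array form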

v4.20.1

Changes since v4.20.0.

This patch release fixes ORM option resolution in the Sequel and ActiveRecord wrappers. Symbol-based provider: and context: hooks now resolve correctly, and internal default option constants are referenced explicitly instead of relying on nested constant lookup.

Fix

  • Fix symbol-based ORM option hooks for provider and context hashes
    Make provider: and context: resolve symbol hooks through the model in the Sequel plugin and ActiveRecord wrappers instead of falling back to an empty hash.

  • Fix ORM wrapper constant lookup for option defaults
    Qualify internal EMPTY_HASH / DEFAULTS references in the Sequel plugin and ActiveRecord wrappers so option resolution does not depend on nested constant lookup quirks.

v4.20.0

Changes since v4.19.0.

This release adds better support for tagged prompt content. LLM::Context can now serialize and restore image_url, local_file, and remote_file content cleanly, and LLM::Message now exposes helpers for inspecting tagged image and file attachments.

Change

  • Round-trip tagged prompt objects through LLM::Context
    Teach LLM::Context serialization and restore to preserve image_url, local_file, and remote_file content across to_json / restore.

  • Add attachment helpers to LLM::Message
    Add image_url?, image_urls, file?, and files so callers can inspect messages for tagged image and file content more directly.
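
A minimal sketch of the attachment helpers from the last item above; how the message is obtained is an assumption.

    msg = ctx.messages.last
    msg.image_url?    # tagged image_url content present?
    msg.image_urls    # the tagged image URLs
    msg.file?         # local_file / remote_file content present?
    msg.files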

v4.19.0

Changes since v4.18.0.

This release tightens the ActiveRecord and ORM integration layer. It adds inline agent DSL blocks to acts_as_agent so agent defaults can be defined where the wrapper is declared, and it exposes the resolved provider through public llm methods on the ActiveRecord and Sequel wrappers.

Change

  • Make ORM provider access public through llm
    Expose the resolved provider on the Sequel plugin and the ActiveRecord acts_as_llm / acts_as_agent wrappers through a public llm method.

  • Allow inline agent DSL blocks in acts_as_agent
    Let ActiveRecord models configure model, tools, schema, instructions, and concurrency directly inside the acts_as_agent declaration block.
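
A minimal sketch of the inline acts_as_agent DSL block from the second item above, plus the public llm accessor from the first; the model class, column conventions, and the values passed to each DSL method are assumptions.

    class Assistant < ActiveRecord::Base
      acts_as_agent do
        model "gpt-4o-mini"
        instructions "You are a support assistant."
        concurrency :thread
      end
    end

    Assistant.new.llm   # the resolved provider, exposed publicly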

v4.18.0

Changes since v4.17.0.

This release improves tracing and tool execution behavior across llm.rb. It makes provider tracers default to the provider instance, adds LLM::Provider#with_tracer for scoped overrides, restores tool tracing for concurrent and streamed tool execution, extends streamed tracing to MCP tools, and adds symbol-based ORM option hooks alongside experimental ractor tool concurrency.

Change

  • Make provider tracers default to the provider instance
    Change llm.tracer = ... so it sets a provider default tracer instead of relying on scoped fiber-local state alone. This makes tracer configuration behave more predictably across normal tasks, threads, and fibers that share the same provider instance.

  • Add LLM::Provider#with_tracer for scoped overrides
    Add with_tracer as the opt-in escape hatch for request- or turn-scoped tracer overrides. Use it when you want temporary tracing on the current fiber without replacing the provider's default tracer.

  • Trace concurrent tool calls outside ractors
    Make tool tracing fire correctly when functions run through :thread, :task, or :fiber concurrency. Experimental :ractor execution still does not emit tool tracer events.

  • Trace streamed tool calls, including MCP tools
    Bind stream metadata through LLM::Stream#extra so streamed tool calls inherit tracer and model context before they are handed to on_tool_call. This restores tool tracing for streamed MCP and local tool execution.

  • Support symbol-based ORM option hooks
    Let provider:, context:, and tracer: on the Sequel plugin and the ActiveRecord acts_as_llm / acts_as_agent wrappers resolve through model method names as well as procs.

  • Add experimental ractor tool concurrency
    Add :ractor support to LLM::Function#spawn, LLM::Function::Array#wait, LLM::Stream#wait, and LLM::Agent.concurrency so class-based tools with ractor-safe arguments and return values can run in Ruby ractors and report their results back into the normal LLM tool-return path. MCP tools are not supported by the current :ractor mode, but mixed workloads can still branch on tool.mcp? and choose a supported strategy per tool. :ractor is especially useful for CPU-bound tools, while :task, :fiber, or :thread may be a better fit for I/O-bound work.
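
A minimal sketch of per-tool strategy selection for the experimental :ractor mode in the last item above; how pending functions are enumerated, the fn.tool accessor, and the spawn argument shape are assumptions.

    ctx.functions.each do |fn|
      strategy = fn.tool.mcp? ? :thread : :ractor   # MCP tools are not supported in :ractor mode
      fn.spawn(strategy)
    end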

v4.17.0

Changes since v4.16.1.

This release expands agent support across llm.rb. It brings LLM::Agent closer to LLM::Context, adds configurable automatic tool concurrency including experimental ractor support for class-based tools, extends persisted ORM wrappers with more of the context runtime surface and tracer hooks, and introduces built-in ActiveRecord agent persistence through acts_as_agent.

Change

  • Add configurable tool concurrency to LLM::Agent
    Add the class-level concurrency DSL to LLM::Agent so automatic tool loops can run with :call, :thread, :task, :fiber, or experimental :ractor support for class-based tools instead of always executing sequentially (sketched below).

  • Bring LLM::Agent closer to LLM::Context
    Expand LLM::Agent so it exposes more of the same runtime surface as LLM::Context, including returns, interruption, mode, cost, context window, structured serialization, and other context-backed helpers, while still auto-managing tool loops.

  • Refresh agent docs and coverage
    Update the README and deep dive to explain the current role of LLM::Agent, add examples that show automatic tool execution and concurrency, and add focused specs for the expanded agent surface and tool-loop behavior.

  • Add ORM tracer hooks for persisted contexts
    Add tracer: to both the Sequel plugin and acts_as_llm so models can resolve and assign tracers onto the provider used by their persisted LLM::Context.

  • Bring persisted ORM wrappers closer to LLM::Context
    Expand both the Sequel plugin and acts_as_llm so record-backed contexts expose more of the same runtime surface as LLM::Context, including mode, returns, interruption, prompt helpers, file helpers, and tracer access.

  • Add ActiveRecord agent persistence with acts_as_agent
    Add acts_as_agent for ActiveRecord models that should wrap LLM::Agent, reusing the same record-backed runtime shape as acts_as_llm while letting tool execution be managed by the agent.
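
A minimal sketch of the class-level concurrency DSL from the first item above; the agent class is illustrative.

    class PlannerAgent < LLM::Agent
      concurrency :task   # automatic tool loops spawn tools with the :task strategy instead of running them sequentially
    end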

v4.16.1

Changes since v4.16.0.

This release tightens ORM persistence by removing an unnecessary JSON round-trip when restoring structured :json and :jsonb context payloads.

Change

  • Restore structured ORM payloads directly
    Teach LLM::Context#restore to accept parsed data payloads and use that path from the ActiveRecord and Sequel persistence wrappers for format: :json and :jsonb, avoiding a redundant Hash -> JSON string -> Hash round-trip on restore.

v4.16.0

Changes since v4.15.0.

This release expands ORM support with built-in ActiveRecord persistence and improves compatibility with OpenAI-compatible gateways, proxies, and self-hosted servers that use non-standard API root paths.

Change

  • Support OpenAI-compatible base paths
    Add base_path: to provider configuration so OpenAI-compatible endpoints can vary both host and API prefix. This supports providers, proxies, and gateways that keep OpenAI request shapes but use non-standard URL layouts such as DeepInfra's /v1/openai/... prefix (sketched below).

  • Add ActiveRecord context persistence with acts_as_llm
    Add a built-in ActiveRecord wrapper that mirrors the Sequel plugin API so applications can persist LLM::Context state on records with default columns, provider/context hooks, validation-backed writes, and format: :string, :json, or :jsonb storage.
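
A minimal sketch of base_path: from the first item above; the constructor, key, and host values are assumptions, while base_path: and the /v1/openai prefix come from the entry itself.

    llm = LLM.openai(
      key: ENV["KEY"],
      host: "api.deepinfra.com",   # OpenAI-compatible gateway host
      base_path: "/v1/openai"      # non-standard API prefix
    )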

v4.15.0

Changes since v4.14.0.

Change

  • Reduce OpenAI stream parser merge overhead
    Special-case the most common single-field deltas, streamline incremental tool-call merging, and avoid repeated JSON parse attempts until streamed tool arguments look complete.

  • Cache streaming callback capabilities in parsers
    Cache callback support checks once at parser initialization time in the OpenAI, OpenAI Responses, Anthropic, Google, and Ollama stream parsers instead of repeating respond_to? checks on hot streaming paths.

  • Reduce OpenAI Responses parser lookup overhead
    Special-case the hot Responses API event paths and cache the current output item and content part so streamed output text deltas do less repeated nested lookup work.

  • Add a Sequel context persistence plugin
    Add plugin :llm for Sequel models so apps can persist LLM::Context state with default columns and pass provider setup through provider: when needed. The plugin now also supports format: :string, :json, or :jsonb for text and native JSON storage when Sequel JSON typecasting is enabled (sketched below).

  • Improve streaming parser performance
    In the local replay-based stream_parser benchmark versus v4.14.0 (median of 20 samples, 5000 iterations), plain Ruby is a small overall win: the generic eventstream path is about 0.4% faster, the OpenAI stream parser is about 0.5% faster, and the OpenAI Responses parser is about 1.6% faster, with unchanged allocations. Under YJIT on the same benchmark, the generic eventstream path is about 0.9% faster and the OpenAI stream parser is about 0.4% faster, while the OpenAI Responses parser is about 0.7% slower, also with unchanged allocations.

Compared to v4.13.0, the larger v4.14.0 streaming gains still hold. The generic eventstream path remains dramatically faster than v4.13.0, the OpenAI stream parser remains modestly faster, and the OpenAI Responses parser is roughly flat to slightly better depending on runtime. In other words, this release keeps the large eventstream win from v4.14.0, adds only small incremental changes beyond that, and does not turn the post-v4.14.0 parser work into another large benchmark jump.
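
A minimal sketch of the Sequel context persistence plugin from the fourth item above; the model class is illustrative, and where format: is passed is an assumption.

    class Conversation < Sequel::Model
      plugin :llm, format: :jsonb   # persist LLM::Context state in native JSON storage
    end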

v4.14.0

Changes since v4.13.0.

This release adds request interruption for contexts, reworks provider HTTP internals for lower-overhead streaming, and fixes MCP clients so parallel tool calls can safely share one connection.

Add

  • Add request interruption support
    Add LLM::Context#interrupt!, LLM::Context#cancel!, and LLM::Interrupt for interrupting in-flight provider requests, inspired by Go's context cancellation.

Change

  • Rework provider HTTP transport internals
    Rework provider HTTP around LLM::Provider::Transport::HTTP with explicit transient and persistent transport handling.

  • Reduce SSE parser overhead
    Dispatch raw parsed values to registered visitors instead of building an Event object for every streamed line.

  • Reduce provider streaming allocations
    Decode streamed provider payloads directly in LLM::Provider::Transport::HTTP before handing them to provider parsers, which cuts allocation churn and gives a smaller streaming speed bump.

  • Reduce generic SSE parser allocations
    Keep unread event-stream buffer data in place until compaction is worthwhile, which lowers allocation churn in the remaining generic SSE path.

  • Improve streaming parser performance
    In the local replay-based stream_parser benchmark versus v4.13.0 (median of 20 samples, 5000 iterations), under plain Ruby the generic eventstream path is about 53% faster with about 32% fewer allocations, the OpenAI stream parser is about 11% faster with about 4% fewer allocations, and the OpenAI Responses parser is about 3% faster with unchanged allocations. Under YJIT on the current parser benchmark harness, the current tree is about 26% faster than the non-YJIT run on the generic eventstream path, about 18% faster on the OpenAI stream parser, and about 16% faster on the OpenAI Responses parser, with allocations unchanged.

Fix

  • Support parallel MCP tool calls on one client
    Route MCP responses by JSON-RPC id so concurrent tool calls can share one client and transport without mismatching replies.

  • Use explicit MCP non-blocking read errors
    Use IO::EAGAINWaitReadable while continuing to retry on IO::WaitReadable.

v4.13.0

Changes since v4.12.0.

This release expands MCP prompt support, improves reasoning support in the OpenAI Responses API, and refreshes the docs around llm.rb's runtime model, contexts, and advanced workflows.

Add

  • Add LLM::MCP#prompts and LLM::MCP#find_prompt for MCP prompt support.

Change

  • Rework the README around llm.rb as a runtime for AI systems.
  • Add a dedicated deep dive guide for providers, contexts, persistence, tools, agents, MCP, tracing, multimodal prompts, and retrieval.

Fix

All of these fixes apply to MCP:

  • fix(mcp): raise LLM::MCP::MismatchError on mismatched response ids.
  • fix(mcp): normalize prompt message content while preserving the original payload.

All of these fixes apply to OpenAI's Responses API:

  • fix(openai): emit on_reasoning_content for streamed reasoning summaries.
  • fix(openai): skip previous_response_id on store: false follow-up calls.
  • fix(openai): fall back to an empty object schema for tools without params.
  • fix(openai): preserve original tool-call payloads on re-sent assistant tool messages.
  • fix(openai): emit output_text for assistant-authored response content.
  • fix(openai): return nil for system_fingerprint on normalized response objects.

v4.12.0

Changes since v4.11.1.

This release expands advanced streaming and MCP execution while reframing llm.rb more clearly as a system integration layer for LLMs, tools, MCP sources, and application APIs.

Add

  • Add persistent as an alias for persist! on providers and MCP transports.
  • Add LLM::Stream#on_tool_return for observing completed streamed tool work.
  • Add LLM::Function::Return#error?.

Change

  • Expect advanced streaming callbacks to use LLM::Stream subclasses instead of duck-typing them onto arbitrary objects. Basic #<< streaming remains supported.
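
A minimal sketch of an LLM::Stream subclass using callbacks from this range; the callback bodies and argument names are assumptions.

    class MyStream < LLM::Stream
      # Streamed assistant text.
      def on_content(chunk)
        print(chunk)
      end

      # Completed streamed tool work.
      def on_tool_return(ret)
        warn("tool failed") if ret.error?
      end
    end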

Fix

  • Fix Anthropic tools without params by always emitting input_schema.
  • Fix Anthropic tool-only responses to still produce an assistant message.
  • Fix Anthropic tool results to use the user role.
  • Fix Anthropic tool input normalization.

v4.11.1

Changes since v4.11.0.

Fix

  • Cast OpenTelemetry tool-related values to strings.
    Otherwise they're rejected by opentelemetry-sdk as invalid attributes.

v4.11.0

Changes since v4.10.0.

Add

  • Add LLM::Stream for richer streaming callbacks, including on_content, on_reasoning_content, and on_tool_call for concurrent tool execution.
  • Add LLM::Stream#wait as a shortcut for queue.wait.
  • Add LLM::Context#wait as a shortcut for the configured stream's wait.
  • Add LLM::Context#call(:functions) as a shortcut for functions.call.
  • Add LLM::Function.registry and enhanced support for MCP tools in LLM::Tool.registry for tool resolution during streaming.
  • Add normalized LLM::Response for OpenAI Responses, providing content, content!, messages / choices, usage, and reasoning_content.
  • Add mode: :responses to LLM::Context for routing talk through the Responses API.
  • Add LLM::Context#returns for collecting pending tool returns from the context.
  • Add persistent HTTP connection pooling for repeated MCP tool calls via LLM.mcp(http: ...).persist!.
  • Add explicit MCP transport constructors via LLM::MCP.stdio(...) and LLM::MCP.http(...).

Fix

  • Fix Google tool-call handling by synthesizing stable ids when Gemini does not provide a direct tool-call id.

v4.10.0

Changes since v4.9.0.

Add

  • Add HTTP transport for MCP with LLM::MCP::Transport::HTTP for remote servers
  • Add JSON Schema union types (any_of, all_of, one_of) with parser integration
  • Add JSON Schema type array union support (e.g., "type": ["object", "null"])
  • Add JSON Schema type inference from const, enum, or default fields

Change

  • Update LLM::MCP constructor for exclusive http: or stdio: transport
  • Update LLM::MCP documentation for HTTP transport support

v4.9.0

Changes since v4.8.0.

Add

  • Add fiber-based concurrency with LLM::Function::FiberGroup and LLM::Function::TaskGroup classes for lightweight async execution.
  • Add :thread, :task, and :fiber strategy parameter to LLM::Function#spawn for explicit concurrency control.
  • Add stdio MCP client support, including remote tool discovery and invocation through LLM.mcp, LLM::Context, and existing function/tool APIs.
  • Add model registry support via LLM::Registry, including model metadata lookup, pricing, modalities, limits, and cost estimation.
  • Add context access to a model context window via LLM::Context#context_window.
  • Add tracking of defined tools in the tool registry.
  • Add LLM::Schema::Enum, enabling Enum[...] as a schema/tool parameter type.
  • Add top-level Anthropic system instruction support using Anthropic's provider-specific request format.
  • Add richer tracing hooks and extra metadata support for LangSmith/OpenTelemetry-style traces.
  • Add rack/websocket and Relay-related example work, including MCP-focused examples.
  • Add concurrent tool execution with LLM::Function#spawn, LLM::Function::Array (call, wait, spawn), and LLM::Function::ThreadGroup.
  • Add LLM::Function::ThreadGroup#alive? method for non-blocking monitoring of concurrent tool execution.
  • Add LLM::Function::ThreadGroup#value alias for ThreadGroup#wait for consistency with Ruby's Thread#value.

Change

  • Rename LLM::Session to LLM::Context throughout the codebase to better reflect the concept of a stateful interaction environment.
  • Rename LLM::Gemini to LLM::Google to better reflect provider naming.
  • Standardize model objects across providers around a smaller common interface.
  • Switch registry cost internals from LLM::Estimate to LLM::Cost.
  • Update image generation defaults so OpenAI and xAI consistently return base64-encoded image data by default.
  • Update LLM::Bot deprecation warning from v5.0 to v6.0, giving users more time to migrate to LLM::Context.
  • Rework the README and screencast documentation to better cover MCP, registry, contexts, prompts, concurrency, providers, and example flow.
  • Expand the README with architecture, production, and provider guidance while improving readability and example ordering.

Fix

  • Fix local schema $ref resolution in LLM::Schema::Parser.
  • Fix multiple MCP issues around stdio env handling, request IDs, registry interaction, tool registration, and filtering of MCP tools from the standard tool registry.
  • Fix stream parsing issues, including chunk-splitting bugs and safer handling of streamed error responses.
  • Fix prompt handling across contexts, agents, and provider adapters so prompt turns remain consistent in history and completions.
  • Fix several tool/context issues, including function return wrapping, tool lookup after deserialization, unnamed subclass filtering, and thread-safety around tool registry mutations.
  • Fix Google tool-call handling to preserve thoughtSignature.
  • Fix LLM::Tracer::Logger argument handling.
  • Fix packaging/docs issues such as registry files in the gemspec and stale provider docs.
  • Fix Google provider handling of nil function IDs during context deserialization.
  • Fix MCP stdio transport by increasing poll timeout for better reliability.
  • Fix Google provider to properly cast non-Hash tool results into Hash format for API compatibility.
  • Fix schema parser to support recursive normalization of Array, LLM::Object, and nested structures.
  • Fix DeepSeek provider to tolerate malformed tool arguments.
  • Fix LLM::Function::TaskGroup#alive? to properly delegate to Async::Task#alive?.
  • Fix various RuboCop errors across the codebase.
  • Fix DeepSeek provider to handle JSON that might be valid but unexpected.

Notes

Notable merged work in this range includes:

  • feat(function): add fiber-based concurrency for async environments (#64)
  • feat(mcp): add stdio MCP support (#134)
  • Add LLM::Registry + cost support (#133)
  • Consistent model objects across providers (#131)
  • Add rack + websocket example (#130)
  • feat(gemspec): add changelog URI (#136)
  • feat(function): alias ThreadGroup#wait as ThreadGroup#value (#62)
  • README and screencast refresh across #66, #67, #68, #71, and #72
  • chore(bot): update deprecation warning from v5.0 to v6.0
  • fix(deepseek): tolerate malformed tool arguments
  • refactor(context): Rename Session as Context (#70)

Comparison base:

  • Latest tag: v4.8.0 (6468f2426ee125823b7ae43b4af507b125f96ffc)
  • HEAD used for this changelog: 915c48da6fda9bef1554ff613947a6ce26d382e3