Class: LLM::Gemini::Audio

Inherits:
Object
  • Object
Defined in:
lib/llm/providers/gemini/audio.rb

Overview

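The LLM::Gemini::Audio class provides an audio object for interacting with Gemini’s audio capabilities. It supports transcription and translation into English; speech generation is not implemented.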

Examples:

#!/usr/bin/env ruby
require "llm"

llm = LLM.gemini(ENV["KEY"])
res = llm.audio.create_transcription(file: LLM::File("/rocket.mp3"))
res.text # => "A dog on a rocket to the moon"

Instance Method Summary

Constructor Details

#initialize(provider) ⇒ LLM::Gemini::Audio

Returns a new Audio object

Parameters:

  • provider

    The provider object that requests are sent through



# File 'lib/llm/providers/gemini/audio.rb', line 19

def initialize(provider)
  @provider = provider
end

Instance Method Details

#create_speech ⇒ Object

Raises:

  • (NotImplementedError)

    This method is not implemented by Gemini



# File 'lib/llm/providers/gemini/audio.rb', line 26

def create_speech
  raise NotImplementedError
end
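
Because create_speech always raises, code that may run against several providers might want to guard the call. The sketch below is only an illustration: FakeGeminiAudio and speech_or_nil are hypothetical names, not part of the library, and the stand-in class mirrors nothing but the create_speech listing above. One Ruby detail worth noting is that NotImplementedError descends from ScriptError rather than StandardError, so a bare `rescue` would not catch it.

```ruby
# Hypothetical stand-in for LLM::Gemini::Audio; it only mirrors create_speech.
class FakeGeminiAudio
  def create_speech
    raise NotImplementedError
  end
end

# Returns the speech result, or nil when the provider does not support speech.
# NotImplementedError is a ScriptError, so it has to be named explicitly:
# a bare `rescue` only catches StandardError and its subclasses.
def speech_or_nil(audio)
  audio.create_speech
rescue NotImplementedError
  nil
end

result = speech_or_nil(FakeGeminiAudio.new)
puts result.inspect
```

Returning nil is just one possible policy; re-raising with a clearer message or falling back to another provider would work equally well.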

#create_transcription(file:, model: "gemini-1.5-flash", **params) ⇒ LLM::Response::AudioTranscription

Create an audio transcription

Examples:

llm = LLM.gemini(ENV["KEY"])
res = llm.audio.create_transcription(file: LLM::File("/rocket.mp3"))
res.text # => "A dog on a rocket to the moon"

Parameters:

  • file (LLM::File, LLM::Response::File)

    The input audio

  • model (String) (defaults to: "gemini-1.5-flash")

    The model to use

  • params (Hash)

    Other parameters (see Gemini docs)

Returns:

  • (LLM::Response::AudioTranscription)




# File 'lib/llm/providers/gemini/audio.rb', line 42

def create_transcription(file:, model: "gemini-1.5-flash", **params)
  res = @provider.complete [
    "Your task is to transcribe the contents of an audio file",
    "Your response should include the transcription, and nothing else",
    file
  ], :user, model:, **params
  LLM::Response::AudioTranscription
    .new(res)
    .tap { _1.text = res.choices[0].content }
end
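
The listing above shows that a transcription is an ordinary completion request: two instruction strings and the file are sent together as one user message, and the content of the first choice becomes the transcription text. The following is a minimal sketch of that flow which runs without network access. Every class in it (FakeProvider, FakeChoice, FakeCompletion, FakeTranscription) and the transcribe helper are hypothetical stand-ins, not library names.

```ruby
# Hypothetical stand-ins; none of these names exist in the llm library.
FakeChoice = Struct.new(:content)
FakeCompletion = Struct.new(:choices)
FakeTranscription = Struct.new(:raw, :text)

class FakeProvider
  attr_reader :last_prompt, :last_role

  # Mimics the call shape used in the listing: complete(prompt, role, **options).
  # It records what it was given and returns a canned completion.
  def complete(prompt, role, **_options)
    @last_prompt = prompt
    @last_role = role
    FakeCompletion.new([FakeChoice.new("A dog on a rocket to the moon")])
  end
end

# Follows the same steps as create_transcription, against the fake provider.
def transcribe(provider, file:, model: "gemini-1.5-flash", **params)
  res = provider.complete([
    "Your task is to transcribe the contents of an audio file",
    "Your response should include the transcription, and nothing else",
    file
  ], :user, model: model, **params)
  # Wrap the raw completion, then copy the first choice into #text.
  FakeTranscription.new(res).tap { |t| t.text = res.choices[0].content }
end

provider = FakeProvider.new
result = transcribe(provider, file: "/rocket.mp3")
puts result.text
puts provider.last_prompt.length
```

Since the instructions ask the model to return the transcription and nothing else, the first choice is used verbatim as the text; no further parsing happens in this flow.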

#create_translation(file:, model: "gemini-1.5-flash", **params) ⇒ LLM::Response::AudioTranslation

Create an audio translation (into English)

Examples:

# Arabic => English
llm = LLM.gemini(ENV["KEY"])
res = llm.audio.create_translation(file: LLM::File("/bismillah.mp3"))
res.text # => "In the name of Allah, the Beneficent, the Merciful."

Parameters:

  • file (LLM::File, LLM::Response::File)

    The input audio

  • model (String) (defaults to: "gemini-1.5-flash")

    The model to use

  • params (Hash)

    Other parameters (see Gemini docs)

Returns:

  • (LLM::Response::AudioTranslation)




# File 'lib/llm/providers/gemini/audio.rb', line 66

def create_translation(file:, model: "gemini-1.5-flash", **params)
  res = @provider.complete [
    "Your task is to translate the contents of an audio file into English",
    "Your response should include the translation, and nothing else",
    file
  ], :user, model:, **params
  LLM::Response::AudioTranslation
    .new(res)
    .tap { _1.text = res.choices[0].content }
end
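
The signature above takes `model:` with a default and collects any other keywords in `**params`, and the listing passes both straight to the provider's complete call. The sketch below illustrates that keyword forwarding with a recording stub. RecordingProvider, translate, the model name "another-model" and the option `example_option` are all placeholders invented for illustration; they are not real library names or Gemini parameters.

```ruby
# Hypothetical recorder; stands in for the provider the Audio object wraps.
class RecordingProvider
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Records each call and returns an object shaped like res.choices[0].content.
  def complete(prompt, role, **options)
    @calls << { prompt: prompt, role: role, options: options }
    Struct.new(:choices).new([Struct.new(:content).new("stubbed translation")])
  end
end

# Mirrors the keyword handling of create_translation in the listing above.
def translate(provider, file:, model: "gemini-1.5-flash", **params)
  res = provider.complete([
    "Your task is to translate the contents of an audio file into English",
    "Your response should include the translation, and nothing else",
    file
  ], :user, model: model, **params)
  res.choices[0].content
end

provider = RecordingProvider.new
translate(provider, file: "/bismillah.mp3")
translate(provider, file: "/bismillah.mp3", model: "another-model", example_option: 1)

# Print the options each call forwarded to the provider.
provider.calls.each { |call| puts call[:options].inspect }
```

The first call forwards only the default model, while the second forwards the overridden model together with the extra keyword, which is how provider-specific options would reach the underlying request.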