The Gemini Live API enables low-latency, bidirectional text and voice interactions with Gemini. Using the Live API, you can offer end users the experience of natural, human-like voice conversations, with the ability to interrupt the model's responses using text or voice commands. The model can process text and audio input (video is coming soon), and it can provide text and audio output.
You can prototype with prompts and the Live API in Vertex AI Studio.
The Live API is a stateful API that creates a WebSocket connection to establish a session between the client and the Gemini server. For details, see the Live API reference documentation.
Before you begin
This capability is available only when using the Vertex AI Gemini API as your API provider.
If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for the Vertex AI Gemini API, and create a LiveModel instance.
Models that support this capability
The Live API is supported only by gemini-2.0-flash-live-preview-04-09 (not gemini-2.0-flash).
Use the standard features of the Live API
This section describes how to use the standard features of the Live API, specifically streaming various types of input and output:
Generate streamed text from streamed text input
Before trying this sample, make sure that you've completed the Before you begin section of this guide to set up your project and app. In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.
You can send streamed text input and receive streamed text output. Make sure to create a liveModel instance and set the response modality to Text.
Swift
The Live API isn't supported yet for Apple platform apps, but check back soon!
Kotlin
// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
val model = Firebase.vertexAI.liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.TEXT
    }
)

val session = model.connect()

// Provide a text prompt
val text = "tell a short story"
session.send(text)

var outputText = ""
session.receive().collect {
    if (it.status == Status.TURN_COMPLETE) {
        // Optional: stop receiving if you don't need to send more requests.
        session.stopReceiving()
    }
    outputText = outputText + it.text
}

// Output received from the server.
println(outputText)
Java
ExecutorService executor = Executors.newFixedThreadPool(1);

// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.vertexAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with text
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.TEXT)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);

ListenableFuture<LiveSession> sessionFuture = model.connect();

class LiveContentResponseSubscriber implements Subscriber<LiveContentResponse> {
    @Override
    public void onSubscribe(Subscription s) {
        s.request(Long.MAX_VALUE); // Request an unlimited number of items
    }

    @Override
    public void onNext(LiveContentResponse liveContentResponse) {
        // Handle the response from the server.
        System.out.println(liveContentResponse.getText());
    }

    @Override
    public void onError(Throwable t) {
        System.err.println("Error: " + t.getMessage());
    }

    @Override
    public void onComplete() {
        System.out.println("Done receiving messages!");
    }
}

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);

        // Provide a text prompt
        String text = "tell me a short story?";
        session.send(text);

        Publisher<LiveContentResponse> publisher = session.receive();
        publisher.subscribe(new LiveContentResponseSubscriber());
    }

    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);
Web
The Live API isn't supported yet for web apps, but check back soon!
Dart
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

late LiveModelSession _session;

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
final model = FirebaseAI.vertexAI().liveModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with text
  config: LiveGenerationConfig(responseModalities: [ResponseModality.text]),
);

_session = await model.connect();

// Provide a text prompt
final prompt = Content.text('tell a short story');
await _session.send(input: prompt, turnComplete: true);

// In a separate thread, receive the response
await for (final message in _session.receive()) {
  // Process the received message
}
Unity
using Firebase;
using Firebase.AI;

async Task SendTextReceiveText() {
  // Initialize the Vertex AI Gemini API backend service
  // Create a `LiveModel` instance with the model that supports the Live API
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.VertexAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Text })
  );

  LiveSession session = await model.ConnectAsync();

  // Provide a text prompt
  var prompt = ModelContent.Text("tell a short story");
  await session.SendAsync(content: prompt, turnComplete: true);

  // Receive the response
  await foreach (var message in session.ReceiveAsync()) {
    // Process the received message
    if (!string.IsNullOrEmpty(message.Text)) {
      UnityEngine.Debug.Log("Received message: " + message.Text);
    }
  }
}
Learn how to choose a model appropriate for your use case and app.
Generate streamed audio from streamed audio input
Before trying this sample, make sure that you've completed the Before you begin section of this guide to set up your project and app. In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.
You can send streamed audio input and receive streamed audio output. Make sure to create a LiveModel instance and set the response modality to Audio.
Learn how to configure and customize the response voice (later on this page).
Swift
The Live API isn't supported yet for Apple platform apps, but check back soon!
Kotlin
// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
val model = Firebase.vertexAI.liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    }
)

val session = model.connect()

// This is the recommended way to start an audio conversation.
// However, you can create your own recorder and handle the stream yourself.
session.startAudioConversation()
Java
ExecutorService executor = Executors.newFixedThreadPool(1);

// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.vertexAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with audio
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.AUDIO)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);

ListenableFuture<LiveSession> sessionFuture = model.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);
        session.startAudioConversation();
    }

    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);
Web
The Live API isn't supported yet for web apps, but check back soon!
Dart
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'package:your_audio_recorder_package/your_audio_recorder_package.dart';

late LiveModelSession _session;
final _audioRecorder = YourAudioRecorder();

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
final model = FirebaseAI.vertexAI().liveModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with audio
  config: LiveGenerationConfig(responseModalities: [ResponseModality.audio]),
);

_session = await model.connect();

final audioRecordStream = _audioRecorder.startRecordingStream();
// Map the Uint8List stream to InlineDataPart stream
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});
await _session.startMediaStream(mediaChunkStream);

// In a separate thread, receive the audio response from the model
await for (final message in _session.receive()) {
  // Process the received message
}
Unity
using Firebase;
using Firebase.AI;

async Task SendTextReceiveAudio() {
  // Initialize the Vertex AI Gemini API backend service
  // Create a `LiveModel` instance with the model that supports the Live API
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.VertexAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio })
  );

  LiveSession session = await model.ConnectAsync();

  // Start a coroutine to send audio from the Microphone
  var recordingCoroutine = StartCoroutine(SendAudio(session));

  // Start receiving the response
  await ReceiveAudio(session);
}

IEnumerator SendAudio(LiveSession liveSession) {
  string microphoneDeviceName = null;
  int recordingFrequency = 16000;
  int recordingBufferSeconds = 2;

  var recordingClip = Microphone.Start(microphoneDeviceName, true,
                                       recordingBufferSeconds, recordingFrequency);

  int lastSamplePosition = 0;
  while (true) {
    if (!Microphone.IsRecording(microphoneDeviceName)) {
      yield break;
    }

    int currentSamplePosition = Microphone.GetPosition(microphoneDeviceName);

    if (currentSamplePosition != lastSamplePosition) {
      // The Microphone uses a circular buffer, so we need to check if the
      // current position wrapped around to the beginning, and handle it
      // accordingly.
      int sampleCount;
      if (currentSamplePosition > lastSamplePosition) {
        sampleCount = currentSamplePosition - lastSamplePosition;
      } else {
        sampleCount = recordingClip.samples - lastSamplePosition + currentSamplePosition;
      }

      if (sampleCount > 0) {
        // Get the audio chunk
        float[] samples = new float[sampleCount];
        recordingClip.GetData(samples, lastSamplePosition);

        // Send the data, discarding the resulting Task to avoid the warning
        _ = liveSession.SendAudioAsync(samples);

        lastSamplePosition = currentSamplePosition;
      }
    }

    // Wait for a short delay before reading the next sample from the Microphone
    const float MicrophoneReadDelay = 0.5f;
    yield return new WaitForSeconds(MicrophoneReadDelay);
  }
}
Queue<float> audioBuffer = new();

async Task ReceiveAudio(LiveSession liveSession) {
  int sampleRate = 24000;
  int channelCount = 1;

  // Create a looping AudioClip to fill with the received audio data
  int bufferSamples = (int)(sampleRate * channelCount);
  AudioClip clip = AudioClip.Create("StreamingPCM", bufferSamples, channelCount,
                                    sampleRate, true, OnAudioRead);

  // Attach the clip to an AudioSource and start playing it
  AudioSource audioSource = GetComponent<AudioSource>();
  audioSource.clip = clip;
  audioSource.loop = true;
  audioSource.Play();

  // Start receiving the response
  await foreach (var message in liveSession.ReceiveAsync()) {
    // Process the received message
    foreach (float[] pcmData in message.AudioAsFloat) {
      lock (audioBuffer) {
        foreach (float sample in pcmData) {
          audioBuffer.Enqueue(sample);
        }
      }
    }
  }
}

// This method is called by the AudioClip to load audio data.
private void OnAudioRead(float[] data) {
  int samplesToProvide = data.Length;
  int samplesProvided = 0;

  lock (audioBuffer) {
    while (samplesProvided < samplesToProvide && audioBuffer.Count > 0) {
      data[samplesProvided] = audioBuffer.Dequeue();
      samplesProvided++;
    }
  }

  while (samplesProvided < samplesToProvide) {
    data[samplesProvided] = 0.0f;
    samplesProvided++;
  }
}
Learn how to choose a model appropriate for your use case and app.
Create more engaging and interactive experiences
This section describes how to create and manage features of the Live API that make interactions more engaging and interactive.
Change the response voice
The Live API uses Chirp 3 to support synthesized speech responses. When using Firebase AI Logic, audio responses are available in 5 HD voices and 31 languages.
If you don't specify a voice, the default is Puck. Alternatively, you can configure the model to respond in any of the following voices:
- Aoede (female)
- Charon (male)
- Fenrir (male)
- Kore (female)
- Puck (male)
For demos of what these voices sound like and for the full list of available languages, see Chirp 3: HD voices.
To specify a voice, set the voice name within the speechConfig object as part of the model configuration:
Swift
The Live API isn't supported yet for Apple platform apps, but check back soon!
Kotlin
// ...

val model = Firebase.vertexAI.liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to use a specific voice for its audio response
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voices.FENRIR)
    }
)

// ...
Java
// ...

LiveModel model = Firebase.getVertexAI().liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to use a specific voice for its audio response
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.AUDIO)
                .setSpeechConfig(new SpeechConfig(Voices.FENRIR))
                .build()
);

// ...
Web
The Live API isn't supported yet for web apps, but check back soon!
Dart
// ...

final model = FirebaseAI.vertexAI().liveModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to use a specific voice for its audio response
  config: LiveGenerationConfig(
    responseModalities: [ResponseModality.audio],
    speechConfig: SpeechConfig(voice: Voice.fenrir),
  ),
);

// ...
Unity
Snippets coming soon!
To prompt the model and require it to respond in a language other than English, include the following in your system instructions:
RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.
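For example, the language requirement can be passed as a system instruction when the model instance is created. The sketch below is only an illustration (using Spanish as the target language) and assumes that liveModel accepts a systemInstruction parameter built with the content helper, as the standard generativeModel builder does; confirm the exact parameter name against the SDK reference.
Kotlin
// Sketch only: pass the language requirement as a system instruction.
// Assumes `liveModel` accepts a `systemInstruction` parameter like `generativeModel` does.
val model = Firebase.vertexAI.liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    },
    systemInstruction = content {
        text("RESPOND IN SPANISH. YOU MUST RESPOND UNMISTAKABLY IN SPANISH.")
    }
)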
Maintain context across sessions and requests
You can use a chat structure to maintain context across sessions and requests. Note that this works only for text input and text output.
This approach is best for short contexts; you can send turn-by-turn interactions to represent the exact sequence of events. For longer contexts, we recommend providing a single summary message to free up the context window for subsequent interactions, as sketched below.
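As a rough sketch of the summary approach, a new session can be primed with one condensed recap of the earlier conversation before the next user turn is sent. This reuses only the connect and send calls shown earlier in this guide; how you build and store the summary string is up to your app and is not an SDK feature.
Kotlin
// Illustrative sketch: carry context into a new session with a single summary message.
// `summaryOfPreviousSession` is a string your app builds and stores; it is not provided by the SDK.
val summaryOfPreviousSession =
    "The user asked for a bedtime story about a lighthouse keeper; the story stopped at the storm."

val session = model.connect()

// One compact recap frees up the context window for the rest of the interaction.
session.send(
    "Context from our previous conversation: $summaryOfPreviousSession\n" +
    "Continue the story from where we left off."
)
// Collect the streamed text output with session.receive(), as shown earlier.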
Handle interruptions
Firebase AI Logic doesn't yet support handling interruptions. Check back soon!
Use function calling (tools)
You can define tools (such as available functions) to use with the Live API, just as you can with the standard content generation methods. This section describes some of the nuances of using the Live API with function calling. For a complete description and examples of function calling, see the function calling guide.
From a single prompt, the model can generate multiple function calls and the code necessary to chain their outputs. This code executes in a sandbox environment, generating subsequent BidiGenerateContentToolCall messages. Execution pauses until the results of each function call are available, which ensures sequential processing.
Additionally, using the Live API with function calling is particularly powerful because the model can request follow-up or clarifying information from the user. For example, if the model doesn't have enough information to provide a parameter value for a function it wants to call, it can ask the user for more or clearer information.
The client should respond with BidiGenerateContentToolResponse.
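As a hedged sketch, a tool can be declared when the model instance is created, following the FunctionDeclaration and Tool shapes from the function calling guide. Both the getWeather function and the assumption that liveModel accepts a tools parameter shaped like the one on generativeModel are illustrative only; check the function calling guide and the SDK reference for the authoritative API.
Kotlin
// Sketch only: declare a hypothetical function the model may call during a live session.
// Assumes `liveModel` accepts a `tools` parameter like `generativeModel` does.
val getWeatherDeclaration = FunctionDeclaration(
    "getWeather",
    "Get the current weather for a city",
    mapOf("city" to Schema.string("Name of the city")),
)

val model = Firebase.vertexAI.liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.TEXT
    },
    tools = listOf(Tool.functionDeclarations(listOf(getWeatherDeclaration)))
)
// When the model requests the call, run your function and send the result back
// as the tool response so the session can continue.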
Limitations and requirements
Keep in mind the following limitations and requirements of the Live API.
Transcription
Firebase AI Logic doesn't yet support transcription. Check back soon!
Languages
- Input languages: See the full list of supported input languages for Gemini models
- Output languages: See the full list of output languages in Chirp 3: HD voices
Audio formats
The Live API supports the following audio formats:
- Input audio format: Raw 16-bit PCM audio at 16 kHz little-endian
- Output audio format: Raw 16-bit PCM audio at 24 kHz little-endian
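For instance, if your recorder produces floating-point samples, they must be converted to raw 16-bit little-endian PCM before being sent. The helper below is plain Kotlin/JVM with no SDK dependency; it only illustrates the byte layout expected for input audio (recorded at 16 kHz).
Kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Convert float samples in [-1.0, 1.0] into raw 16-bit little-endian PCM bytes,
// the input format the Live API expects.
fun floatsToPcm16LittleEndian(samples: FloatArray): ByteArray {
    val buffer = ByteBuffer.allocate(samples.size * 2).order(ByteOrder.LITTLE_ENDIAN)
    for (sample in samples) {
        val clamped = sample.coerceIn(-1.0f, 1.0f)
        buffer.putShort((clamped * Short.MAX_VALUE).toInt().toShort())
    }
    return buffer.array()
}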
Rate limits
The following rate limits apply:
- 10 concurrent sessions per Firebase project
- 4 million tokens per minute
Session length
The default length of a session is 30 minutes. When the session duration exceeds the limit, the connection is terminated.
The model is also limited by the context size. Sending large chunks of input may result in earlier session termination.
Voice activity detection (VAD)
The model automatically performs voice activity detection (VAD) on a continuous audio input stream. VAD is enabled by default.
Token counting
You can't use the CountTokens API with the Live API.