| author | Prasanna721 <[email protected]> | 2026-01-21 03:58:26 +0000 |
|---|---|---|
| committer | Prasanna721 <[email protected]> | 2026-01-21 03:58:26 +0000 |
| commit | 0a8c5fa0497754731cf8ff97eb0f503aaabfc97a (patch) | |
| tree | d8c583febd32a98b7510a70075adf521f7fba724 /packages/pipecat-sdk-python/README.md | |
| parent | feat: mobile responsive, lint formats, toast, render issue fix (#688) (diff) | |
Re: feat(pipecat-sdk): add speech-to-speech model support (Gemini Live) (#683) [branch: pipecat-update]
#### Re-raising the Pipecat live speech PR
### Added native speech-to-speech model support
### Summary:
- Speech-to-speech support - auto-detect audio frames and inject memories into the system prompt for native audio models (Gemini Live, etc.)
- Fix memory bloating - replace memories each turn inside XML tags instead of accumulating them
- Add temporal context - show recency on search results ([2d ago], [15 Jan])
- New `inject_mode` param - `auto` (default), `system`, or `user`
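The "replace, don't accumulate" fix above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: the `<memories>` tag name and the `inject_memories` helper are assumptions made for this sketch.

```python
import re

# Hypothetical tag name; the SDK's real XML delimiter may differ.
MEM_TAG = "memories"

def inject_memories(system_prompt: str, memories: list[str]) -> str:
    """Swap the memory block in the system prompt each turn instead of appending."""
    block = f"<{MEM_TAG}>\n" + "\n".join(memories) + f"\n</{MEM_TAG}>"
    pattern = re.compile(rf"<{MEM_TAG}>.*?</{MEM_TAG}>", re.DOTALL)
    if pattern.search(system_prompt):
        return pattern.sub(block, system_prompt)   # later turns: replace in place
    return system_prompt + "\n\n" + block          # first turn: append the block
```

Because the block is bounded by tags, each turn's search results overwrite the previous turn's instead of growing the prompt without bound.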
### Docs update
- Updated the docs for native speech-to-speech models
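The temporal-context labels mentioned above ([2d ago], [15 Jan]) could be produced along these lines. The 7-day cutoff between relative and absolute dates, and the `recency_label` name, are assumptions for this sketch:

```python
from datetime import datetime, timedelta

def recency_label(created_at: datetime, now: datetime) -> str:
    """Label a search result's age: recent results relative, older ones by date."""
    age = now - created_at
    if age < timedelta(days=7):          # assumed cutoff for relative labels
        return f"[{age.days}d ago]"
    return f"[{created_at.day} {created_at.strftime('%b')}]"
```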
Diffstat (limited to 'packages/pipecat-sdk-python/README.md')
| -rw-r--r-- | packages/pipecat-sdk-python/README.md | 27 |
1 file changed, 14 insertions(+), 13 deletions(-)
```diff
diff --git a/packages/pipecat-sdk-python/README.md b/packages/pipecat-sdk-python/README.md
index 5f6e8478..bb8e26e1 100644
--- a/packages/pipecat-sdk-python/README.md
+++ b/packages/pipecat-sdk-python/README.md
@@ -107,11 +107,9 @@
 from fastapi import FastAPI, WebSocket
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.task import PipelineTask
 from pipecat.pipeline.runner import PipelineRunner
-from pipecat.services.openai import (
-    OpenAILLMService,
-    OpenAIUserContextAggregator,
-)
-from pipecat.transports.network.fastapi_websocket import (
+from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
+from pipecat.services.google.gemini_live.llm import GeminiLiveLLMService
+from pipecat.transports.websocket.fastapi import (
     FastAPIWebsocketTransport,
     FastAPIWebsocketParams,
 )
@@ -125,10 +123,17 @@ async def websocket_endpoint(websocket: WebSocket):
     transport = FastAPIWebsocketTransport(
         websocket=websocket,
-        params=FastAPIWebsocketParams(audio_out_enabled=True),
+        params=FastAPIWebsocketParams(audio_in_enabled=True, audio_out_enabled=True),
+    )
+
+    # Gemini Live for speech-to-speech
+    llm = GeminiLiveLLMService(
+        api_key=os.getenv("GEMINI_API_KEY"),
+        model="models/gemini-2.5-flash-native-audio-preview-12-2025",
     )
-    user_context = OpenAIUserContextAggregator()
+    context = OpenAILLMContext([{"role": "system", "content": "You are a helpful assistant."}])
+    context_aggregator = llm.create_context_aggregator(context)

     # Supermemory memory service
     memory = SupermemoryPipecatService(
@@ -136,17 +141,13 @@ async def websocket_endpoint(websocket: WebSocket):
         session_id="session-123",
     )

-    llm = OpenAILLMService(
-        api_key=os.getenv("OPENAI_API_KEY"),
-        model="gpt-4",
-    )
-
     pipeline = Pipeline([
         transport.input(),
-        user_context,
+        context_aggregator.user(),
         memory,
         llm,
         transport.output(),
+        context_aggregator.assistant(),
     ])

     runner = PipelineRunner()
```
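The `inject_mode` parameter described in the summary (`auto`, `system`, `user`) could resolve its target roughly like this. This is a sketch under the assumption that `auto` detects native-audio models (e.g. Gemini Live) at setup and, having no user text turn to attach memories to, falls back to system-prompt injection; the `resolve_inject_target` name is hypothetical.

```python
def resolve_inject_target(inject_mode: str, model_consumes_audio: bool) -> str:
    """Decide where retrieved memories go for a given model and mode."""
    if inject_mode in ("system", "user"):
        return inject_mode                              # explicit override wins
    # "auto" (default): native audio models get system-prompt injection,
    # text-based models keep user-message injection.
    return "system" if model_consumes_audio else "user"
```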