๐ฅ Gate Square Event: #PTB Creative Contest# ๐ฅ
Post original content related to PTB, CandyDrop #77, or Launchpool on Gate Square for a chance to share 5,000 PTB rewards!
CandyDrop x PTB ๐ https://www.gate.com/zh/announcements/article/46922
PTB Launchpool is live ๐ https://www.gate.com/zh/announcements/article/46934
๐
Event Period: Sep 10, 2025 04:00 UTC โ Sep 14, 2025 16:00 UTC
๐ How to Participate:
Post original content related to PTB, CandyDrop, or Launchpool
Minimum 80 words
Add hashtag: #PTB Creative Contest#
Include CandyDrop or Launchpool participation screenshot
๐ Rewards:
๐ฅ 1st
OpenAI Unveils GPT-Realtime Speech-To-Speech Model With Multimodal Support And Advanced Conversational Capabilities
In Brief
OpenAI released the gpt-realtime speech-to-speech model with multimodal support, advanced conversational skills, and strong audio reasoning performance.
Artificial intelligence research organisation OpenAI announced the general availability of its Realtime API, now enhanced with features that allow developers and enterprises to build robust, production-ready voice agents. The API supports remote MCP servers, image inputs, and phone calling via Session Initiation Protocol (SIP), enabling more capable and context-aware voice applications.
Alongside the API, OpenAI has released its most advanced speech-to-speech model, gpt-realtime, designed to improve instruction following, function calling, and natural-sounding speech. The model can interpret complex prompts, switch languages mid-sentence, reproduce alphanumeric sequences accurately, and capture non-verbal cues. Two new voices, Cedar and Marin, are also available, offering more expressive and human-like intonation. Existing voices have been updated to incorporate these enhancements.
The Realtime API processes audio directly through a single model, reducing latency and preserving nuance, unlike traditional pipelines that chain separate speech-to-text and text-to-speech models. gpt-realtime has been trained in collaboration with users to excel in real-world applications such as customer support, personal assistance, and education. Benchmark evaluations show substantial improvements in reasoning, instruction adherence, and function calling accuracy compared to previous models.
Additional updates include asynchronous function calling, allowing long-running operations without interrupting ongoing conversations, further supporting seamless, production-ready voice experiences.
OpenAI Expands Realtime API With MCP Support, Image Inputs, SIP Integration, And Cost-Saving Controls For Voice Agents
OpenAIโs Realtime API now includes new features designed to simplify integration and expand capabilities for production-ready voice agents. Developers can enable remote MCP support by linking a session to an MCP server URL, allowing the API to manage tool calls automatically and access additional functionalities without manual setup.
The gpt-realtime model now supports image inputs, enabling the system to incorporate photos, screenshots, and other visuals alongside audio or text. This allows users to ask context-specific questions about what they see, while developers retain control over which images are shared and when.
Additional improvements include Session Initiation Protocol (SIP) support for connecting apps to phone networks and PBX systems, as well as reusable prompts that let developers save and deploy pre-configured instructions, tools, and example messages across multiple sessions.
The generally available Realtime API and gpt-realtime model are now accessible to all developers, with pricing reduced by 20% compared to the previous gpt-4o-realtime-preview. New controls for conversation context allow for smarter token management, reducing costs for long-running sessions. Documentation, a Playground for testing, and a Realtime API prompting guide are available to support developers in adopting these features.