OpenAI is enhancing ChatGPT by rolling out Advanced Voice Mode with Vision for paid subscribers. This feature integrates real-time video, allowing the AI to process visuals from a smartphone camera.
The feature, built on OpenAI’s GPT-4o model, was first previewed at the company’s Spring Update event in May 2024. It lets ChatGPT interact visually with its surroundings, creating a more dynamic and engaging experience. Access is limited to ChatGPT Plus, Team, and Pro subscribers.
ChatGPT Voice Functionality and Applications:
Users can engage the camera by tapping the Advanced Voice icon in the ChatGPT mobile app and selecting the video option, letting the AI see the user’s surroundings and respond to visual cues. Practical uses include getting recipe ideas from the contents of a fridge, choosing an outfit from a wardrobe, or learning about a visible landmark.
Just in time for the holidays, video and screensharing are now starting to roll out in Advanced Voice in the ChatGPT mobile app.
— OpenAI (@OpenAI) December 12, 2024
The feature also pairs vision with emotive, low-latency voice responses, making conversations feel more natural. Because the AI retains a temporary memory of visual details within a session, it can refer back to what it has already seen, which supports continuous learning and engagement.
An integrated Screenshare option, accessible via the app’s three-dot menu, lets the AI interact with other apps on the user’s device. This aids with smartphone-related queries and tasks.
The rollout is underway, with Team subscribers due to gain access shortly. Regulatory issues, however, have kept the feature unavailable in certain regions, including the EU. Enterprise and Edu users can expect access in early 2025.
OpenAI’s Advanced Voice Mode with Vision significantly advances AI interaction by merging visual and auditory data. This expands ChatGPT’s capabilities and sets a new standard for AI communication.