Kyutai Labs launched Moshi AI, a real-time voice-responsive chatbot, on Wednesday.
The French AI firm developed Moshi’s entire audio language model in-house. The model can express emotions and respond in various speaking styles. Importantly, Kyutai Labs has made Moshi available to all users for free, though conversations are limited to five minutes.
Interestingly, OpenAI has announced similar speech features for GPT-4o, but they have yet to be released.
Moshi AI Features
A team of eight people at Kyutai Labs developed the AI model in six months. During its unveiling in Paris, the firm clarified that Moshi is not an AI assistant but a prototype for developing various tools. Users can sign up via email to access the platform.
The interface is minimalistic. Users can monitor the loudness of their voice while a text box displays the AI's responses. Technical details such as audio duration and latency appear in a separate box. At the top, a button lets users end the call, which can last up to five minutes.
User Experience
Gadgets 360 reported extremely low latency, with the AI often responding instantly. However, heavy server load caused response delays of 10 to 15 seconds. At times, verbal prompts were not registered even though the input volume was high.
Moshi AI can respond in a passionate voice and in various speaking styles. When connected to the Internet, it can fetch responses from the web. The chatbot accepts only voice input; it does not support text prompts.
Kyutai Labs plans to open-source the AI model but has not yet published the model weights and code. Once they are available, users will be able to download the model and run it locally on offline devices.