This episode is brought to you by the Cloud Wars Expo. This in-person event will be held June 28th to 30th at the Moscone Center in San Francisco, California.
Highlights
00:55 — Meta AI introduced the Generative Spoken Language Model (GSLM), a language model that learns directly from raw audio rather than from text. In other words, it is essentially textless NLP.
01:20 — Text messages are often misconstrued because the sender's original intent doesn't always come across to the recipient.
02:10 — The goal of GSLM is to capture human expression from speech (audio) and, potentially, video inputs. That includes body language, a uniquely human form of communication that adds meaning to speech.
03:00 — The context of the words, how they are spoken, and the speaker's body language all carry meaning beyond the text itself.
03:15 — Meta AI's approach was speech emotion conversion. The textless NLP model takes speech as input and considers four components while processing it to produce an output:
- Phonetic content
- Speaking rate and duration
- Identity of the speaker
- Emotion
04:00 — The outcome is a model that identifies non-verbal cues and can convey the intent or emotion behind the speech.
04:31 — The wider aim of textless NLP is to understand the richness of human communication.