Over the last few years, voice has become a popular mode for navigating the digital world – from looking up queries online to getting directions while driving.
It is estimated that around 22% of internet users use voice commands to control their media and television devices, and around 40% of smartphone users feel that voice technology significantly improves the mobile UX.
Therefore, it makes sense that users would want to leverage the convenience of voice in every type of operating environment: at work, at home, or even inside virtual worlds.
Facebook (now Meta) has recently announced an offering called Voice SDK to help bring voice commands and related technologies into the VR space, and potentially one day, the metaverse.
Voice SDK is a software development kit that bundles the key natural language capabilities needed to build hands-free navigation and control systems in immersive realities, including AR, VR, and MR.
The primary use case, as Meta highlights, is voice-based gameplay – but one can expect numerous other applications, from collaboration in VR to VR content consumption and much more.
Voice SDK is part of the company’s Presence Platform for Oculus developers, which gives you machine perception and AI tools to start building highly immersive experiences.
It was announced during Connect 2021, and Meta has said that the Voice SDK will be available soon in an experimental release.
Importantly, documentation for the solution is yet to become available, and we expect a full release next year with the GA version of the Presence Platform.
From early on, Facebook kept its ear to the ground and was quick to pick up any new technologies or platforms it felt might come in handy later down the line.
In the last 2-3 years, these acquisitions have primarily been immersive reality companies, in the lead-up to Facebook’s rebranding as Meta this year.
But in addition to acquiring VR startups and early movers, Facebook purchased a tiny fledgling company called Wit.ai back in 2015.
At that time, Wit.ai was fresh out of Y Combinator and had built an API for voice-enabled interfaces. It saw remarkable growth in a very short time and had over 6,000 developers using it when it was acquired by Facebook.
The use case back then was simple: provide a voice-to-text capability for Facebook Messenger and possibly monetise any apps built using Wit. Notably, the service was to remain free for use.
Fast forward to 2021, and Wit.ai continues to be available as a free tool (provided you have a Facebook account), and it remains extremely popular in the developer community for its simple, elegant, yet powerful natural language API.
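To give a sense of that API: Wit.ai exposes an HTTP endpoint that accepts a text utterance and returns the intents and entities it detects. Here is a minimal Python sketch of such a call; the access token is a placeholder you would obtain from the Wit.ai console, and the utterance is purely illustrative:

```python
import requests  # third-party HTTP client: pip install requests

WIT_TOKEN = "YOUR_SERVER_ACCESS_TOKEN"  # placeholder; issued per app in the Wit.ai console
API_VERSION = "20211028"                # Wit.ai pins response behaviour to a version date

def parse_utterance(text: str) -> dict:
    """Send a text utterance to Wit.ai and return the parsed result."""
    resp = requests.get(
        "https://api.wit.ai/message",
        params={"v": API_VERSION, "q": text},
        headers={"Authorization": f"Bearer {WIT_TOKEN}"},
    )
    resp.raise_for_status()
    return resp.json()  # includes "text", "intents", and "entities"

result = parse_utterance("teleport me to the concert hall")
top_intent = result["intents"][0]["name"] if result["intents"] else None
print(top_intent)
```

The returned structure – a ranked list of intents plus any extracted entities – is the raw material for every use case discussed below.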
In October 2021 at Connect, Meta announced that Wit.ai would power the Voice SDK component of the Presence Platform, now bringing this technology to the XR space.
Voice is typically used in scenarios where hand-based controls aren’t available or advisable.
So, does it make sense for developers to go the extra mile and bring this capability into virtual worlds, where users will have their controllers anyway?
We’d say yes. By reducing dependence on tracked hand controllers, Voice SDK enables more natural interactions in VR worlds, bringing them dramatically closer to real life. Here are some of its transformative impacts:
1. Users can navigate the metaverse without using hand controllers
The metaverse is, by definition, a conglomeration of many VR worlds, and users navigate between them for work, play, social interaction, and other purposes. Voice SDK will allow users to navigate using spoken commands without having to turn to their hand controllers.
Use cases like teleportation inside VR could rely on Voice SDK in a big way, and users could be empowered to move from one world to another while engaged in other activities.
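How Voice SDK will surface parsed intents to developers isn’t documented yet, but a plausible pattern is a thin dispatch layer between the NLU output and the engine’s locomotion code. A sketch in Python, assuming a Wit.ai-style response payload and made-up intent and entity names (navigate_to, destination):

```python
# Hypothetical dispatch layer: map a Wit.ai-style navigation intent to a
# teleport action. Intent and entity names are illustrative, not Meta's.

def handle_navigation(parsed: dict, teleport) -> bool:
    """Teleport the player if the utterance carries a confident navigation intent."""
    intents = parsed.get("intents", [])
    if not intents or intents[0]["name"] != "navigate_to":
        return False
    if intents[0]["confidence"] < 0.8:  # ignore low-confidence matches
        return False
    destinations = parsed.get("entities", {}).get("destination:destination", [])
    if not destinations:
        return False
    teleport(destinations[0]["value"])  # hand off to the engine's teleport routine
    return True

# Example with a canned response shaped like a Wit.ai /message payload:
sample = {
    "intents": [{"name": "navigate_to", "confidence": 0.97}],
    "entities": {"destination:destination": [{"value": "concert_hall"}]},
}
handle_navigation(sample, teleport=lambda world: print(f"Teleporting to {world}"))
```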
2. Gameplay can include voice-driven commands
Voice commands for games are an important application of this technology, and one we expect to roll out very soon.
Meta provides a couple of examples of this use case in its October 2021 announcement, including the ability to talk to characters and NPCs, and the ability to achieve game milestones through a voice-activated magic spell.
In short, Voice SDK could transform multiplayer experiences as we know them through advanced speech recognition and natural language understanding.
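The exact developer surface isn’t public yet, but one can imagine a voice-driven gameplay layer as a registry mapping intent names to game callbacks. In the sketch below, the intent names, the spell table, and the confidence threshold are all assumptions made for illustration:

```python
# Illustrative intent registry for voice-driven gameplay. Nothing here comes
# from Meta's announcement; it only shows the routing pattern.

SPELLS = {"fireball": 40, "heal": -25}  # hypothetical spell -> hit-point delta

def cast_spell(entities: dict) -> None:
    """Resolve a spoken spell name against the spell table."""
    spell = entities.get("spell:spell", [{}])[0].get("value")
    if spell in SPELLS:
        print(f"Casting {spell} ({SPELLS[spell]} hit points)")

HANDLERS = {"cast_spell": cast_spell}  # extend with talk_to_npc, etc.

def on_voice_command(parsed: dict) -> None:
    """Route a parsed utterance to the first confident matching handler."""
    for intent in parsed.get("intents", []):
        handler = HANDLERS.get(intent["name"])
        if handler and intent["confidence"] >= 0.75:
            handler(parsed.get("entities", {}))
            return

on_voice_command({
    "intents": [{"name": "cast_spell", "confidence": 0.93}],
    "entities": {"spell:spell": [{"value": "fireball"}]},
})
```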
3. Users will be able to set voice reminders to self
Another small but handy use case for the SDK’s natural language tools is in-VR reminders. Self-reminders are a popular application for voice technology, particularly when using virtual assistants or smart home hubs.
Thanks to Voice SDK, developers can now allow users to set reminders to themselves through voice commands.
For instance, when conducting a multi-participant meeting in VR, you’d be able to make a memo-to-self without having to take off your headset or switch apps.
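Wit.ai already ships a built-in wit/datetime entity that resolves phrases like “at 3pm” into ISO 8601 timestamps, which makes the parsing side of reminders straightforward. A sketch, assuming a custom set_reminder intent (our naming, not Meta’s) and a canned response shaped like a Wit.ai /message payload:

```python
from datetime import datetime

# Turn a parsed reminder utterance into a (due time, note) pair. The
# "set_reminder" intent is hypothetical; "wit$datetime:datetime" follows the
# key format Wit.ai uses for its built-in datetime entity.

def extract_reminder(parsed: dict):
    intents = parsed.get("intents", [])
    if not intents or intents[0]["name"] != "set_reminder":
        return None
    when = parsed.get("entities", {}).get("wit$datetime:datetime", [])
    if not when:
        return None
    due = datetime.fromisoformat(when[0]["value"])
    return due, parsed.get("text", "")

sample = {
    "text": "remind me to share the deck at 3pm",
    "intents": [{"name": "set_reminder", "confidence": 0.91}],
    "entities": {"wit$datetime:datetime": [{"value": "2021-11-12T15:00:00.000-08:00"}]},
}
print(extract_reminder(sample))
```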
4. Virtual objects can be searched using voice
Similar to voice reminders, voice search is another popular use case for this technology.
Developers will now be able to bring voice search into VR, with Voice SDK allowing users to look for objects through spoken commands.
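One plausible implementation: extract the spoken object name as an entity and match it against the scene’s known objects with simple fuzzy matching, so near-misses in transcription still resolve. The entity name and scene inventory below are hypothetical:

```python
import difflib

# Hypothetical voice search over a VR scene inventory. Fuzzy matching absorbs
# small transcription differences ("book shelf" vs "bookshelf").

SCENE_OBJECTS = ["whiteboard", "coffee table", "bookshelf", "guitar"]

def find_object(parsed: dict):
    """Return the scene object closest to the spoken name, if any."""
    names = parsed.get("entities", {}).get("object:object", [])
    if not names:
        return None
    spoken = names[0]["value"]
    matches = difflib.get_close_matches(spoken, SCENE_OBJECTS, n=1, cutoff=0.6)
    return matches[0] if matches else None

sample = {"entities": {"object:object": [{"value": "book shelf"}]}}
print(find_object(sample))  # -> "bookshelf"
```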
5. FAQs and help sections will support voice interactions
Long text repositories in the Oculus ecosystem, such as FAQs or an application’s help section, can be made searchable via voice. Without a traditional mouse-and-keyboard setup, looking up information in help portals has always been difficult in VR.
More often than not, users had to fall back on an app’s two-dimensional web presence to find answers to their queries.
Now, Voice SDK will power a seamless, intuitive, and uninterrupted application experience, with routine support and help available via voice.
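Under the hood, this could be as simple as ranking help entries against the transcribed query. A toy sketch, with made-up FAQ content and naive keyword-overlap scoring standing in for a real search backend:

```python
# Toy voice-driven FAQ lookup: rank entries by keyword overlap with the
# transcribed query. A real app would use proper text search and take the
# transcription from the speech layer; punctuation handling is omitted.

FAQ = {
    "How do I reset my headset": "Hold the power button for 10 seconds...",
    "How do I pair controllers": "Open the companion app and select...",
    "How do I adjust the guardian boundary": "Long-press the Oculus button...",
}

def best_answer(transcribed_query: str) -> str:
    """Return the answer whose question shares the most words with the query."""
    query_words = set(transcribed_query.lower().split())
    def overlap(question: str) -> int:
        return len(query_words & set(question.lower().split()))
    return FAQ[max(FAQ, key=overlap)]

print(best_answer("how can I reset the headset"))
```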
Following October’s announcement, Meta confirmed that the experimental release of Voice SDK was available starting November 2021.
Unity developers can start using Voice SDK from the Oculus for Developers portal, and support for non-Unity platforms is slated for the future.
Keep in mind that Voice SDK Experimental cannot be shipped with applications – for that, you will have to wait for the production version of Voice SDK, expected by the end of 2021.