Things I Learned at Project Voice -- THE Journal


Things I Learned at Project Voice

Could 2020 be the year of the voice? These voice experts think so.

01/17/20

This week I'm attending Project Voice, an annual conference in Chattanooga that pulls together people tapping into tools from Amazon, Google, Microsoft, Samsung and other companies to develop and use voice applications in education, business, healthcare, government and the home. Before my attendance at this event, the extent of my knowledge of voice was how to turn off the light in my son's bedroom using his Echo Dot. Here are a few things I've learned since arriving.

1. Podcasts are passé

For the latest go-to format, check out "flash briefings," short pre-recorded audio shows that you can subscribe to and listen to through Alexa. Teri Fisher, who produces Voice in Canada, a daily flash briefing show that covers Amazon Alexa and Echo tricks, news and reviews, offered his favorite tips for creating effective shows, including these:

"Pick a topic you're passionate about"; otherwise, the work will become drudgery.
Stay consistent so people know they can count on a new flash briefing from you every morning over their cuppa.
Keep the shows short. His preference is to produce flash briefings between two and three minutes long.
Keep your intro and outro no longer than five or 10 seconds each.
Consider the use of gaming through your flash briefing to engage listeners and perk up listenership.

Fisher has introduced a free online video course to help people who have no experience learn how to set up their own flash briefings. (That would be me.) I'm going to take the course over the next month so I can try out a flash briefing on my favorite beat: education technology news.

2. Google isn't sitting still

While we may all have Echo devices sitting in our personal spaces right now, Google could undercut Amazon's positioning with a long list of innovations. In her keynote, Cathy Pearl reminded us that Siri wasn't the first voice assistant to show up. Pearl, author of Designing Voice User Interfaces and head of "Conversation Design Outreach" at Google, wrote her first chatbot on a Commodore 64 when she was 12. In 1994 a company called Wildfire introduced a "kind of voice assistant" available only by phone -- and mostly landlines at that. This early application portended the use of hot words, included a contact list for dialing and had "call whisper" so the person you were talking to didn't hear the behind-the-scenes information the assistant was feeding to you. Then Tellme came out in 1999 with a voice portal that would deliver sports scores, travel information and weather. Although the company is gone (acquired in 2007 and divested in 2012 by Microsoft), noted Pearl, the service lives on. When people call the number, it welcomes them and offers a menu of things to do.

Now, by 2020, Google has made several moves in voice. First, it has begun to address privacy and trust issues by letting users set whether and when audio data is stored by default. Settings are available that let the user choose when to delete data or when to do deletions on a rolling cycle. (In fact, deleting data can be done immediately by voice: "Hey, Google, that wasn't meant for you.")

The new Nest Mini uses common commands locally and doesn't have to go to the cloud to know what to do.

Accessibility for both cognitive and visual issues has been a big driver. In the last year, the company has announced a slew of new technology work:

Project Euphonia strives to teach Google Assistant to understand voices that don't use standard speech.
Project Understood addresses the communication needs of people with Down Syndrome.
Google Live Transcribe creates a visual transcription of what's being spoken, which is "great for people who are deaf," Pearl said. The words show up on the mobile device, such as a phone, as they're spoken. That includes picking up ambient noises -- those non-verbal parts of our communication.
Sound Amplifier provides an audio boost for situations where there's a lot of ambient noise or somebody is speaking especially softly.
And Live Caption takes the captioning feature available for videos uploaded to YouTube and makes it available for any videos or audio recordings.

Another aspect of voice technology that Google is putting energy into is the use of multiple voices -- 11 different ones right now in Google Assistant, according to Pearl. That's "really important," she said, to help tech fight against gender stereotyping. "We need to get past the idea that female voices are more helpful." (She shared a story about being in a Lyft and noticing that the driver was using an Australian voice for directions. When she asked why he chose that, the driver explained that when Google Assistant doesn't understand him, "I don't mind because she's not from around here.")

3. Adoption in the classroom is cheap and easy (but there's a wishlist)

Educators Julie Daniel Davis, director of instructional technology and innovation at Chattanooga Christian School, and Rebecca Dwenger, instructional technology consultant for Hamilton County Educational Service Center in Cincinnati, OH, shared their experiences with voice in education. "I see people who are normally like, 'I can't do that,' embrace voice technology because they can do it," said Dwenger. "It's natural language." And while educators are "amazing" in how quickly they'll adopt it, she added, "the people who are struggling are the tech coordinators, the superintendents -- those running a district."

Daniel Davis concurs: "The pushback I hear most often is from people in the IT world who are scared of student privacy concerns," she noted. Their message to her: "Convince me that we can use this and still be compliant."

Both educators shared what Dwenger called "shiver stories," examples where the voice technology reached students in ways that nothing else could. For Dwenger the proof of the impact comes up in working with students with autism. "A lot of students don’t speak up. They have to be taught how to communicate," she explained. With the use of voice assistance devices, "they're communicating. If they ask for a question, they're asking follow-up questions." "That does not happen," she said, with students who have autism.

Daniel Davis sees benefit among several "niche" segments -- such as English learners. "They're in the classroom to learn how to speak better," she said. When the teacher corrects them, there's a "little bit" of discomfort. When Google corrects them, however, "there's not that shame."

Daniel Davis and Dwenger shared their wishlists for voice technology in 2020. Daniel Davis' list has three items:

A dedicated education device, "so I don't have to work around the system to make sure I'm compliant with all the things I need to be compliant with."
The ability for students to create for voice. "I want a platform where they can go in and make their skills. I want preschool, middle school [and] high school versions because I do believe this skill is something that's going to be part of their future."
And she wants bundling. "I want a teacher to be able to put a device in her classroom and not have to remember a single invocation word." On top of that she'd like curated lessons that map to the learning standards. "Teachers don’t have enough time to look for skills." Her preference: for companies to do that for teachers.

Dwenger wants every student to be able to use voice for learning -- whether they're at school or home and no matter which technology platform they're using, so "the teacher has access to that and can see their progress."

"As educators, we're constantly looking for school resources we can use that are good resources but at as low a price as possible," Daniel Davis pointed out. Compare voice assistance to Chromebooks that will cost $150 for every student, she suggested. "You're talking about one $50 device in the classroom and everybody has access to the world."

4. Building voice experiences is different for children

"Kids have their own way of learning things," said Jeremy Wilken. "Their voice is less vocal; they have smaller vocabularies and it's different; they enunciate differently and sometimes leave out words; their grammar isn't perfect; their access to devices may often be through adults; their attention goes everywhere; and they are emotionally driven -- emotional firecrackers," he explained.

Wilken maintains Design for Voice, a personal project to help people learn about voice design. In a session on how to design voice applications for children, he pointed out that the potential is there "to build something for kids that they can use at a very young age. It's interactive and gives them power and autonomy in a way that even tablets can't yet."

Among Wilken's advice:

Latency exists and it's "super important" to acknowledge that it happens and that it can frustrate young users because they don't understand the concept of waiting. "They're going to keep talking," said Wilken. To appreciate the problem from a child's perspective, developers need to put delays into their systems -- to slow them down "to see what it's like" -- and then address the problem by doing something in the interim to show the kids that the voice assistant is working on the request.

Design for emotion. "Kids are emotional beings." To test how the app responds to heightened emotion, developers need to test with "over-emphasized dramatic personas to increase their empathy," he suggested.

Speech-to-text is far from perfect. Building an experience or skill for children requires ferreting out those common words that have double meanings, testing the app to see what happens when the assistant gets only part of what was said, and checking it with higher-pitched voices or in loud rooms.

Check for repetition. If a young user is repeating what he or she said multiple times, that should be a clue that the assistant is misinterpreting something and needs to prompt for more information or for a different way of expressing the request.

"Expect the unexpected." As Wilken advised, developers need to put themselves into real-world scenarios to gather insights into people's mindsets and to "prototype early with children."

5. Voice apps for education are gaining recognition

The Oscars aren't the only award show in town. Project Voice gives its own honors to recognize companies that are pioneering voice applications for Alexa, Google and other new platforms that are just emerging. This year's awards recognized education-specific applications in two categories:

Voice experience of the year for education, with these finalists:

Sermo Labs' 1-2-3 Math;
Bamboo Math;
Voicelets, to help students learn by listening;
Highlights Storybooks from Bamboo; and
Novel Effect, a voice-driven storytelling app for families.

Highlights took this category.

And voice developer of the year, with these finalists:

Pretzel Labs, a voice design studio
Sermo Labs
Voicelets
Bamboo Learning
Novel Effect

Bamboo Learning won this award.

Bradley Metrock, who produces the Project Voice conference, hosts This Week in Voice and is author of More Than Just Weather & Music: 200 Ways to Use Alexa, considers the use of voice in education a big priority for the technology. As someone who grew up with a speech impediment, he explained, it's interesting that so many people struggle during their first use of a voice assistant to be understood. "You say something once, and Alexa doesn’t understand you. Then you realize it, so you say it again and again and again. You say it over and over until you finally get it right -- and then it understands you. That's magical."
