All posts filed under “Speech Recognition”


Hey Siri, summon someone with personality

https://www.youtube.com/watch?v=-qCanuYrR0g&feature=youtu.be

Google Duplex is a pretty interesting experiment in psychology: relative to the normal Google Assistant voice, the one used in the reservation phone call is full of normal human speech “disfluencies” (ums, ahs, and the like). As far as the Google Assistant (and Alexa, etc.) has come, the Duplex voice sounds far more natural and less robotic.

I’m sure the science behind the disfluencies (DFs) is really cool (though maybe Duplex is rigidly scripted and thus only as cool as a well-written dialog), and it makes me wonder whether this is a preview of where the voices of assistants are going. Apparently there is some debate among linguists about the role of DFs: do they serve a purpose (like filling a gap in conversation so as not to cede control), or are they merely sloppy?

Anyway, with AI-driven personas, you wonder whether the number of DFs will be tuned based on the need to develop trust. In the video above, the guy at the restaurant doesn’t know the caller. Why does Google need to make him think the caller is a person, or person-like? Would the result of the call be different if it were a super mechanical robo voice instead of the chill, bro-like one? Would the restaurant guy just hang up?

I have a hard time believing that people will get very comfortable talking at length to an AI if it feels very transactional. That’s fine for the occasional score or weather report. But if the goal is for a person to have, well, a relationship with a bot (as weird as that sounds now), it can’t sound like an IVR. The dialog I have with a close acquaintance changes all the time: banter at the start of a work meeting, then crisp discussion. Or at home: my pre-coffee grunts vs. dinner conversations. Will a principal AI you’re interacting with have to adapt on the fly to your mood, or to the content and context? Getting perky, upbeat responses when you’re stressed and late to a meeting will be annoying.

Siri is Siri, 24 hours a day. No one wants that for any extended conversation. I wonder how far off we are from having the ability to “summon” the AI you want at the time. “Hey Siri, summon Phyllis, the one with the limitless supply of racy jokes.” Or, “Hey Siri, summon Paranoid Joe, the guy with the conspiracy theories.”


Will AIs have to raise their hands?

Really interesting review from Dieter Bohn of the Sonos Beam, which has Alexa integrated into a sound bar for use with your TV. Accurate sound. Good, if a bit laggy, voice control over a TV using Alexa. Reasonable price.

Beam, AI, Alexa

Source: The Verge, Dieter Bohn

But I found the most interesting part of this to be not the hardware but the notion of being AI-independent. Sonos talks about supporting multiple AIs in the future because a living room device, unlike a phone, is naturally multi-user. The kids might have Android devices, and the parents iPhones, say, and thus their fully tuned and personalized AIs could be different.

This raises questions about AI contention.

Which AI responds to my question? Assuming that speaker recognition works well enough to identify me, would I want the AI that really knows me from my mobile experience to respond, or the AI that was last, call it, instantiated? If the AIs are all present, all the time, through the same “Switzerland” of a device, who has primacy? Is there going to be an AI referee that chooses the subject matter expert? Apple just demoed not having to say “Hey Siri” to invoke it. It would be cheeky and a bit funny to have Google Home start responding to Siri questions: “Siri, you’re still an idiot. The right answer is…” Chaos.

Personally, I believe we will each use multiple AIs that know us to different degrees.

 

 


@techvitamin 2.3: Matt Revis, VP Product, Jibo


Matt Revis, VP Product, Jibo

As they say, hardware is hard. Matt Revis — a veteran of the speech recognition wars at Nuance, and now VP of Product Management at social robotics startup Jibo — is not someone to shy away from a tough challenge.

Getting various software keyboards and versions of Dragon shipped by OEMs on hundreds of millions of handsets (smart and some not so smart) takes a willingness to grind, and Matt has that in spades. Good thing too, because he’s jumped into an exploding segment — intelligent home devices — with relentless, well-funded competitors who have platforms and data that may provide quite a moat.

Jibo is taking a different approach than, say, Echo or Google Home. They believe a slightly anthropomorphic little robot, tuned to interact and genuinely connect with different members of the family, is a differentiated play versus static appliances with disembodied personas (Alexa, Google Assistant, etc.). Jibo is all about being relatable, and funny, and someone you’re invested in as they “grow”.

Much of this strategy is based on research done by Cynthia Breazeal, the charismatic robotics star who pioneered this work at MIT’s Media Lab before its spinout into Jibo. Both Matt and Steve Chambers (Nuance’s dynamic #2 for years) have signed up to help Cynthia bring the little robot to market.

It won’t be easy. The tech (think Alexa strapped to an Echo that moves in place but also has facial recognition and a display) has a lot of surface area where the table stakes are moving very quickly. And once they’ve figured all of that out, then they need to build and sell it.

But Matt (and Steve) believed in speech-based personal assistants years before Siri, and if anybody can do it, they can. In this episode, Matt and I discuss many of their challenges, their unique approach, and how they’re doing. It’s “the hardest thing I’ve ever done, and the most fun, both by a lot”, and you’ll hear the authentic voice of the entrepreneur. Have a listen to the podcast, and also watch the Jibo Program Update below, which gives you a sense of the V1.0 product and of how the business is managing the expectations of a community eager to get its hands on the guy.

Here’s a snippet from the full podcast:

[soundcloud url="https://api.soundcloud.com/tracks/291170788" params="theme_color=f2f2f2&auto_play=false&hide_related=false&show_comments=false&show_artwork=false&show_user=false&show_reposts=false" width="100%" height="100" iframe="true" /]

https://youtu.be/XuH_iaANSq0

 


@techvitamin 1.2: Former Swype CEO Mike McSherry and Sundar Balasubramanian on Healthcare Tech

Mike McSherry

Mike McSherry, Entrepreneur

This episode ranges pretty far afield. It’s mostly about healthcare tech of course — because that’s what Mike and Sundar are spending their time on right now. But they are serial — and very successful — entrepreneurs and have a unique perspective on tech, entrepreneurialism, and what might work. They’re pretty fearless.

Mike in particular has picked and created winners in radically different domains: he’s founded phone companies (yes, plural), and a company that sells embedded device software. Shree, who joins the episode as a guest host, has long had an interest in healthcare tech.

After selling their startup Swype to Nuance for $100M, Mike, Sundar and Aaron Sheedy eventually moved on to figure out the next thing. The first post-Nuance project involved rocket-propelled drills. This is discussed in the podcast, and happily, they didn’t incinerate themselves in the basement of a UW building.

They are now EIRs at Providence Health and they can pretty much explore whatever they want. Devices. Services. Prevention. Tech to reduce readmit rates. I don’t think they are developing new drugs, but they have a pretty broad scope.

In this episode we talk extensively about what they’re seeing, including exciting new areas of innovation, things that are harder than they expected, and areas that have surprised them. We talk about Shree’s tow truck metaphor (which really is perfect).

One topic that I haven’t seen discussed before (though I’m sure it has been) is whether this incredible innovation will really serve those who are most sick, or those who are collectively costing the system the most. It’s one thing to be rich and have a drug cocktail customized to your genome, and another thing to be poor and sick. Is the life expectancy gap between rich and poor going to expand at a more rapid rate? Does drug innovation target the broadest causes of illness, or the ones that have a good chance of getting paid for?

Based on what these guys are seeing, one thing seems really clear: being paid a fixed, and ever lower, amount for certain procedures is providing massive motivation for the providers to innovate cost out of the system. God bless America.

Sundar is currently an EIR at Providence Healthcare. Prior to Providence, Sundar ran product management at Swype, which was his second adventure with Mike McSherry. Before that, Sundar worked for Mike at Amp’d Mobile as well. Sundar has also held multiple product management roles at Qualcomm, working on mobile OSes, emerging market smartphone strategy, and mobile commerce. Sundar is a Seattle transplant, originally from California. He graduated from UC Berkeley and has a background in Computer Engineering. He’s a backpacker, hiker, technologist, and dog owner.

Sundar Balasubramanian

We touch on Amazon’s Echo a bit too. All of us have been involved in speech and natural language processing, and this device, which is a far more disruptive factor in the market than most people know, may come to be the most surprising application of these technologies. Amazon is doubling down in a big way, as they should.
