
Voice recognition on the web using IBM Watson


Good Monday morning! Today I'm just going to chill out a little bit and play around with the Watson voice APIs. I am MPJ, and you are watching Fun Fun Function.

All right, so IBM Watson is IBM's take on a cloud machine learning platform thingy. I'm honestly not sure; IBM has always had really confusing branding for everything. Yesterday on the Twitch stream I played around with these APIs in preparation for a hack I'm doing this weekend, and I figured that the API was pretty impressive, and it turned out really nicely when you did it on the web. So I figured that I would just do a quick video where I demonstrate how to use the IBM Watson voice processing inside a web app, because it was pretty fun.

This is not a sponsored video or anything. I just happened to use the Watson APIs because they had a couple of features that I need for the hack. I'm pretty sure that you can do this with most streaming APIs, but what I found particularly impressive with the Watson implementation is that they have a pretty nice JavaScript API that you can use in the browser, and it streams the results over a WebSocket, so it's incredibly snappy considering that the voice recognition is done over the network.

Okay, so let's get started: create-react-app my-voice-thing. While we're doing that, I'm going to pull up a browser and search for Watson. [Music] That's not the one. Looking for watson-speech npm... yeah, this is the one we're looking for. Cool, let's jump to the GitHub repo. Okay, back to the terminal: cd my-voice-thing, and npm start to make sure that things are working. Yes, it works.

Yeah, I have a cold. I don't know if the microphone picks up my snot, but my head is full of snot. All right, the first thing that we need to do is look at the examples here. They have an example server here, because we need to have access to the web service and tokens and stuff like that.

So we need to store the secrets on the server. What we're going to do is just grab this server.js here. I'm going to steal that and create a server.js. This is just a hack, just an exploration, so we're not bothering with quality and crap like that. They have a bunch of stuff we don't need: they are serving and browserifying and stuff here, and we don't need that for this example. We're not going to use dotenv either, because I have already set the secrets for the API in environment variables in my .bash_profile, so that I don't accidentally share my secrets with you fine people. You're all fine people, except one of you who is a criminal, and that person is destroying things.
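As a rough sketch of reading the secrets from the environment instead of a dotenv file: the variable names `SPEECH_TO_TEXT_USERNAME` and `SPEECH_TO_TEXT_PASSWORD` are an assumption based on how the credentials are named later in the video, and `readSpeechCredentials` is a hypothetical helper, not part of any Watson library.

```javascript
// Hypothetical helper: pulls the speech-to-text credentials out of an
// environment object (normally process.env). The variable names are assumed.
function readSpeechCredentials(env) {
  const username = env.SPEECH_TO_TEXT_USERNAME;
  const password = env.SPEECH_TO_TEXT_PASSWORD;
  if (!username || !password) {
    // Fail early so a missing secret does not surface later as a confusing auth error.
    throw new Error('Set SPEECH_TO_TEXT_USERNAME and SPEECH_TO_TEXT_PASSWORD first');
  }
  return { username, password };
}
```

On the server you would call it as `readSpeechCredentials(process.env)`, so the secrets never end up in the repository.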

Let's remove this thing; this is the browserify stuff, and we don't need that. Now, the token endpoints: we need this speech-to-text thing here, but we don't need the text-to-speech thing. We don't care about that at all.

Currently, most of this looks fine, I think. One thing: we can't use port 3000, because that's what's being used by the create-react-app dev server, so I'm going to change that to 3002. Other than that, I think all we need to do is install all of these things here. So let's do npm i express, and we need watson-developer-cloud, and we need vcap_services. I don't know what that is, but it's used here and I am too lazy to investigate it. And I think that's it. Boom.

Installing, please stand by. Sorry about the studio being messy again, by the way. It's because I'm painting that wall there, so I had to move the desk here and move a lot of boxes there, and I'm also painting there. It's going to be nice eventually, I promise. It's just a constant one-more-thing kind of situation with the studio.

Okay, let's see if the server runs: node src/server.js. Bam, it didn't run, it broke: webpack is not defined. It's setting up the webpack compiler; we don't need that, so we've got to remove it. Let's try to run it again. Still fails: no such file or directory, localhost... Okay, all right, the point is that Chrome requires HTTPS to access the user's microphone unless it's a localhost URL, so this is a basic server on port 3001 using a self-signed certificate. We probably don't need that, to be honest, since we're just going to do explorations on localhost, but they actually do provide some certificates here, so I'm going to just steal these. This was localhost.cert, and we also need localhost.pem. There we go, let's see if this works. All right, cool: the token server is live at localhost:3002. That's a catchy name, innit? All right.
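The HTTPS-unless-localhost rule mentioned above can be approximated like this. It is a deliberately simplified sketch: `canRequestMicrophone` is a hypothetical name, and real browsers apply a fuller "secure context" check than these two cases.

```javascript
// Simplified approximation of the rule from the video: microphone access
// needs HTTPS, except when the page is served from localhost.
function canRequestMicrophone(pageUrl) {
  const { protocol, hostname } = new URL(pageUrl);
  if (protocol === 'https:') {
    return true;
  }
  // Plain HTTP is only tolerated for local development hosts.
  return protocol === 'http:' && (hostname === 'localhost' || hostname === '127.0.0.1');
}
```

This is why the plain HTTP dev server on localhost:3000 is good enough for the exploration, while a deployed version would need a certificate.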

This means that we can jump to the example code here. Let's have a look. Okay, I'm losing myself here... they are here in the static directory. Scrolling down to the microphone streaming example, object to console. I'm just going to copy-paste this part here, and we're going to jump into our app. I want to trigger this on a button press, so I'm going to delete this and create a button, "Listen to microphone", and on click it's going to call onListenClick, and then that goes there. Yeah, so you know what, let's just try that. onListenClick is not defined... no, it's going to be this.onListenClick. "WatsonSpeech is not defined". Right, well, it's not defined; we need to get that from somewhere. Actually, if we go back to the root of the npm module, we'll see that there's a watson-speech module, and you can require its sub-parts like here.

See, we're requiring watson-speech/speech-to-text/recognize-microphone, and if you have a look over here, you see that it's the same structure as this thing here. If you load it using some method like Bower, it will load it into the global scope, like we did JavaScript back in the dark ages, but we use npm here, so we're going to do that. Let's steal that and pull that in here, and we're going to use import recognizeMic from, something like that. "WatsonSpeech is not defined", so now we're going to use this from here. See what that looks like. Module not found: watson-speech. No, because we haven't installed that module quite yet. So let's shut down the React development server and go npm i watson-speech. See what that gives us. Installing, please stand by.

Okay, let's start the server again and see what happens. It's really nice that the React development server just automatically reloads. Let's open up the console over here, because I know that my face will be down here, and click "Listen to microphone". Okay, breakage: failed to construct WebSocket, the URL contains a fragment identifier. You know what, I'm betting that the problem is that this token here is messing things up. Let's console.log the token and see what happens. Listen to microphone, scrolling up... hey, the token is... okay, the token is a lot of HTML. Oh, okay. So we are on localhost:3000, but the server that we created, the one that is supposed to provide the token, is actually on 3002, remember, because we are running our React development server on 3000 and we changed that. So I'm just going to do a dirty thing here and point it at localhost:3002, and that is going to improve things.
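A small sketch of why the token came back as HTML: a relative path resolves against the page's own origin (the React dev server on 3000), not the token server on 3002. The path `/api/speech-to-text/token` is an assumption about the example server's route, and `resolveTokenUrl` is a hypothetical helper.

```javascript
// Assumed route of the token endpoint on the example server.
const TOKEN_PATH = '/api/speech-to-text/token';

// Resolves the token path against a given origin, the way the browser
// resolves a relative URL against the current page.
function resolveTokenUrl(origin) {
  return new URL(TOKEN_PATH, origin).href;
}

// What the browser effectively requested: the dev server, which answered with a page.
const fromPage = resolveTokenUrl('http://localhost:3000');
// The "dirty thing" from the video: hard-code the token server's origin instead.
const fromTokenServer = resolveTokenUrl('http://localhost:3002');
```

Hard-coding the origin is fine for a hack; a less dirty option would be to read it from configuration.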

Listen to microphone, see what we get. Okay, it just failed to fetch, that's nice. Oh: no Access-Control-Allow-Origin. It's a cross-origin request. Since we are developing and playing around here, we're just going to allow all the cross-origin requests in the world. You can do that by just requiring the cors module and then using that in Express. Oops, we're starting the wrong thing. Let's do npm i cors so that we actually get the module, and then we're going to restart, not the dev server, our token server. Installing, please stand by... node src/server.js, and we're up and running. Okay, let's click that again. All right.
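To make "allow all the cross-origin requests in the world" concrete, here is a hand-rolled, Express-style middleware that does roughly what a wide-open CORS setup amounts to. It is an illustrative stand-in, not the cors module's actual code, and `allowAllOrigins` is a hypothetical name.

```javascript
// Express-style middleware (req, res, next) that lets any origin call the server.
// Fine for local tinkering; far too permissive for anything real.
function allowAllOrigins(req, res, next) {
  res.setHeader('Access-Control-Allow-Origin', '*');
  if (req.method === 'OPTIONS') {
    // Answer preflight requests directly instead of passing them on.
    res.setHeader('Access-Control-Allow-Methods', 'GET,HEAD,PUT,PATCH,POST,DELETE');
    res.statusCode = 204;
    res.end();
    return;
  }
  next();
}
```

In an Express app this would be registered with `app.use(allowAllOrigins)` before the token route, which is the same spot where the video plugs in the cors module.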

Cool, we're getting some alternatives here. Okay, you see here that it's actually doing some really bad parsing of what it is that I am saying... okay, here, cool, it basically parsed what I just said. "Hesitation": I like this hesitation thing. I hesitate a lot while speaking. Anyway, I want to grab this and actually render it to the screen, because that would be really cool. So let's go back to this thing and have a look at what this looks like. The data object has an alternatives array, and each array item has a transcript property, so we're going to use that. See if we can just add a div here, and the div is going to just contain this.state.text. Oh, you know what, this won't work.

Yeah, because state is going to be null when it's in its initial state, so we do have to add a constructor here and set this.state to an empty object, I suppose. Then we also need to call super here, because we need to do that in classes in JavaScript. See what this looks like. Okay, it's not breaking any more, at least.
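A minimal sketch of the initial-state problem. The `Component` class below is a tiny stand-in for React.Component so the example runs without React installed; it only mimics the one detail that matters here, namely that state is null until you set it.

```javascript
// Stand-in for React.Component (assumption: only models "state starts as null").
class Component {
  constructor(props) {
    this.props = props;
    this.state = null;
  }
}

class WithoutInitialState extends Component {
  render() {
    return this.state.text; // throws: cannot read a property of null
  }
}

class WithInitialState extends Component {
  constructor(props) {
    super(props);    // must run before `this` is touched in a derived class
    this.state = {}; // empty object, so this.state.text is merely undefined
  }
  render() {
    return this.state.text;
  }
}
```

With the empty object in place, the first render simply shows nothing until the first transcript arrives.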

Now, when we get data, we're going to grab data.alternatives, the first alternative, I think that's okay, and then its transcript, and then just go this.state.text... Let's see what that looks like. All right.
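Pulling the text out of a result, as described above, might look like the sketch below. It assumes only the shape mentioned in the video (an `alternatives` array whose items carry a `transcript` property); `firstTranscript` is a hypothetical helper name.

```javascript
// Returns the transcript of the first alternative, or an empty string
// when the data does not have the expected shape.
function firstTranscript(data) {
  const alternatives = data && Array.isArray(data.alternatives) ? data.alternatives : [];
  const first = alternatives[0];
  return first && typeof first.transcript === 'string' ? first.transcript : '';
}
```

The defensive checks are optional for a hack, but they keep a malformed message from crashing the render.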

Cannot set property 'text' of undefined. Okay, first of all, I have the wrong... okay, all right, there. Bam: this.setState is not a function. No, it's because this is now incorrectly scoped. When we pass this onListenClick here, first of all we need to bind this to this, because... it's all a big mess. If you are interested in learning about bind and this, I made a series on that. We're not going to go into why this is breaking here, but you do need to understand this when you develop JavaScript. It's very tricky, so check out that video if you are confused. Then we also need to make this an arrow function, maybe. No, actually, let's not add code unless we absolutely have to. Does that work? No. So let's make this an arrow function. Okay, I think that might work. Nope, we still need to make this an arrow function as well, so that the scope of this is preserved all through the chain.

Cool, now it's updating the div. It thought I said "Dave", but I said "div". But this is pretty cool, right? If I speak very clearly and eloquently, then it actually understands what I am saying, but if I'm speaking slowly, it just... Well, closing the console; this is basically what I wanted to show you.
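The two `this` fixes (binding the click handler, and using an arrow function for the data callback) can be sketched as follows. `Component` is again a small stand-in for React.Component, and `createFakeStream` is a hypothetical substitute for the microphone stream so that the example runs anywhere.

```javascript
// Stand-in for React.Component: just enough setState for the sketch.
class Component {
  constructor(props) { this.props = props; }
  setState(partial) { this.state = Object.assign({}, this.state, partial); }
}

// Hypothetical fake stream that calls 'data' handlers like an event emitter would.
function createFakeStream() {
  const handlers = [];
  return {
    on(event, handler) { if (event === 'data') handlers.push(handler); return this; },
    emit(data) { handlers.forEach((handler) => handler(data)); },
  };
}

class App extends Component {
  constructor(props) {
    super(props);
    this.state = {};
    // Fix 1: bind, so `this` survives being passed around as a click handler.
    this.onListenClick = this.onListenClick.bind(this);
  }
  onListenClick() {
    const stream = this.props.createStream();
    // Fix 2: arrow function, so `this` inside the callback is still the component.
    stream.on('data', (data) => {
      this.setState({ text: data.alternatives[0].transcript });
    });
    return stream;
  }
}
```

Dropping either fix reproduces the kind of "is not a function" error seen in the video, because `this` no longer points at the component by the time the callback runs.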

Let me just actually style this up a little bit, because I think it's cool. There. All right, what does this look like? It's now listening to the microphone. This is what I wanted to show you: React and Watson voice recognition, integrated. It's surprisingly fast considering that this is all done over the network. And this API is free; you can sign up, just google Watson speech API and click service credentials. Let me actually just show you the example server. These things here, it's just the speech-to-text username and the speech-to-text password. I've set these in my .bash_profile so that they are accessible here, and you can get your own speech-to-text username and speech-to-text password by signing up for Watson Developer Cloud. I must say, this is really fast, really impressive. Maybe there are other voice APIs that are equally impressive, but I thought it was fun to show you what I've been doing.

And that is it for this cold episode of Fun Fun Function. I release these every Monday morning, 08:00 GMT. You can subscribe here so that you don't miss it, or you can look at another episode right now by clicking here. I am MPJ, now having a cold. Until next Monday morning, thank you.
