Though I Talk Through The Valley | Chris Landreth | TEDxUofTSalon


yeah I’ve been working with a team of
researchers at the University of Toronto
for about two years now and see if I can get
this working yes we have been
concentrating on animation character
animation and in particular
what to us is the most important part of
that which is the human face if
you look at any given movie for example
you’ll see probably about 70 to 80
percent of the shots being close-ups of a
human face and in animated films getting
that human face right whether it’s a
human character or a human-like
character will either make or break
your belief in the film and
often that is a difficult test where
filmmakers actually fail to do that and
so intense is our acuity in seeing
human faces that for example I’m gonna
give you a pop quiz here two faces here
one of them belongs to the 44th
President of the United States Barack
Hussein Obama I’m gonna
look at you here raise your left
hand if you think it’s the guy on the
left your right hand if you think it’s a
guy on the right okay so I’m looking
okay so I’m seeing pretty much all
right hands and they all went up pretty
quickly so yeah you’re correct that is
Barack Hussein Obama on the right the
guy on the left is a fellow named Louis Ortiz
otherwise known as Bronx Obama if you go
to the New York Times website and type
in his name you will see a documentary
on Louis Ortiz he’s famous obviously more
famous when Obama was president as being
the guy the go-to guy of any guy in the
world who looks like Barack Obama now
think about that there are seven billion
people on this planet and this guy this
one guy here Louis Ortiz is the guy who
looks the most like Barack Obama and yet
I saw right hands go up within a half a
second of my asking that question what
is it that separates or distinguishes
Louis Ortiz from Barack Obama I think you
would be kind of hard pressed to tell me
the actual differences but instinctively
you knew the difference and this is a
testament to a part of
our brain which is dedicated from birth
to being able to distinguish one face
from another a familiar person from a
stranger or one expression from another
and when you’re an animator you totally
have to get that right and if you fail
to get it right even by a little bit you
fall into this thing that’s called the
uncanny valley
so the uncanny valley is a
two-dimensional graph in which the horizontal
axis is realism and the vertical axis is
trustworthiness you would like to think that
those things are interrelated that they
go hand-in-hand for example if you take
a very simple character who’s not very
realistic
like say this guy I said I’m gonna punch
you and come hell or high water I wait a
minute ice-cream-truck okay so this dude
is very simply drawn he’s not very
realistic and you know he can
star in The Simpsons and tell some
really simple funny stories and yeah we
buy him on that level going further to
the right we can get a little more
complex run away hide in the woods anywhere
[Music]
okay Snow White is literally more
fleshed-out she’s got really great
human-like gestures simple and stylized
but we can tell more complex stories
with Snow White then we can get even
more complex going further to the right
school was great
all right Riley is everything okay so
yeah that’s the Pixar film Inside Out and
yeah what you see there is a
realistic shader or texture on
her skin Riley’s skin you see
really shiny hair it looks realistic and
you see an expression she does that eye
roll and that’s what we humans do and
Pixar has made an incredible reputation
of being not just realistic
but believable so let’s skip ahead
all the way over to the
right and we see absolute realism like
this dude I’m totally real and totally
human you can trust me
yeah totally trustworthy right because
he’s got real skin and he’s totally real
right but between the Pixar character
and the dude on the right there’s this
weird phenomenon that happens in which
characters are so realistic that they
mimic reality but they’re not quite real
like this thing I do rock eternity
you like zombies zombies eat brains mom
no one likes zombies they’re an
abomination I am a social companion I
can speak of emotions that I can
recognize people she’s singing I feel
fantastic hey hey hey that was
a video that went viral a few years ago
and you know it’s kind of terrifying right
something that kind of fits into what
Steve was talking about earlier these
characters are human-like but you don’t
really get the impression that they’ve
got an imagination that they’ve got
creativity in any real sense that Steve
was trying to sort of expose us to here
and that’s the problem that there’s
realism without reality and it takes
away trust and believability
and where that really happens where that
trust gets compromised is with
speech because speech carries a lot
of the emotion and a lot of the
intelligence and the imagination that we
as humans show when we’re expressing
ourselves through words now what will
sometimes happen what researchers have
been working on has been a realistic
kind of speech that can be generated and
put onto CGI characters whether through
artificially intelligent derived speech
generation or through an actor actually
voicing these words and we’ve come a
fair way on this but we have
not achieved the kind of believability
that I would like to see and that I think
would be shared by other people
who are dealing with these characters
this one I’ll show you here they are son
una fecha parlante molto Giovanni Corina
Poe so a serie de barro molto triste
monkey dombarris Pavan Tata with
voice-o-matic great quality lipsync
Ennis okay so there’s certainly the
semblance of speech and those
characters are properly articulating but
one can see that it’s not really quite
there and the reason it’s not going
there is because we as human beings have
a great deal of complexity in our faces
in the machinery of our faces that allows
us to do some pretty nuanced and very
precise stuff with our lips and our jaws
that gives the human-like quality to
speech so one of the things that I do is
to teach about the
machinery of the face and how that
ties in to the psychology behind the
face so I do a master class called
Making Faces I was just in Sweden
teaching this last week I’m going to
bring you through like two seconds of
what is otherwise a one-week course
we’re made up of muscles and our faces
in particular have dozens of these very
intricately interconnected muscles when
we smile as this character is doing here
we’re using two of those muscles on
either side of our cheeks called the
zygomatic major muscles but that as a
basic smile is kind of putting her into the
uncanny valley it’s a little bit creepy
in order for a smile to have the kind of
nuances that human beings have you’ve
got to add a bunch of other muscles for
example muscle controlling the pushing
on the lower eyelids of your eyes is
what gives a sense of that she’s
actually smiling through her whole head
not just her mouth you can add a bunch
of other muscles like going through
lower lip muscles now she’s grinning we
can add a little bit of upper lip muscle
now she’s kind of more genuinely
grinning we can add subtext to her face
through these other muscles like in her
neck and the sides of her jaw and then
you start to see that a simple smile can
become kind of a symphony of different
expressions depending on this nuanced
interaction of other muscles so with
speech you have about fifteen or sixteen
pairs of muscles that are interacting
with each other and in order to be able
to show that you have to have a
knowledge of anatomy in addition to a
knowledge of psychology what I call the
anatomy of emotion so here sorry I’m
going to go back here so here we
are doing research
here at the Dynamic Graphics
Project at the University of Toronto on
this
it’s called JALI Jaw and Lip
Integration and what we’re doing
here is simulating some pretty intricate
connection between the way that we use
our lips and our jaws
okay so lips meaning facial muscles jaw
meaning that bone called the mandible
that goes up and down so I’ll show you a
little bit of the difference between
those two there should be a video coming up
right two seconds here okay what you’re
looking at is a traditional way of
animating in which animators would take a
particular sound ah ah and so on those
little bits there are called
phonemes and what you do if you’re
animating is you map those phonemes to
visemes that can work in basic like that
Homer Simpson kind of animation but if
you apply it to a realistic animation
it’s not going to work five different
ways of saying a phoneme called ah
all right that is the way that we talk
depending on our mood or psychology or
speaking style we’re going to be using
different variations in the red Nick
this guy about fish hunt got want to do
is using his jaw and not his lips this
person Robert Wilson Peer Gynt Jerry
Zacks the Caine Mutiny court-martial
she’s using her lips without really
using her jaw this is a kind of
two-dimensional field which we call the
JALI field jaw and lip and in
that field you have different speaking
styles and if you’ve ever heard a
telemarketer on the phone you know that
that person sounds artificially
cheerful and he or she
sounds artificially cheerful because
he or she is smiling while talking and when
you hear the voice without seeing the
person you can tell that now what if we
can glean that emotion or that speaking
state automatically if we can glean that
from only the signal of a person talking
then we can give animation to a face to
a character that’s got some actual
humanity to it so that’s what we do here
we’re taking
a character a CG character and we’re
adding that two-dimensional field that
you were seeing before and we’re moving
that around she’s saying one phoneme ah
but she’s saying it in all these different
ways depending on where we put what
we call the JALI joystick okay
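
As a rough illustration of the jaw and lip field described above, the following Python sketch maps a phoneme to a viseme and then scales the jaw and lip contributions independently, the way moving the joystick would. The table entries, the pose values, and the names `PHONEME_TO_VISEME`, `VISEME_POSE`, `MouthPose`, and `pose_for` are all hypothetical. This is not the project's actual model, only a minimal picture of the idea.

```python
from dataclasses import dataclass

# Hypothetical phoneme-to-viseme table; entries are illustrative only.
PHONEME_TO_VISEME = {
    "AA": "open",    # the "ah" sound
    "M": "closed",   # lips pressed together
    "UW": "round",   # as in "boot"
}

# Hypothetical neutral pose per viseme: (jaw_open, lip_shape), each in [0, 1].
VISEME_POSE = {
    "open": (0.8, 0.6),
    "closed": (0.0, 0.1),
    "round": (0.3, 0.9),
}


@dataclass
class MouthPose:
    viseme: str
    jaw: float  # how far the mandible drops
    lip: float  # how strongly the lip muscles shape the mouth


def clamp01(x: float) -> float:
    """Keep a value inside the [0, 1] range."""
    return max(0.0, min(1.0, x))


def pose_for(phoneme: str, jaw_gain: float, lip_gain: float) -> MouthPose:
    """Look up the viseme, then scale jaw and lip separately.

    (jaw_gain, lip_gain) plays the role of the joystick position.
    """
    viseme = PHONEME_TO_VISEME[phoneme]
    jaw, lip = VISEME_POSE[viseme]
    return MouthPose(viseme, clamp01(jaw * jaw_gain), clamp01(lip * lip_gain))
```

For example, `pose_for("AA", 1.0, 0.0)` gives a jaw-only "ah", while `pose_for("AA", 0.0, 1.0)` gives a lips-only one; the viseme is the same in both cases and only the speaking style changes.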
so this is sort of the
cornerstone of how we’re simulating
speech so I’ll show you here it’s
kind of a real-time demonstration of
that in which a person will speak a line
from The Merchant of Venice Act 4 Scene
1 by William Shakespeare recorded for
librivox.org by Kirsten Ferreri their
lines turn into text there you see it
there the text file is saved out along
with that audio file it’s read into Maya
a computer animation program and then
there is machine learning for what’s
called forced alignment done quality of
mercy from The Merchant of Venice Act 4
Scene 1 by William Shakespeare recorded
for librivox.org by Kirsten Ferreri
that process called forced alignment
times the speech with the character and
with the actress and given the quality
of the emotional tone in that audio
signal that character you
saw there can talk with a fair degree of
believability we think
some humanity and emotion and expression
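
To make the timing step concrete: a forced aligner takes the audio and its text and returns a start and end time for each phoneme. The Python sketch below assumes that output already exists and converts it to frame numbers at an assumed 24 frames per second. The names `AlignedPhoneme` and `to_keyframes` are hypothetical, the timings are made up for illustration, and nothing here calls Maya or a real aligner.

```python
from typing import List, NamedTuple, Tuple


class AlignedPhoneme(NamedTuple):
    phoneme: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def to_keyframes(
    alignment: List[AlignedPhoneme], fps: int = 24
) -> List[Tuple[int, str]]:
    """Place one key at the frame where each phoneme starts."""
    return [(round(p.start * fps), p.phoneme) for p in alignment]


# Made-up alignment for the word "the"; not taken from any real recording.
example = [
    AlignedPhoneme("DH", 0.0, 0.125),
    AlignedPhoneme("AH", 0.125, 0.25),
]
keys = to_keyframes(example)
```

Each `(frame, phoneme)` pair could then drive a viseme on the character, with the jaw and lip amounts chosen separately from the emotional tone of the voice.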
and that is where we start to hopefully
bridge that uncanny valley and to give a
sense of real humanity even if the
character is not real so I’m going to
try a little test here we have a voice
actress named Patrice Goodman and she
has recorded 30 seconds of dialogue and
you’re going to see three variations of
that head okay that you just saw before
speaking one of them is motion capture
performance capture where we’re
literally taking Patrice’s facial
features and mapping them to a character
performance capture one is keyframe
animation the way that an animator from
Walt Disney Max Fleischer to today would
have done it and one of them is
automated JALI
speech lip-sync and I’d like you to put
up one finger if you think that the
automated one is the one on the left two
fingers if it’s the middle and three
fingers if it’s the one on the right
here goes my mom would never I even went
back east a couple of weeks before she
died just to get her to hug me and I sat
there for like three days and it was
very frustrating watching her second
guess the answers on Jeopardy while she
dunked her tea biscuits into her tea and
after about three days I couldn’t take
it I got up I walked over and I hugged
her okay so yeah one finger if you think
the automated one’s on the left two if
you think it’s the middle and three if
it’s on the right I see one two three
two one three there you guys are like
statistically all over the place and
when I see that statistically all over
the place it means that
by and large you guys don’t have
any idea which is which so yeah it turns
out it is the one on the right that is
completely automated speech so
I mean it’s the first time I’ve
actually tried doing this
with an audience so that’s kind of good
and telling it’s a bit of a Turing test
a very small-scale version of a Turing
test here but yeah through machine
learned automated lip-sync we’re able to
get some degree of humanity in our
computer graphics characters and that is
a big endeavor of ours here at the
University of Toronto to get past that
gap in the uncanny valley to bring those
characters to a place where you can
have some degree of trust
it is about trust so yeah that’s what
I’ve got to say thank you very very much