
GOTO 2016 • Deep Learning – What is it and What It Can Do For You • Diogo Moitinho de Almeida


Cool, thank you, I'm Diogo. As a bit of background, I have a very math- and computer-heavy background, which is very good for deep learning. On the list of achievements for why you should listen to me: I currently work at a super cool company that uses deep learning to make faster and more accurate medical diagnoses, and in past lives I've won a whole lot of international math competitions and some programming competitions as well. Today I'm going to chat about what deep learning is and what it can do for you. Feel free to ask questions at any time, if I can ask that. I can ask that, right, Simon? Cool. So yeah, feel free to ask questions at any time, just call them out, especially if you think I'm lying to you.
So, what is deep learning? I'm going to start with a disclaimer: deep learning is actually pretty complicated, and it's hard to be very general about everything and still be correct, so when in doubt I'm going to favor generality. If you're familiar with deep learning already, it's going to sound like I'm lying a lot, but in reality this is just meant to give the really high-level view of it. Whenever possible I'm also going to favor shortcuts that might not be a hundred percent correct but should give you the right mental model of how these things work. If you have any questions, feel free to ask.
From a super high level, there are a lot of different levels of hierarchy in this ecosystem. There's artificial intelligence, which is a superset of everything; an example would be IBM Watson, where lots of hand-coded rules and extremely large amounts of expert manpower are used to build something for a specific task. There's machine learning, which is a subset of that; an example would be Google ad click prediction, where rather than using tons and tons of hard-coded rules, you use examples to figure out how to combine some hand-coded statistics to predict the probability of, say, an ad click. At a slightly deeper level you have representation learning, which is sometimes seen as one-layer deep learning (these levels are sometimes called shallow learning, if you're trying to start a fight). An example would be Netflix movie recommendation, where the statistics of what you know about each movie are learned from data, but you're still learning a simple combination of how those features go together. And after a few more levels you get into deep learning. An example would be figuring out diseases from images, where instead of having one layer of manual statistics that are learned and then combined together, you might learn all of these statistics at the same time, in tens, hundreds, or even thousands of steps, which is what some people use nowadays.
This is probably not a common view of what deep learning is, but I think the easiest way to see it is this: deep learning is an interface, and this interface has roughly two methods. The first method is the forward pass, and this is definitely the easy part: given arbitrary input, produce arbitrary output. Anyone can do this part. The trick that makes it work is the backwards pass: given a desired change in the output, you want to be able to transform it into a desired change in the input. Once you have these two, you can make arbitrarily complex things by chaining modules up into a directed acyclic graph. If this sounds too good to be true, it's because of how we describe the forward pass: if you just say arbitrary input and arbitrary output, of course you can do anything you want. The hard part is how you define the backwards pass, because as you make the forward pass more and more complicated, say some really crazy function, it becomes hard to define how to map changes in the output back to changes in the input. By keeping each piece simple and combining them together, we get an almost composable language of modules that lets us do the things we want to do.
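To make that interface concrete, here is a minimal sketch in Python. The class and method names are my own illustration, not from the talk or any particular framework: each module knows how to map inputs to outputs and how to map a gradient on its output back to a gradient on its input.

```python
import numpy as np

class Module:
    """Hypothetical interface: the two methods the talk describes."""
    def forward(self, x):
        raise NotImplementedError   # arbitrary input -> arbitrary output
    def backward(self, grad_output):
        raise NotImplementedError   # desired change in output -> desired change in input

class Scale(Module):
    """A trivially simple module: multiply the input by a constant."""
    def __init__(self, k):
        self.k = k
    def forward(self, x):
        return self.k * x
    def backward(self, grad_output):
        return self.k * grad_output  # d(kx)/dx = k, so the gradient flows back scaled by k

# Chaining modules: forward left-to-right, backward right-to-left.
chain = [Scale(2.0), Scale(3.0)]
x = np.array([1.0, -1.0])
for m in chain:
    x = m.forward(x)              # forward pass through the graph
grad = np.ones_like(x)            # pretend the loss wants each output to grow
for m in reversed(chain):
    grad = m.backward(grad)       # backward pass, module by module
print(x, grad)                    # [ 6. -6.] [6. 6.]
```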
Once you have this interface, you can build up from it. You have a bunch of modules that satisfy the interface, and as a side note, a bunch of these modules will be parametric, which means they have parameters, which roughly means they're stateful. Being stateful means that as you train, the state changes, and it's this change in state that lets the function go from something you just cobbled together to something that gets closer and closer to what you want it to do.
Once you have a framework, a language for what you want to do, you can start on the tasks you care about. In deep learning you always define a loss, or a cost, depending on how you want to name it, and this is something you want to minimize, for reasons I'd happily explain. It always has to be a scalar, so you can't have several costs at the same time; you have to squash everything you care about down into a single thing. And once you've squished everything you care about in the world into a single number, you can start using deep learning to optimize it. You create an architecture, which is the function you want to learn; this is how you compose together the modules I talked about, and the way you connect them changes the function you get and the kind of representational power it has, and that becomes the hard part. After that, you initialize the parameters, and you train the architecture by repeatedly updating the parameters to minimize the cost: you go forward through the network to get the things you care about, you go backwards through the network to change the parameters to be slightly better for your cost, and you repeat this many times until you get a function that you're really, really happy with and that solves whatever problem you want. At the end you just use that function, just the forward pass.
How do you implement the backwards pass? In general we almost always use the chain rule, which is really nice because it makes implementing the backwards pass easy. It works like this: if your output is some function f of some x, and you have the partial derivative of the loss with respect to that output, dL/df, you can get dL/dx by simply multiplying by the partial derivative df/dx. The nice part is that dL/df comes from the rest of your network and df/dx comes just from your module, so you can chain these things together in a way that only requires local information to get the backward pass. It's very nice.
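As a minimal sketch of that bookkeeping (pure NumPy, my own toy example): a squaring module only needs its local derivative df/dx = 2x; the rest of the network hands it dL/df, and it returns dL/dx.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])

# Forward: f(x) = x**2, then suppose the loss is L = sum(f).
f = x ** 2
L = f.sum()

# Backward: dL/df comes from the rest of the network (here, sum -> all ones),
# while df/dx = 2x is purely local to the squaring module.
dL_df = np.ones_like(f)
df_dx = 2 * x
dL_dx = dL_df * df_dx   # chain rule: dL/dx = dL/df * df/dx
print(dL_dx)            # [2. 4. 6.]
```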
There are theoretical reasons why this is a good way to do it, and perhaps the best part is that some frameworks make it completely automatic: by defining the forward pass, the framework uses automatic differentiation to figure out the backward pass for you. So it really does become basically as easy as defining arbitrary functions: as long as all the operations you use are differentiable, you can just make it work like magic and optimize it, and this is literally how people do it in practice.
Updating the parameters: these are just minor details for understanding how this works. Once you have your existing parameters, you compute the gradient and take a step in the opposite direction of the gradient; the partial derivatives tell you how changing the parameters would increase or decrease the cost you care about.
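In code, that update is one line of gradient descent; `learning_rate` here is a hypothetical knob controlling the step size, not something from the talk.

```python
import numpy as np

learning_rate = 0.1
params = np.array([0.5, -1.2])       # existing parameters
grad = np.array([0.2, -0.4])         # dCost/dParams from the backward pass

# Step in the opposite direction of the gradient to decrease the cost.
params = params - learning_rate * grad
print(params)                        # [ 0.48 -1.16]
```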
An important word to know, though, one that people always use and that I think makes this sound more complicated than it is, is backpropagation, or backprop for short. It has a longer name, reverse-mode automatic differentiation, which sounds pretty complicated, but it's just the chain rule plus dynamic programming. I've just talked about the chain rule, and some of you are familiar with dynamic programming, but here it's just caching. The idea is that when you have a computation graph (here is a very simple one: y = c * d, where c = a + b and d = b + 1), you traverse the graph from the top to the bottom, and by doing it from the top to the bottom instead of the bottom to the top, you can cache the intermediates that are used many times in the graph. By caching those intermediates you get something much more efficient than the naive solution, and it lets you compute gradients in time linear in the size of your graph: you basically evaluate each node once, which is a really nice property and makes the whole thing really efficient.
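Here is that toy graph worked out by hand (a sketch, not a framework): one forward sweep computes and caches c and d, and one top-down sweep reuses them, so every node is visited once.

```python
# Graph: y = c * d, with c = a + b and d = b + 1.
a, b = 2.0, 1.0

# Forward pass: compute and cache the intermediates.
c = a + b          # 3.0
d = b + 1.0        # 2.0
y = c * d          # 6.0

# Backward pass, top to bottom, reusing the cached c and d.
dy_dc = d                            # 2.0
dy_dd = c                            # 3.0
dy_da = dy_dc * 1.0                  # dc/da = 1, so dy/da = 2.0
dy_db = dy_dc * 1.0 + dy_dd * 1.0    # b feeds both c and d, so dy/db = 5.0
print(y, dy_da, dy_db)               # 6.0 2.0 5.0
```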
And that's basically it for the basics. From a high level, deep learning is just composing optimizable subcomponents; optimizable almost always means differentiable; differentiable means you can do backprop; and backprop is just the chain rule plus dynamic programming. Once you get to practical deep learning, you normally combine this with gradient descent, software, and a dataset you care about. There's a very rich space of software, which I'll talk a little about later, but these things are solved for you, so you can do deep learning without even knowing how to calculate the gradients yourself.
While we can do arbitrarily complicated things, there are a few standard modules that are the main workhorses of deep learning today, and the goal of this section is to get a high-level understanding of each. All of them can be incredibly nuanced, but these standard modules cover almost all of what's happening in papers. Perhaps the simplest is just matrix multiplication. It has many names: the fully connected layer, sometimes shortened to FC; sometimes called dense, because you have lots of connections; the linear layer, because it's a linear transformation; or affine, because sometimes there's a bias. Basically, every time you see a neural network diagram, all of those arrows correspond to a matrix multiplication, so when a diagram looks complicated, that's where it comes from. You can interpret it as a weight from every input to every output: if you have m inputs and n outputs, you have m by n weights to transform your inputs into outputs, and its implementation is literally a matrix multiplication. W in this case is generally a parameter, which means you learn the connections from inputs to outputs.
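As a sketch (my own minimal version, not any library's layer), a fully connected layer with m inputs and n outputs really is just one matrix multiply plus an optional bias:

```python
import numpy as np

m, n = 4, 3                      # m inputs, n outputs
W = np.random.randn(m, n) * 0.1  # learned parameters: one weight per input/output pair
b = np.zeros(n)                  # optional bias (the "affine" part)

x = np.random.randn(m)
y = x @ W + b                    # the whole layer is a matrix multiplication
print(y.shape)                   # (3,)
```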
On its own, this is not powerful enough, so you need at least one more thing: a nonlinearity. The original nonlinearity is called the sigmoid. It's just a function with the nice property that it maps the reals into the interval (0, 1), so it can be interpreted as a probability, but that's not the important part; the important part is just that it's nonlinear. The reason the nonlinearity matters is that when you stack layers back to back in a neural network, if you had no nonlinearity in the middle, it would just be two matrix multiplies back to back, and you could combine them into a single matrix multiply. So if you had a 100-layer purely linear network of just matrix multiplications, while it looks pretty complicated and you do all the work of a real neural network, you could actually flatten it into a single weight matrix, because a composition of linear functions is linear. The sigmoid was the original nonlinearity; people liked it because it's very similar to what people used before they really understood machine learning, which was just a binary threshold, a unit that either fired or didn't, back in the day. The cool part is that with just those two units you know how to make a neural network: you take your input, you apply a matrix multiply, you apply a sigmoid, you apply another matrix multiply, and you have one. These are called multi-layer perceptrons, when you only have matrix multiplies and nonlinearities.
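Putting the two pieces together, a sketch of a one-hidden-layer multi-layer perceptron is just matrix multiply, sigmoid, matrix multiply (the sizes here are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes the reals into (0, 1)

def mlp(x, W1, W2):
    hidden = sigmoid(x @ W1)          # matrix multiply, then the nonlinearity
    return hidden @ W2                # another matrix multiply

x = np.random.randn(8)                # 8 inputs
W1 = np.random.randn(8, 32) * 0.1     # hidden layer: the "middle" you can make bigger
W2 = np.random.randn(32, 2) * 0.1     # 2 outputs
print(mlp(x, W1, W2).shape)           # (2,)
```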
The cool part is that there's a theorem that this simple architecture, literally three functions, can approximate arbitrary functions, which means in principle it can solve any problem you care about; the idea is that if you make the middle layer big enough, you can compute basically any function. The downside is that just because it can doesn't mean it will, and a single-hidden-layer multi-layer perceptron often causes more problems than it solves. This is partly why there was an AI winter in the 90s: these things were kind of terrible. But people have gotten a lot better at it, and now neural networks are cool. As a disclaimer on terminology: this is a multi-layer perceptron, and these are all neural networks; everything I'm talking about today is still a neural network, but when people say multi-layer perceptron specifically, this is what they mean.
Since then, people have made better nonlinearities; this is probably the majority of the improvement between 1990 and 2012, unfortunately. The idea is a kind of smarter nonlinearity: instead of that weird squiggly function, you just threshold, so anything negative gets turned into 0. This is the rectified linear unit, and it's actually the most popular nonlinearity nowadays. It does incredibly well and has some really nice optimization properties; in particular, anywhere it isn't zero it's linear, so it works very well with the chain rule, and it's used almost everywhere nowadays, especially in the middle of a neural network.
Then there is the softmax, which you can think of as converting a bunch of numbers into a discrete probability distribution. The math is: you exponentiate each input and then divide by the sum over all of them. You can think of the exponentiation as turning all the numbers positive, and the division by the sum as a normalization term. It has some very nice properties; it's used as the final layer for classification problems, and it shows up in almost every neural network.
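Both of these nonlinearities are a couple of lines of NumPy. This is a sketch; the max-subtraction in softmax is the usual numerical-stability trick, not something mentioned in the talk.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)          # threshold: negative values become 0

def softmax(z):
    e = np.exp(z - z.max())            # exponentiate (shifted for numerical stability)
    return e / e.sum()                 # normalize into a probability distribution

scores = np.array([2.0, -1.0, 0.5])
print(relu(scores))                            # [2.  0.  0.5]
print(softmax(scores), softmax(scores).sum())  # probabilities summing to 1.0
```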
Cool, that was the easy part. This next one gets complicated, so feel free to ask questions during it; I normally explain this with a whiteboard, and it's normally complicated even with a whiteboard, but I'll try to go through it. The convolution is the main workhorse for deep learning on images, and deep learning on images is basically where this revolution started, so it's very, very important; it's probably the place where deep learning is the most advanced. It's also a very cool primitive to understand, because you really appreciate how beautiful the framework is when you see that this thing sounds pretty complicated, yet you can just plug it in, and you don't need to know how it works once someone has coded it up for you, which is what I do. It's a linear operation for 2D images. With a multi-layer perceptron you have a mapping from every input to every output, but in the case of images your inputs are structured: there is a spatial relationship between them, and if you connect every input to every output you kind of throw that spatial relationship away. So insight number one is: what if, rather than having a connection from every input to every output, the output looked like an image as well, and every output was only locally connected to the inputs it corresponds to? Those are the local connections. Insight number two is: what if, instead of every output being its own function of its local input, which would be the general case, every output was the same function of its input? What this becomes is equivalent to a well-known operation in computer vision, the convolution. You have a kernel, which you can think of as a local weight matrix; it's often drawn as a small square over the image, capturing that local input. You multiply each weight of the kernel with the corresponding value in the local region and sum up the results, so it's just a dot product, and then you do that at every single location of the input. So it's kind of like tiling your input with the same function, or it can be interpreted as extracting the same features at every location, which is the more common way to interpret it.
It's very powerful and very parameter-efficient, because there's a lot of weight sharing among the parameters, and you can end up with much larger outputs than you could afford with a normal matrix multiplication, and you also don't lose spatial information, which is a very important part of the structure of images. Those are some really nice properties. And as a side note, you might think this thing is really complicated, how do I take a gradient of it, because the whole operation seems involved, but it's actually equivalent to a very constrained matrix multiplication. If you take your input image and unroll it (with a matrix multiply you lose the spatial structure anyway), every output is connected to maybe nine of your inputs, and it becomes equivalent to that diagram with lots of arrows, just with most of the arrows being zero or missing. So it's still completely differentiable and still fits very nicely into this framework, and you can plug it in alongside all the other nonlinearities.
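Here is a naive sketch of that tiling-with-a-dot-product idea for a single-channel image; the loops are written for clarity, not speed, and real libraries compute the same thing much faster.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one kernel over the image; each output is a local dot product."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]      # local region of the input
            out[i, j] = np.sum(patch * kernel)     # same weights at every location
    return out

image = np.random.randn(8, 8)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)     # a simple vertical-edge detector
print(conv2d_valid(image, edge_kernel).shape)      # (6, 6): the output still looks like an image
```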
Cool, it's going to get a little bit harder. Another very fundamental building block is called a recurrent neural network. I don't know why this building block is called a network when everything else is called a layer, but that's just convention. It solves a problem that basically had not been solved in machine learning before, which is that we want functions that take in variable-sized input, but they can only take in fixed-sized input. This becomes a problem when your function is parametric, like a fully connected layer: if you want a connection from every input to every output but your input size changes, then the number of weights you have changes, and if you get a longer example at inference time you don't know what to do with it. It can also be inefficient, because you might have a really large number of inputs and not need the full power of having every connection there. A recurrent neural network is a way to solve this problem.
The solution is recursion. You have an initial state, let's just call it h in this example, and you have a bunch of inputs x, a variable number of them, so you don't really know what this capital T is. Because each x is fixed-sized, you can make a fixed-sized function that takes in both h and x, and now you can recurse through the list by saying that the new state h_t is a function of the previous state and the current input, and at the end you just return the final state. What this lets you do is take a fixed function with fixed-sized input and turn it into a function that takes variable-sized input, by applying that function a variable number of times. That on its own is a pretty obvious insight, and you could do it with any kind of machine learning algorithm; you could apply a random forest an arbitrary number of times. The cool part is that because this function is differentiable, the recursive function is also differentiable, so you can take derivatives with respect to each of the inputs, and you can even take derivatives with respect to the weight matrices, since you use the same f at each step. You get a diagram that looks kind of like this: you can think of it as applying an FC layer for each input, one that takes in the input and the state so far. This diagram might not be very clear, but there are many different diagrams for RNNs and they're all equally confusing if you're unfamiliar with them. The one on the left is my favorite, because you can think of it as a stateful function, except the state only lasts for the duration of your input; the unrolled version is the one you use when taking gradients, which is equivalent to just passing the gradients through this very long graph.
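A sketch of that recursion in NumPy: this is a vanilla RNN step, and the tanh and the particular weight names are standard choices of mine, not something specific from the talk.

```python
import numpy as np

def rnn(xs, h0, W_h, W_x, b):
    """Apply the same fixed-size function once per input, threading the state through."""
    h = h0
    for x in xs:                               # works for any number of inputs
        h = np.tanh(W_h @ h + W_x @ x + b)     # new state = f(previous state, current input)
    return h                                   # return the final state

state_size, input_size, T = 16, 4, 7           # T can vary from example to example
W_h = np.random.randn(state_size, state_size) * 0.1
W_x = np.random.randn(state_size, input_size) * 0.1
b = np.zeros(state_size)
xs = [np.random.randn(input_size) for _ in range(T)]
print(rnn(xs, np.zeros(state_size), W_h, W_x, b).shape)   # (16,) regardless of T
```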
One last complicated slide: long short-term memory units, or LSTMs. These put me in a really hard position, because I can't not talk about them, since they're such a big deal, but they're also extremely complicated and they use more building blocks than I've even explained. There is a great blog post that tries to explain them (the slides will be published, so you don't have to worry about the link), but I'm going to try to give a high-level intuition, even higher level than what I've said so far, just so you can understand where they're coming from when I talk about them being used. The idea is that it's kind of like an RNN; in practice no one uses the plain RNN I've just described, which is a very simple function, and there are much more complicated versions. An LSTM is an RNN where the function applied at each step is just really complicated; this entire diagram is a representation of that function. I'm not going to get into the details, but it involves a lot of different mechanisms designed to make optimization easier, and the idea is that if you design the function that's applied at each time step well, it can make the problem much, much easier to optimize and give you a much more powerful function. The key is that by having a path through the sequence that stays relatively simple (that's the top path in the diagram, with only minor operations applied to it), it becomes easier to stack these things back to back, and that makes it easier to learn long-term relationships.
Whew, okay, that was the complicated part. You now know ninety-five percent of the building blocks that everyone uses for state-of-the-art deep learning; with just these building blocks you could probably do new state-of-the-art things on new domains. So congratulations, you're ready for the next part. In this part I want to talk about what deep learning is really good at and what you should use it on, and the answer is a whole lot. I'm going to cover just the rough themes of where deep learning really shines, but there's much, much more to it, which I think is part of the awesomeness, because it all falls under this extremely simple framework I've just described. I don't think you could describe any other framework as simple as what I've just done and have it solve this many complicated tasks that were unsolved before basically 2012.
So, convolutional neural networks. This is a general architecture, commonly referred to as CNNs, and in this case it really does mean a whole network, not just a layer. The idea is that you take your image, you apply a convolution, you apply your ReLU, your rectified linear unit, you apply another convolution, you apply a ReLU, and you basically repeat this convolution-ReLU pattern until you solve all the problems in computer vision. That isn't quite true, since at the end you need to tack on some sort of output layer, and the output layer depends on what kind of problem you're trying to solve.
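A hypothetical skeleton of that recipe, written with the Keras library the speaker recommends later (the API shown is current Keras/TensorFlow, which postdates the talk, and the input size and class count are made up):

```python
from tensorflow.keras import layers, models

# Convolution -> ReLU, repeated, then an output layer that depends on the task
# (here a 10-way softmax classifier over hypothetical 64x64 RGB images).
model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.Conv2D(64, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),
])
model.summary()
```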
A really old-school task for ConvNets is face recognition, trying to determine whose face this is, and it's a really cool task because it makes the representations very visual and you can see how the network learns over time. You start with the pixels, and at the first layer your filters tend to just match edges and very simple things, so the first convolutions match edges and other very simple shapes. As you get deeper into the network you learn more complicated functions of the input: after that you can start combining edges into corners or blobs, which is still extremely simple, but after another layer, combining two corners the right way becomes something like an eye-like shape, or two corners plus a blob becomes even more eye-like. You build up from edges to corners to object parts and eventually to the objects you care about.
And as you get really, really deep networks, you actually have intermediates that are extremely semantic. For example, people have made a lot of tools for visualizing what neural networks learn, and if you have a network that never learns to classify books at all but learns to classify bookshelves, some of the intermediate features actually become book classifiers, which is really interesting: it can learn a hierarchical representation of your input space such that these are useful things to combine together into a robust classifier. Maybe if you combine three books together along with a square, that becomes a bookshelf. That's the kind of thing the local operations in each layer do, and the beauty of it is that it's all learned automatically; you don't need to program in "I have a bookshelf, bookshelves normally have books, they have square stuff, maybe they often sit beside flowers." It can all happen from a dataset, automatically, for you. These convolutional neural networks are absolutely amazing; I wasn't joking when I said they're basically all of computer vision right now.
It all started with ImageNet in 2012. This is when the entire deep learning hype train started: you had traditional machine learning tackling this very hard, very large computer vision dataset, and performance was kind of plateauing over the years, and all of a sudden deep learning comes in and just blows everything away. Ever since then, everything in computer vision has been deep learning; nothing can even compare, and recently we've even been able to get superhuman results, which is pretty impressive, because humans are pretty good at seeing things; it's kind of what we've evolved to do. The same architectures can do all sorts of really interesting structured tasks. Using almost the same architecture, you can break up your input space into what's called a semantic segmentation of all the relevant parts you have, and using basically the same architecture you can do crazy things like super-resolution, where you take a low-resolution image and fill in the details. Not only is that incredible on its own, even though it sounds easy, it's incredible that you can use the same architecture that takes an image and tells you whether or not there's a dog in it to take an image and return a new, higher-resolution image. It's basically the same library, the same components; it's just very, very composable, and that's really good.
Awesome. You can also use this to solve really hard medical tasks, tasks that people could not solve before; here we're detecting and classifying lung cancer in CT scans, and these are the kinds of things I like to work on. And it's not only limited to vision; there's been a lot of work in language understanding.
Something that deep learning is really good at is language modeling. Roughly, this means: how probable is a statement, how much sense does it make, in a certain language? It might have to do with question and response ("How are you?" "I'm fine."), or with what would be a weird thing to say: "my laptop is squishy" might be a very improbable sentence, and a neural network could probably determine that squishy is a very bad adjective for a laptop, so that's a very improbable sentence, whereas "my laptop is hot" would probably be much more likely. This already has some human-like feel to it, because language was designed for humans, and if you can do language understanding, in the sense of determining the probability of any sentence given a context, and you do it perfectly, you can solve basically any language task. It's a really interesting domain, because if you look at how language understanding was done before deep learning was around, it was just incredibly simplistic: tons and tons of rules, no robustness to new datasets, and you'd have to make custom rules for every language. Now you can use the same tricks for English as you can for Chinese characters as you can for byte code, and that is just pretty incredible.
There have obviously been much more complicated tasks. A pretty popular use of deep learning that people are putting a lot of effort into is end-to-end translation from scratch. The idea is that you use an RNN to compress a sentence in your source language into a vector, like I described in the RNN section, and then you use a different RNN to decode it into a target language. While it's not surprising that you can design a neural network that could plausibly output this, it is quite surprising that it works so well, and neural networks have, in a span of a few grad-student months, matched the performance of systems that people spent decades engineering. Nowadays I think pure deep learning systems are not what's used for this in production, but they're a very important component; people still use a bit of hard-coded stuff, but it's only a matter of time. The beauty is that if we have a new task or a new language, it can just automatically work. What if we find some lost language from a thousand years ago and we have a good amount of their texts; can we actually learn to translate it or understand it without any prior knowledge? It seems like, purely from data, we can, and that's really cool: we don't need an understanding of something prior to applying our machine learning models in order to have an understanding afterwards, and that is just really, really awesome. I've actually been chatting with people at SETI, the Search for Extraterrestrial Intelligence, and one of the tasks they're working on is trying to understand dolphins. The rationale is that dolphins have language, aliens might have language, and if we see alien communication we probably won't understand it, so perhaps we can use dolphins as a proxy for aliens to try to understand them. There are some really cool tasks happening there.
It's not limited to that; there are some really cool things being done with art and deep learning. I think companies have started up whose entire business model is creating awesome deep learning art, and they seem to be doing well from what I've heard. In this case, this image is a hallucination purely from a ConvNet trained to do image classification, something that takes an image and tells you what breed of dog is in it or what objects are in it; with a few tricks you can use it to create this kind of crazy art. This made a pretty big splash; it's very unintuitive that a neural network that wasn't even trained to make art can turn out making this kind of thing. There have been more popular use cases, such as style transfer. The idea is that you take a neural network, still trained for classification, on the premise that classification gives it some priors about images, about the natural world, and then you say: I want my image to kind of match the distribution of a different image. You get this kind of style transfer, where you can mix these components together, and while this is actually a pretty ugly example, there are some good ones, I promise.
There are some much more complicated things you can do; it's not just taking two images and merging them together. You can do things like transforming a perhaps-not-super-great drawing, something you could probably do in Paint fairly quickly, into something that looks like an artist did it, which is really awesome. The idea is that you can take these arbitrary doodles and convert them into things that look like paintings, and this kind of stuff is really awesome; I think it's just the beginning of what we can do with neural network art. After basically less than a year of work on this, people are making applications that are already very tangible and very awesome; this is already something that, if I had made it, I would probably hang up in my living room, and it's only been one year of work. Imagine what happens in 10 years. I saved the best for last in terms of art: we can combine our pictures with those of Pokémon, so clearly the future is here. This is one of my crowning achievements, I think, primarily because I've done this with dozens of people and only mine turned out well. But yeah, I think this is really awesome; there are just so many things to do here and so few people working on it, so the sky is really the limit, and it's just really exciting what kinds of stuff can be created here.
There have been other huge achievements. Game playing has been really big. If anyone saw DeepMind's roughly 500-million-dollar acquisition in 2014, about the only paper they had at the time was on learning to play Atari games from pixels, which might be harder than it sounds, because humans have priors about how to play a game: they have a prior that this is maybe a ball and that's a paddle and I want to destroy certain things, or a prior that a key opens doors, or that roads are something I want to stay on in a driving game. The neural network isn't given any of these priors; it's literally given only the pixels, and from those images it learns to play at what is, on median, a superhuman level, and the techniques have been continuing to get better. Very similar tricks were applied in the much more recent result of Google DeepMind's AlphaGo, which was not that huge of a deal in the West, but if you ever talk to people from the more Eastern part of the world, you can list the achievements of deep learning: you talk about smart inbox and they're like, oh, that's pretty okay; you talk about image search, yeah, that's pretty okay; and then you tell them it also beat the world champion at Go, and they're like, whoa, we beat humans at Go? That's amazing.
People had predicted that beating humans at Go would be, depending on the expert, 10 to 100 years off, and it just happened; it's already done; humans have already lost at Go. As a side effect, Go has also caused more fear over AI safety than any other neural network result, I believe. This is probably a good representation of that; I don't know how legible it is, but it's an xkcd chart of how hard people used to think these games were, and you can see Go sitting basically last, at the level of "computers still lose to top humans." Not all of the games above it are solved, but it's pretty incredible that this one now is, and people have been asking: if it can do this, what can't it do? Because Go is a task that requires a lot of reasoning. These kinds of achievements have been transferring into the physical world as well.
Google has a farm with a bunch of robots that have learned on their own to grasp objects. Robotics control is usually pretty hard, especially when you're trying to make it generalize, and they've been able to do it just by throwing the robots into a dark warehouse, letting them train for a while with a cute objective function, and the system just learned to grasp things better than their hand-designed controllers did, which was pretty awesome. More recently, I think there was a video that came out last week of NVIDIA using just deep learning for self-driving cars. The idea was that with just a single camera in front of your car, the car can learn to drive itself by learning from how other people drove. This is a very interesting result, because Google has been working on self-driving cars for, I don't know, maybe a decade already, using lidar and SLAM and all of that stuff, and NVIDIA, by some measures, caught up to them within, I think, less than a year of investing in this. So deep learning seems to be changing a lot of things, especially these kinds of perception tasks, because research is moving so fast.
I also have to spend some time on things that are not yet practical but may very well be soon. As a disclaimer, I've been traveling this weekend, so I'm not sure whether some of these already belong in the solved category. Generation is a big one: there's tons and tons of work happening on generation, so I definitely can't do it justice. There's really cool work on generating images from scratch, and on generating arbitrary other domains from scratch; images are just the most visual, so that's what I have here. Some of the coolest and perhaps most practical examples are conditional generation. Something I'm really excited about is image-to-text: the idea is that you take an input image and the output is not a yes or no about whether a dog is in it, but a description of the image. That's an extremely human task, and it would be extremely useful if you could do it right; it seems like it opens up a whole ton of possibilities. I'm very excited about things like taking in a medical image and outputting a complete report of it, which would be really awesome. Something people are really excited about, with applications in the very short term, is, I don't know the right way to say it, the poor-eyesight community: web pages nowadays have been pretty bad about accommodating people with disabilities, and imagine if you had a neural network that could describe an image for you, describe a page for you, tell you what's on the page in a very semantic, summarized way.
There's also a really cool opposite problem, which is that instead of taking an image and outputting a description, you take in a description and output an image. As a terrible artist, I'm probably a bit more excited about this one, because I can describe pictures but I can't really draw them, and these are already much better than I could draw, though that's probably a low bar. In this kind of network you take in a sentence of text, and all of these images are generated from it, which is pretty incredible. Some of them are not super great, but these birds, I believe, actually look real; the flowers, the purple ones, are not quite there, but they come close, and if it was zoomed out enough I could see them as being pretty real. Can you imagine a future where, instead of having to spend millions of dollars on a movie, you just type it up and a neural network generates the movie for you? We're quite a way from that, but perhaps not that far away, especially with some focused work, and this could enable all sorts of new forms of creativity that people don't even know about.
While language understanding does quite well, there is a deeper level of language understanding that we can kind of solve on toy tasks but that is much harder on real tasks. Question answering that requires more complicated reasoning, where you have a little story and you ask something complicated like "where is the football?", and the model has to go back through the story and figure out where that kind of thing happened, is complicated. People are very good at this task; models can solve the simple versions quite well, but they can't do real question answering yet, which is unfortunate, because it's something people really care about, and we're not quite there yet. I also love how awesome this problem statement sounds: our machines, which we've spent basically no work on for this, only automatically learn a shallow level of reasoning; that's such a first-world problem. Alongside language understanding there's also visual understanding, which is also kind of unsolved. There's an awesome dataset of images and questions, where the goal is to find the answer, and there are models that do pretty okay at this task, but they're still not good, still significantly worse than people, so this kind of thing is something we can't do just yet.
While game playing is solved, harder game playing is still an open problem. You might think, harder game playing? My five-year-old brother can play Minecraft and he almost certainly can't beat the world champion at Go. But harder in this case means stateful. It turns out humans are really good at remembering things, while neural networks have some difficulty with it, and the neural networks people have been using to play games have been completely stateless. So in a partially observed world like Minecraft, where you only see the direction you're looking in, if the agent looks to the left it forgets what was on the right. This is something people are still working to solve; it's the same story with Doom, and a bit of work has been done, but it's far from a solved problem, and I believe these agents are still subhuman at the task.
There's some really cool work on automatically discovering hierarchical structure. In language, the hierarchical structure may be clear to us because we use language: words are made of characters, sentences are made of words, paragraphs are made of sentences. That semantic hierarchy makes it easy to break a problem down into simpler problems, but this is not the case in many domains, and there are people who've designed neural networks that can actually discover this hierarchy automatically. This could be really useful for tasks where we don't know how to interpret the data. Something I've worked a bit on is genomics, and we really don't even know how to read a genome properly, but if a neural network can automatically break it up, saying this part goes together with that part, there are connections between here and here, that could help a whole lot with all sorts of different kinds of scientific tasks, purely from data.
This is where it gets a little bit computery, but these are things I'm excited about as a computer scientist. There's a model called the Neural Turing Machine, which learns to use a big memory buffer, which is very cool; you can actually see how the network reads and writes in order to copy an input. There are ways to implement differentiable data structures, so instead of having this black box of arbitrary activations with matrix multiplies, you can plug a data structure into a network, and now your network can learn to do things like pushing and popping on a stack, or taking from both ends of a queue, and that could potentially enable all sorts of very cool use cases.
One such case is learning to program. People have done work where you can create models that don't just have simple input-output mappings but, as an intermediate step in that mapping, learn subroutines and play with pointers, and that makes them very general computers; they could potentially handle all the problems we care about, because if you can learn subroutines and play with pointers, you can learn abstraction automatically. By putting these things together, people have been able to do things like learning to actually execute code: the idea is that, given code as a string and targets for what that code outputs, you can learn an interpreter for that language. This is really exciting to me as a programming languages guy: maybe I could design a programming language not by implementing it but by just showing a whole bunch of examples and having the implementation happen automatically for me, or perhaps I could just write the test cases for the language and a neural network could generate an efficient implementation for me.
Something else related to all of this is still really early, but I think a lot of people are really excited about it: neural module networks. Instead of having a single architecture that you play with, you have a library of components, and for every single example you assemble a custom architecture. For example, in a question answering task where you have an image and a question like "where is the dog?", instead of using one fixed network that takes in the question and the image, you convert the question into a custom neural network that combines a "dog" module with a "where" module and outputs the answer. This kind of thing is very early but really promising. So that's it for the futuristic part.
Hopefully you guys are pumped to deep-learn some problems. There's a lot of software out there to help you; I'm not going to talk about it in detail right now, because there are a lot of tutorials out there and I think the high-level understanding is much more important. My recommendation is that if you want to customize a lot of things, Theano and TensorFlow are the best, because they give you the automatic differentiation I was talking about, so you basically never have to worry about the backwards pass. And if you just want to use the modules I talked about, plus a few others, Keras can do that, and you can do a lot of these things with Keras.
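For a sense of what that looks like, here is a tiny end-to-end sketch with current Keras (the dataset is random placeholder data and the specific layers and hyperparameters are my own choices, not the speaker's): you define the forward pass out of stock modules, and the framework handles the gradients and updates.

```python
import numpy as np
from tensorflow.keras import layers, models

# A tiny model built only from stock modules; the framework handles the backward pass.
model = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Fake data just to show the training loop; substitute your own dataset.
x = np.random.rand(256, 28, 28, 1).astype("float32")
y = np.random.randint(0, 10, size=256)
model.fit(x, y, epochs=1, batch_size=32)
```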
If you want to do this, there's a lot more to learn, and the devil's really in the details. I was super high level with a lot of this stuff, but there are so many little things you need to know, such as how to perform the updates in a way that doesn't cause your parameters to grow too large, how to initialize the parameters so you don't start with a trivial function, and how to avoid overfitting your training set. There are a lot of resources out there; my favorite is the Stanford class by Andrej Karpathy, CS231n. It's specifically about ConvNets, but it's constantly updated with state-of-the-art material and it's generally very high quality, so I think it's approachable for anyone, from beginner to very advanced. And if you want to do this, you'll probably need a GPU, or fifty. I think that's it for time, so sorry I was rushing at the end.
Any questions? I also have these slides; should I leave this slide up? There are some questions here, let's go. One is: how can we avoid autonomous cars picking up humans' bad habits? That's a very interesting question, and it's very dependent on how the cars are trained. If you train a car to copy humans, which is by far the easiest thing to do, it's not the most correct thing to do; the most correct thing would be to learn to drive optimally from scratch, but that unfortunately involves trial and error, which you probably don't want in self-driving cars, so we can skip that, or hard-coded rules. What can happen is that if you train it to learn from humans, it will mimic those humans, but the idea is this: say humans make mistakes, but they don't make consistent mistakes; different humans make different kinds of mistakes, or the same human only makes a given mistake sometimes. A neural network trained on that data will predict the expectation of what a human would do rather than the worst-case scenario, so you can think of the humans as an ensemble in this case: if you're predicting what the average of a bunch of humans would do, you can drive better than an individual human can. But if humans consistently make the same mistakes, there's nothing you can do about that other than get more data. I think we have time for one more.
What do you think about chatbots; is it possible to build them with only deep learning? Yes, there are actually many startups doing this right now; it seems to be the next wave of startups, the hot thing right now, where people are trying to use chatbots to do all sorts of things for very specific domains. It has some really nice properties from a business point of view, because the goal is to replace humans who chat, and it's relatively easy to replace them with an algorithm, because if you have a bunch of those humans, they generate a bunch of training data. So it is very plausible, but it's still hard for chatbots; it's kind of like the game-playing problem, where it's hard for a chatbot to have a memory of what you said. If you say, try opening this menu, and go here and here and here, then five sentences later the chatbot might say the same thing again, because the neural networks still have memory issues. Cool, I think that's it, so please remember to vote, and let's give a big round of applause. Thank you.