
GOTO 2017 • Machine Learning with TensorFlow and Google Cloud • Vijay Reddy


So, my name is Vijay. I'm a machine learning specialist and I work for Google. I hope you're all having a good day so far. How many people saw the last talk in this room, the "meaning of life" talk? Okay, good. That's actually a natural segue into this talk, because that talk was about the grand vision of artificial intelligence and some of the big milestones, while this talk is going to be very much practical knowledge: if you, the audience, want to build a machine learning model, how do you do that? Credit to the conference organizers for planning it like that. How many people have built a machine learning model before? Okay. And how many people have specifically used, or at least played around with, TensorFlow before? Okay, good.

The flow of this talk: the first 15 minutes or so are going to be slides, and then we're going to spend the bulk of the time actually building a machine learning model end to end. I'll show you the code and execute it. This is the largest screen I've ever run live code on, so hopefully it doesn't go too badly.
When I say an end-to-end machine learning pipeline, this is what I mean: you start with a raw data set, you do some pre-processing on that data, then you train your model, you deploy that model somewhere (in this case we're going to deploy it to the cloud), and you run predictions against it. The specific problem we're going to work on is predicting housing prices, so we'll start with a raw CSV file and then go through that whole set of steps.
Before we get into that, just a couple of slides about machine learning within Google. Most of you probably know that machine learning is pretty central to what we do as a company; we are ultimately a data company. This chart shows the growth of machine learning models in production at Google over time. You can see that even at Google, back in 2012, this was very new for us, and fast forward to today we have over 4,000 machine learning models in production. It's gotten to the point where, within Google, machine learning has become an expected skill set for every developer. You don't have a special set of data scientists over here and your web developers over there; machine learning knowledge is becoming as common as knowing Java or Python within Google, and we offer training courses and programs for every software engineer to build competency in this field.

These are some of the products we use machine learning in; it's in almost every one of our consumer-facing products. It's a big part of the ranking in our search algorithm, via an algorithm called RankBrain. Google image search relies on machine learning, as does Gmail spam classification. There's a newer feature in Gmail that some of you may have seen, called Smart Reply, which tries to predict an appropriate response to your email so you don't have to type it manually, which is quite useful.
When we talk about Google Cloud Platform, that represents the externalization of all of this internal work we've done on our own products. We have very large research teams; in the last year alone we published over 300 research papers about machine learning in top-tier academic journals, and we've taken some of these models and made them available to the public. All of that falls under the Google Cloud Platform umbrella, and you can draw a division within it: on one side you have custom machine learning models, and on the other you have pre-trained machine learning models. Most of this talk is going to be about how you build a custom machine learning model, but I do want to touch briefly on the pre-trained models so you know what they are, how to use them, and when you would use them.
As a general principle, you always want to start with the simplest solution and add complexity as necessary. There are some problems you can solve without building a model at all, and if you can do that, that's what you should do, because it's much less engineering effort and you don't have to reinvent the wheel. These pre-trained APIs are all very low barrier to entry: you can try them without even having a Google Cloud account. Just go to your browser and visit the respective page. The example here is our Vision API. You can drag in an image and it will show you exactly what it would return; in this case it's giving me a bunch of labels for this image, with a confidence score for each one, for example 85 percent confidence that this is a suspension bridge. There are some other features too. There's a landmarks feature that recognizes certain landmarks, so it knows this is specifically the Golden Gate Bridge in San Francisco and gives me the latitude/longitude coordinates. When you call it over the REST API you get your results in this JSON format, and you can integrate it into your application as you see fit.
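To give a concrete sense of what calling one of these pre-trained APIs looks like, here is a rough sketch of a label-detection request against the Vision API's REST endpoint. It is my own illustration rather than code from the talk; the API key and image file name are placeholders.

    import base64
    import requests  # third-party HTTP library, assumed installed

    API_KEY = "YOUR_API_KEY"          # placeholder, not from the talk
    URL = "https://vision.googleapis.com/v1/images:annotate?key=" + API_KEY

    with open("bridge.jpg", "rb") as f:               # placeholder image file
        content = base64.b64encode(f.read()).decode("utf-8")

    body = {"requests": [{
        "image": {"content": content},
        "features": [{"type": "LABEL_DETECTION", "maxResults": 5}],
    }]}

    # The response is JSON; labelAnnotations carries a description and score per label.
    print(requests.post(URL, json=body).json())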
The best way to think about using these pre-trained APIs is as Lego blocks. It's very unlikely that a single pre-trained model is going to solve your specific problem entirely by itself, because by the nature of a pre-trained model it's trained on Google's data and it's trying to be a one-size-fits-all solution. We train on massive data sets, but they are very generalized, while in your applications you're working on very specific problems.
Here's an example: a customer out of Japan built a platform to auction off cars, and they wanted to add functionality where, if they uploaded an image of a car, an API would return certain information about that car. It would tell you the make of the car, the model of the car, and, if the image is of a specific part of the car, which part it is (the tires, the steering wheel), and also which side of the car you are looking at: the front, the left, or the right. The reason this was important is that that's how they categorized these images for their consumers to browse on the website. Before machine learning they had actual humans go in and tag each of these images ("this is the left side of a Ford Focus," and so on), which was very time-consuming, and they wanted to use machine learning to automate that.

What they did at first was a common mistake, which is to try to build a single model that solves the entire problem, that spits out all of the information they care about from one model. They invested significant engineering effort and eventually gave up, because the model just wasn't giving the results they needed. Then they took a step back and said: what if we break this into multiple pieces? First, let's use Google's pre-trained models and see how far that gets us. So they ran their images through our Vision API, which is zero engineering work on their part because we've already built the model and exposed it, and they noticed it solves part of the problem. It tells you the make of a car, but not the specific model, and it can identify which part of a car you're looking at, a tire or a steering wheel, but what it could not do is, for example, tell you whether you're looking at the left side or the right side of a car, because it just wasn't trained to do that. Now, when you narrow the scope of the problem to just that, it's actually quite easy to build a custom model that does it using a framework like TensorFlow: you get a thousand images of left sides of cars and a thousand images of right sides of cars, across different types of cars, you put them through a neural network, and you get a pretty robust model that can tell the difference between the left and right side of a car, and that solves part of the problem. So they did this in a piecemeal approach and then aggregated the information from each of these small individual models to solve the total business problem, and they ended up where they wanted to be, but in a much faster and cleaner way. That's really how you should think about these problems: a lot of the time you can make them modular, and a series of smaller, simpler models can aggregate to solve what you're trying to solve.
Okay, so that's it for pre-trained models. Now let's say you do need to build a custom machine learning model. You've narrowed its scope as much as possible by using things that are already built, so you're not reinventing the wheel, but usually there's at least some part you have to build custom, and by custom I mean you're training on your own data and writing your own code. The framework we use internally at Google for this kind of thing is called TensorFlow.

TensorFlow is something most of you have probably heard of; it seemed like about a quarter to a third of the room has actually used it. So first of all, what is TensorFlow? TensorFlow is a machine learning framework that Google open-sourced in November 2015, and since being open-sourced it has become the most popular machine learning framework. I'll talk about some of the reasons it may have become so popular.

One is that the popularity of TensorFlow is itself a big reason to use it, because with popularity comes a developer ecosystem. If you're working on a problem, chances are someone has already solved a similar problem using TensorFlow, and a quick search will turn up a blog post, if not an official TensorFlow tutorial, showing you a code template you can use to bootstrap your own project. That's very valuable. The second thing is that if you get stuck or run into a bug and you post it to something like Stack Overflow, because there's such a large community around it you're typically going to get a response faster than if you were using a less widely used framework. So the popularity itself is a big asset at this point.
Another reason TensorFlow is popular is the idea that you can program it at various levels of abstraction. If you don't need a lot of control over your algorithm, if you're not a researcher tweaking it at a very fine level of granularity, you can abstract yourself away from that by using a higher-level API like Keras, which some of you may have heard of, or the Estimator API, which is the code I'll be showing later in the talk. This lets you use a quote-unquote canned model that's pre-implemented for you: it already has the training loop and the evaluation loop built in, and with just a couple of lines of code you can call it, which saves you a lot of work. But if you do need additional flexibility, you can drop down a level in the hierarchy, ultimately all the way to the base, vanilla TensorFlow level, where you're explicitly defining your matrices and doing your matrix multiplications.
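To make that trade-off concrete, here is a small sketch of my own (not code from the talk, and using the tf.estimator and tf.feature_column names from slightly later 1.x releases; the 1.2-era demo uses the equivalent tf.contrib.learn APIs):

    import tensorflow as tf

    # High level: a canned estimator with the training and evaluation loops built in.
    feature_cols = [tf.feature_column.numeric_column("sq_ft")]
    model = tf.estimator.DNNRegressor(feature_columns=feature_cols, hidden_units=[10, 10])

    # Low level: define the tensors and the matrix multiplication yourself (TF 1.x style).
    x = tf.placeholder(tf.float32, shape=[None, 1])
    w = tf.Variable(tf.random_normal([1, 1]))
    b = tf.Variable(tf.zeros([1]))
    y = tf.matmul(x, w) + b   # a single linear layer written out by hand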
Another reason TensorFlow is popular is that it's production-ready. The reason Google created TensorFlow in the first place is that, historically, the deep learning frameworks that existed at the time all came out of academic institutions: MXNet came out of Carnegie Mellon, Theano came out of the University of Montreal, Caffe came out of UC Berkeley. The needs of an academic researcher are very different from the needs of someone in industry. A researcher's motivation is essentially to prototype something so they can prove out a theory, publish it in an academic journal, and then move on to something else. There's no real thought about how to scale the model, how many users are going to hit it, what the security considerations are, and so on. When Google got into this machine learning space, knowing that everything it built was going to be used by millions of users, it had to create a tool that didn't exist before. So things like serving and latency are first-class citizens in TensorFlow; they're built into the tool, and it's very performant. You can also deploy it across different types of hardware: you write your code once and you can deploy it on a CPU or a GPU, on a mobile phone (Android or iOS), or even on embedded devices like a Raspberry Pi.
of course algorithms so the main reason
that you use a machine learning
framework is so that you don’t have to
re-implement algorithms that are already
well established and well proven in
industry so this is an example of the
tensor flow object detection API
it is a algorithm that identifies labels
in an image and not only does that but
it draws bounding boxes around those
labels so this is a much harder problem
than the the original type of computer
vision problem which was image
classification that would tell me that
there’s a kite in the image somewhere
but it wouldn’t tell me where within the
image that kite was or it wouldn’t tell
me how many kites are in that image so
this is a algorithm that came out of
Google research and then we open sourced
the results of this via tensor flow so
now you can actually leverage this
algorithm with a few lines of code and
if you just do a Google search for ten
to flow object detection API you’ll see
a blog post that tells you how to do
that
Here is another example. This idea of a wide-and-deep model is no longer really considered state-of-the-art, but when it came out, TensorFlow was the first popular framework to distribute, or democratize, this type of modeling to the quote-unquote common man. The idea of wide-and-deep models is that you have deep learning, which works very well on dense data, and you have linear models, which work very well on sparse data, and the idea is quite simple: what if we combine the best of both worlds, have the deep model work on the dense features and the wide model work on the sparse features, and then combine the results at the end? That's called wide-and-deep learning, and it's another example of an algorithm that TensorFlow brought to the general public.
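TensorFlow ships this as a canned estimator. Here is a minimal sketch of my own, with made-up feature names: a sparse categorical feature feeds the wide, linear side, and dense numeric features feed the deep side.

    import tensorflow as tf

    # Sparse feature for the "wide" (linear) part of the model.
    city = tf.feature_column.categorical_column_with_hash_bucket("city", hash_bucket_size=1000)

    # Dense numeric features for the "deep" (neural network) part.
    sq_ft = tf.feature_column.numeric_column("sq_ft")
    age = tf.feature_column.numeric_column("age")

    model = tf.estimator.DNNLinearCombinedRegressor(
        linear_feature_columns=[city],
        dnn_feature_columns=[sq_ft, age],
        dnn_hidden_units=[64, 32])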
Speed. Of course, when you're thinking about deploying something to production, especially if it's going to power a website or you have some real-time need, performance becomes very important. TensorFlow is very transparent about its benchmarks on different problems and how it scales horizontally; you can go to tensorflow.org/performance/benchmarks and see the tests we've run, exactly what our performance is, and how it compares.
then the last thing I’ll say about
tension flow is the reason why it can
you have this trade-off of it both being
developer friendly written in a language
like Python which is considered
developer friendly but traditionally
considered slow but also very performant
is because the output of your tensorflow
Python code is a what we call a data
flow graph which is looks like this and
this is actually visualized using a tool
called tension board which I’ll also
demo and that graph is just
series of instructions that is then
interpreted by a C++ back end and the
C++ back and takes care of how do I
compile this code in a very fast way and
specific to the type of hardware that
I’m instructed to run on whether that’s
a CPU or GPU for example
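A tiny TensorFlow 1.x sketch of that separation (mine, not from the talk): the Python code only builds a graph of operations, and nothing is computed until the graph is handed to a session, which is where the C++ runtime takes over.

    import tensorflow as tf

    a = tf.constant(3.0)
    b = tf.constant(4.0)
    c = a * b   # adds a multiply node to the graph; no math has happened yet

    with tf.Session() as sess:
        print(sess.run(c))   # the C++ runtime executes the graph and returns 12.0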
So that's TensorFlow, the framework, and why you would want to use it for your custom machine learning modeling. Then why Google Cloud Machine Learning Engine? Basically, why would you do this in the cloud? The first thing to say is that a lot of the time you don't need to. As a rule of thumb, if you can train your model in under an hour on your local laptop, just stick with your local laptop. But at some point, if you're using large enough data sets or complex enough algorithms, training time on your computer essentially becomes your bottleneck to creativity and development. It will stop you from trying different ideas, or it will stop a problem from being tractable altogether, and that's when you want to make the jump to cloud computing, because then you essentially have access to a supercomputer over an internet connection.

The tool I'll be using today is called Google Cloud Machine Learning Engine, which is a managed TensorFlow service, and it gives you some nice things. It makes horizontal scaling very easy: you just specify that you want to run across, say, ten GPUs, and Google takes care of spinning up the infrastructure, provisioning the virtual machines, and spinning them down when your job is done. It has automatic hyperparameter tuning, automatic monitoring, logging and versioning built in, an auto-scaling prediction service, and, coming soon, access to Tensor Processing Units, which I'll talk about on the next slide. And there's no lock-in.
Because TensorFlow is an open-source framework, once you train your model, even if you used our product as your supercomputer to give you the horsepower to finish training in a reasonable amount of time, the result is an ordinary TensorFlow model. You can download it and deploy it on a phone, on premises, anywhere you want. The reverse is true as well: you can train your TensorFlow model on your local laptop and deploy it to our cloud just for the prediction service, because maybe you're expecting hundreds of thousands of predictions per second and you don't want to build your own infrastructure to handle that. So you can train on premises and deploy for prediction in the cloud as well.
Tensor Processing Units, for those of you who haven't heard of them: this is the hardware that powered the artificial intelligence that defeated the Go champion, if you heard about that. It also powers Google Translate and a bunch of other internal applications, and we've now made it available to the public, in alpha at this point. Just as a hardware comparison, to give you a sense of how powerful these things are: a Tensor Processing Unit can process 180 trillion floating-point operations per second, while an NVIDIA Tesla K80 GPU, which is kind of the de facto cloud GPU at this point, is nine teraflops. This actually matters more for inference than for training, because for training you can always just scale horizontally: you could use 20 NVIDIA K80s and get a similar speed to a single TPU, you just have to do the price/performance calculation. But when it comes to inference, also known as prediction time, and you need the result back within milliseconds, the overhead of horizontal computing will take you out of your real-time requirement. At that point, vertical scaling on a single chip becomes important, and that's where advanced chips like TPUs become more important.
Okay, so on to the actual problem we're going to build an end-to-end solution for. What we're going to do is predict housing sale prices, and I'm going to start by asking you, the audience, to make a prediction on a simple data set. My data set is: a 1,000 square foot house sells for $100,000, and a 3,000 square foot house sells for $300,000. My question for you, and you can just yell it out, is how much would a 2,000 square foot house sell for? Two hundred thousand. That's pretty easy, right? So let's make it a little more complicated and add two more features to the data set. In addition to the square footage of a house, I'm going to predict the price based on the crime rate in the area and also how good the schools in the area are. Now I'll just let you read this slide, and the question is: how much would a 3,000 square foot house with a crime rate of 30 and a school rating of 2 cost? Any brave souls want to answer? Yeah, it's not so easy now, and that's just with three features. Imagine if you had a hundred features, and imagine if you had millions of rows. This is where you need machine learning to make this type of prediction, and this is the problem we're going to work on.

By the way, if you want to recreate this lab after the talk, this is the short link to some documentation on how to do that, so you can snap a picture of this slide. You'll also have access to the slides through the conference website.
Okay, so to actually do our development we're going to use a tool called Datalab. Datalab is basically a white-labeled version of Jupyter, or IPython, notebooks. How many people have used Jupyter or IPython notebooks? Okay, so it's a good way to combine your code with documentation in a way that's digestible and easy to share and collaborate on. Within this notebook I'm starting by importing the frameworks I need. In this case I'm using pandas, which lets me easily load and parse the CSV data I'm going to use, and then of course TensorFlow. I'll just execute those two cells. It's always a good idea to know which version of TensorFlow you're dealing with, so you can debug any version-conflict issues; I'm using TensorFlow 1.2.
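Those first cells amount to something like this (paraphrased rather than copied verbatim from the notebook):

    import pandas as pd
    import tensorflow as tf

    print(tf.__version__)   # 1.2.x in this demo; handy for debugging version conflicts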
We talked a little bit about this idea of TensorFlow APIs and the hierarchy. In this case I'm going to be programming at the top two levels, the Experiment and Estimator APIs. The main reason is, one, it makes for fewer lines of code, but the other reason is that it gives me distributed computing for free: as long as I stick to these higher-level APIs, TensorFlow just understands how to run on ten GPUs versus one GPU and I don't have to make any code changes. If you've ever tried to do distributed computing across multiple CPUs or GPUs manually, you'll know this is a big time saver; it saves you a lot of headache.

The actual steps we're going to go through in this notebook are: one, load in our raw data; two, write our TensorFlow code, which breaks down into a few steps I'll explain; three, package that code up so we can train it in the cloud; four, inspect the results of that training using TensorBoard to make sure we're happy with our accuracy; and then, once we're happy with the accuracy, deploy the model to the cloud so that a third party can make predictions against it over the internet.
We'll start by loading our raw data. I'm using a publicly available data set about Boston-area suburbs, circa 1978, hosted in a Google Cloud Storage bucket. This code is just downloading that CSV file from Google Cloud Storage into a couple of Python variables in memory, and here I'm printing out the first few rows of the data so you can see what it actually looks like. And this may be the first live-coding stall here, so bear with me; I'm just going to refresh this. This is running off a virtual machine in the cloud, and that virtual machine lost its connection, probably because I was in the slides too long without refreshing, so I'm just reconnecting here.
While this is coming up, I can skip to the second step. Once you have your raw data loaded, in this case a CSV file, the next step as a data scientist is to explore that data: understand whether it's clean, whether there are outliers, do a sanity check that the data makes sense, and then do any transformations you need. This is a tool called Google Cloud Dataprep, and it's nice because, right off the bat, when you upload a CSV file it builds histograms of all your fields for you. For example, I can see that the age field ranges between 3 and 100 years, that's how old these houses are, and it's concentrated towards the upper end, so I have more houses that are old, closer to a hundred years, than brand new houses. If I click Edit and then Column Details, I get basic statistics about that particular column: mean, median, standard deviation, quartiles, and so on. And if I want to do some manipulation on this data, I can just choose a transformation here. There are different things I can do: aggregation, flattening, joins, pivots, windowing. A lot of your basic data manipulation can be done through this graphical tool, and the nice thing is that the output of this tool is a Dataflow pipeline. Dataflow is an implementation of Apache Beam, which is designed to work on large, out-of-memory data sets.
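To give a flavour of what such a pipeline looks like in code, here is a toy Beam example of my own, not what Dataprep would actually generate:

    import apache_beam as beam

    # Read the raw CSV, apply a simple transformation, write the result back out.
    # Bucket paths are placeholders.
    with beam.Pipeline() as p:
        (p
         | "Read" >> beam.io.ReadFromText("gs://my-bucket/housing.csv")
         | "Parse" >> beam.Map(lambda line: line.split(","))
         | "Write" >> beam.io.WriteToText("gs://my-bucket/housing_clean"))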
So let's say I've looked at my data and I'm happy with the transformations. At that point I would just export this to a Dataflow job that runs the transformations, and then I could pick that data back up in TensorFlow. And of course this had to happen during the live demo: it's provisioning the Google Cloud Shell machine, which is how I connect to the Datalab instance. Cloud Shell is actually a free service, so sometimes, if there's a lot of demand in a particular region, you have to wait a little to get a machine. So hopefully that will come up, or actually it may just be that my internet connection in general isn't working right now. Yeah, I think that's what it is. Okay, there we go; one of the things about working in the cloud is that you actually need an internet connection for it to work. So let me spin this back up. There we go, and I'm going to connect back into that instance. Okay, we'll give that a second.
While that's loading, I'll continue in the GitHub repository here. Now that I've loaded my raw data and explored it, and I'm happy with what it looks like, I can actually start writing my TensorFlow code. There are about four or five steps. The first thing I need to do is define my data interface: I need to tell TensorFlow how to interpret the different columns in my data set. There are different column types; you have continuous data and you have categorical data. Price, for example, is a continuous variable, it can take any real value, whereas a department in a university would be an example of a categorical value, one of a discrete set of things: math, English, psychology, and so on. Each of these feature types is optimized in a different way by machine learning algorithms, so you have to tell the algorithm what type of data it is. In my particular data set it's pretty easy, because all of my features are numerical, real-valued numbers, so they're all going to be continuous features. Here I basically have a for loop going through the features I'm going to use and telling TensorFlow to treat them all as real-valued columns.
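That cell looks roughly like this, a sketch using the 1.2-era contrib API; the column names below are the standard Boston housing ones and may not match the notebook exactly.

    import tensorflow as tf

    # Illustrative subset of the predictive columns; the notebook uses the full list.
    FEATURES = ["CRIM", "AGE", "PTRATIO", "RM"]
    LABEL = "MEDV"   # median home value, i.e. the sale price we want to predict

    # Every feature is numeric, so each one becomes a real-valued (continuous) column.
    feature_cols = [tf.contrib.layers.real_valued_column(k) for k in FEATURES]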
That's my data interface. The next thing I need to do is pick my actual estimator. The estimator is what you can think of as your canned model, and in this case I'm going to use a deep neural network. The reason is that the relationship between my input features and the actual selling price of a house is nonlinear, so I'm not going to have a lot of success with a simple linear model, whereas deep learning (the most common definition of deep learning is just a neural network with two or more hidden layers) is good at modeling that type of relationship. My code for defining the neural network is just this cell: I'm specifying that I want two hidden layers, with 10 neurons in the first layer and 10 neurons in the second, and I'm passing in my feature definitions from the previous cell so it knows how to interpret the data. Just take a second to appreciate that I can do that with a few lines of code; defining a neural network is typically something pretty involved. That's the advantage of working at this higher level of abstraction.
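Sketched with the same era's API, and continuing from the columns above, the estimator cell is essentially this (the output directory is a placeholder of mine):

    import tensorflow as tf

    estimator = tf.contrib.learn.DNNRegressor(
        feature_columns=feature_cols,   # the real-valued columns defined earlier
        hidden_units=[10, 10],          # two hidden layers, 10 neurons each
        model_dir="trained_model")      # placeholder path for checkpoints and exports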
Then I'm going to define my input function. At this point I've only specified a data interface, if you will; I haven't actually passed the data in. This input function reads from that pandas variable I created earlier, which, if you remember, pulled the CSV down from Google Cloud Storage and stored it in memory. I'm saying that's where my features live, and my label, the thing I'm trying to predict, is the column I identified as the label earlier on, which is median value, effectively the sale price. So I've got my features and my labels, and I'm passing those in as my input function. Then you have a separate input function for serving. Serving is just a synonym for prediction, or inference: when a third-party user is using my prediction service, they may pass in data in a slightly different way than how I trained, and that's why I'm allowed to define a separate serving input function, which I do here. In this case I'm actually not going to accept the data any differently, so the first part of the code is the same, followed by some boilerplate code that you don't modify.
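The training input function amounts to roughly this (a sketch; df_train stands in for the pandas DataFrame loaded from the CSV earlier). The serving input function follows the same pattern with a tf.placeholder per feature, plus the boilerplate wrapper mentioned above.

    import tensorflow as tf

    def train_input_fn():
        # Features: a dict of column name to tensor, built from the in-memory DataFrame.
        features = {k: tf.constant(df_train[k].values) for k in FEATURES}
        labels = tf.constant(df_train[LABEL].values)   # median value, the target
        return features, labels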
The last step is to package all of this into what's called an Experiment. An Experiment is what allows me to do distributed training without writing any additional code. It's just a wrapper, so I'm passing in pointers to all of the previous cells I've defined, my input function and my estimator, and then I'm specifying that I want to train for 3,000 steps, and at the end I want to do an evaluation step to see how well the model performs against my evaluation data. That's all the TensorFlow code.
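The Experiment wrapper is only a few lines; here is a sketch with the 1.2-era contrib API (in the real notebook the evaluation input function reads held-out data rather than reusing the training data):

    import tensorflow as tf

    def experiment_fn(output_dir):
        # Wires together the estimator and the input functions from the earlier cells.
        return tf.contrib.learn.Experiment(
            tf.contrib.learn.DNNRegressor(feature_columns=feature_cols,
                                          hidden_units=[10, 10],
                                          model_dir=output_dir),
            train_input_fn=train_input_fn,
            eval_input_fn=train_input_fn,   # placeholder; use a separate eval set in practice
            train_steps=3000)

Handing a function like this to the framework's runner utility is what makes the later switch from one machine to many a configuration change rather than a code change.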
So my code is now written, and in order to train it in the cloud I need to package it as a regular Python file. Here I'm just taking all those individual cells and pasting them into a single cell; the code hasn't changed. At the end there's some boilerplate you would just copy and paste, which tells the program how to parse certain command-line arguments that Cloud ML Engine expects. At this point I can actually train, and I'm going to train both locally and in the cloud so you can see how both of those work.
There are a couple of cloud variables I'm defining here: the address of the Google Cloud Storage bucket I want my model written to, the name of my Google Cloud project, and the geographical region I want my virtual machines spun up in. I'm setting those as environment variables so I can reference them in the remaining cells. First I'm going to run this locally, and in this case "locally" is actually still in the cloud, because Datalab is a cloud virtual machine. You'll see that it finished pretty quickly. I have some compilation warnings here, which you can ignore; that's just because I didn't compile TensorFlow from scratch on this machine, so it's letting me know there are certain optimizations I could take advantage of if I wanted to. If I want to run this in the cloud, I use a command-line tool called gcloud, and I specify the path of the Python file I just wrote to disk (that's all my code cells copied together and written to disk), so it knows where the code lives, and I reference my Google Cloud Storage bucket again, so it knows where to output the results. So I'm going to run that.
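The cloud submission is a single gcloud command. What I ran looks roughly like this; treat it as a sketch, since the package path, module name, and job name here are placeholders of mine rather than the exact values from the notebook:

    JOBNAME=housing_$(date +%s)
    gcloud ml-engine jobs submit training $JOBNAME \
        --package-path=trainer \
        --module-name=trainer.task \
        --region=$REGION \
        --job-dir=gs://$BUCKET/housing/output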
And this is queued successfully. This is just running on one cloud machine learning unit, which you can think of as the equivalent of a local laptop. Now here's the real power of having written this with the Estimator API: I can just add an argument that says --scale-tier=STANDARD_1, and instead of running on one CPU it's going to run across ten CPUs. Similarly, if I want to run on a GPU instead of a CPU, I change the scale tier to BASIC_GPU. In terms of pricing, the GPU costs three times as much as the CPU to run on, so as a user your price/performance question is just whether the GPU speeds up your training by at least 3x; that's your break-even point. There are certain algorithms where a GPU can speed up training 40-50x, and there are other types of applications where the GPU isn't that much faster than the CPU, so just be aware of that when you're deciding whether to train on a CPU or a GPU. Lastly, I'm going to run across multiple GPUs. The way you define this is a little different: I create a YAML file where I specify my machine type, and then I reference that YAML file as a command-line argument. It's still not that complicated, just a little different. So now I'm running across eight cloud GPUs, and in this case those are NVIDIA Tesla K80s, so I'm running across eight of those in parallel.
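The multi-GPU configuration file is just a trainingInput section naming a machine type, and it gets passed to the same command with --config=config.yaml. Roughly, and with the caveat that the machine-type name below is my recollection of the ML Engine tiers of that era rather than something shown on screen:

    # config.yaml (sketch)
    trainingInput:
      scaleTier: CUSTOM
      masterType: complex_model_l_gpu   # era-specific name for a machine with 8 K80 GPUs; check current docs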
If I go back to the cloud console and click on ML Engine jobs, you'll see all four jobs I just kicked off spinning up. You might be asking why these are still running when the one I ran locally finished pretty much instantly. The reason is that when you run in the cloud it starts the virtual machines from scratch, and the startup time for that is about five minutes before the job runs. You wouldn't actually want to run this particular job in the cloud, because it's such a small data set, but when you're thinking of larger data sets, where training is going to take hours or even days, that five minutes becomes pretty insignificant compared to the horizontal scaling you get. In terms of logging, for any job, as it's running, you can click "View logs" and it will give you a running output; here it says the job is waiting to be provisioned, so it's still spinning up the VM. Then, when the job is done, we can deploy it, or rather, first, inspect our results.
I'm going to go back into my Datalab notebook and inspect the results of the job I ran locally, since that one is already done, and we're going to inspect them using a tool called TensorBoard. Just to review where we are: we started with the raw data set, we did some exploration of that data, we then wrote our TensorFlow code to define how to interpret the specific columns and what type of algorithm we wanted to use for training, in this case a two-layer neural network, and now we want to investigate the results of that training. Did it actually learn anything? Is it doing anything useful? Here I'm going to click on my loss diagram and expand it. Basically what you're looking for is that this line goes down and to the right, which means it's learning something over time. You can see my training steps here, all 3,000 of them, and then I'm running one evaluation step at the end.
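If you are reproducing this outside Datalab, pointing TensorBoard at the training output directory is enough; for example (the directory name is a placeholder):

    tensorboard --logdir=trained_model --port=6006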
Just to do a sanity check: the evaluation step at the end reports a loss of about 6.4 times 10 to the 7, which might seem like a lot, but this is squared error; that's how this loss is measured. So if I do some quick math and take the square root of 64,000,000, that comes to about 8,000. Putting that back in terms of my original problem, it means the model is predicting the selling price of a house with an average error of about eight thousand dollars, which is pretty good. Maybe it could be better, but for any of you who have ever sold or bought a house, if you could know ahead of time, to within about eight thousand dollars, what it was going to sell for, you'd be pretty happy with that.
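Written out, that back-of-the-envelope conversion from squared loss to dollars is just:

    RMSE = sqrt(loss) = sqrt(6.4 x 10^7) = sqrt(64,000,000) ≈ 8,000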
So I've inspected my results and I'm happy with them, and I'm now going to deploy my model for prediction. To do that, there's another gcloud command, and again I'm just pointing to the model file that was created. What this does is push my trained model file to the cloud and spin up some infrastructure to back it, and this is all fully managed, auto-scaling infrastructure. Let's say I've just launched this API and I don't have any users yet, so I'm only getting maybe ten requests per day; then it will keep a minimal VM behind it. But if that API suddenly goes viral and I'm getting thousands of requests per second, the service will just spin up more VMs in the background to handle those requests. It's a cloud-native, fully auto-scaling service in that sense. I can see here, if I go to my Models section and click on housing prices, that it's creating that model, and this will take probably 15 more seconds, because it's just spinning up the VM to back it.
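Deployment boils down to two gcloud calls: create a model resource, then create a version that points at the exported SavedModel in the bucket. A sketch, with placeholder names and paths of mine:

    gcloud ml-engine models create housing_prices --regions=$REGION
    gcloud ml-engine versions create v1 \
        --model=housing_prices \
        --origin=gs://$BUCKET/housing/output/export/<EXPORT_SUBDIR>   # the timestamped SavedModel directory written during training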
Once that's done, I'm going to actually test it out. I'll write out a couple of new records for prediction, with all my predictive features: the crime rate, the age of the house, the student-to-teacher ratio in that area. I'll write those out to a file and then pass that file to the service, and here I get my actual predictions back: 24 thousand dollars and 70 thousand dollars respectively. Keep in mind this data set is from 1978, so those housing prices, in the US dollars of the time, actually pass the sanity check.
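The prediction call itself is one more gcloud command run against a file of newline-delimited JSON records. Roughly, with illustrative feature names and a placeholder file name:

    # records.json: one JSON object per line with the predictive features, e.g.
    # {"CRIM": 0.2, "AGE": 45.0, "PTRATIO": 15.3, ...}
    gcloud ml-engine predict --model=housing_prices --json-instances=records.json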
Just to review, what we covered here was how to use TensorFlow's high-level Estimator API, how to run distributed training in the cloud, how to evaluate the results using a tool called TensorBoard, and how to deploy the resulting model to the cloud for online prediction. I also want to point out some important things we didn't cover. We were using a data set that fits in memory; if you wanted to use a data set larger than memory, a quote-unquote big data set, there are other things you would have to take into account. We didn't do any true feature engineering; we passed our CSV features in as-is without transformations, because in this case we didn't need to, but in most cases you will. And lastly, we didn't do any automatic hyperparameter tuning, which in a real data science environment you would normally do. If you would like to learn how to do those things, the last thing I'll plug is a great course on Coursera that we released recently that covers all of them. It's a one-week course, it goes through a case study of predicting taxi fares in New York City, and it goes a level deeper than we did today.

So, yeah, apologies for losing the internet connection and not realizing it for a couple of minutes there, but thank you for bearing with me. I technically have 50 seconds for questions, but if we have any I'm happy to take them, and there's one here.
The question is: when you're collecting more data continuously, how do you use that data to automatically retrain your model? Yeah, that's a good question, and it's very relevant for certain types of problems, like recommendation engines, where you're getting new information about your environment and you want to retrain on a daily, maybe even hourly, basis. There are other types of problems where what you know about the world doesn't change for six months or years at a time, so you don't have to worry about it. The answer is that you have to have an orchestration plan in place. You need some kind of messaging system that alerts you when there's new data to train on, and that kicks off a new training job. When that new training job is kicked off, your second decision point is how to combine the results of the new training job with the old one. Maybe I just replace the old model completely with one trained on the new data; maybe I want a sliding window, where, taking the example of daily training, I combine today's data with the last seven days of data I trained on; or maybe I want an incumbent-versus-challenger scenario, where I train on today's data, evaluate the resulting model against my success metric, and if it does better than my incumbent model I promote it to production, and if it doesn't, I stay with the incumbent. So there are a few different ways you can do that.
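As a sketch of that incumbent-versus-challenger idea (entirely my own pseudocode; train_model, evaluate, and promote_to_production are hypothetical helpers, not anything from Cloud ML Engine):

    def maybe_promote(new_data, incumbent, train_model, evaluate, promote_to_production):
        """Train a challenger on the new data and promote it only if it beats the incumbent."""
        challenger = train_model(new_data)               # hypothetical: kicks off a training job
        if evaluate(challenger) > evaluate(incumbent):   # compare on the agreed success metric
            promote_to_production(challenger)            # hypothetical: swap the serving model
            return challenger
        return incumbent                                 # otherwise keep serving the incumbent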
We're working on some managed services coming out to make that orchestration piece even easier; right now you have to do it a bit more piecemeal, but it's basically a very similar problem to the continuous integration and continuous deployment tooling in the non-machine-learning world. Okay, we're out of time. Thank you very much.