
GOTO 2016 • Fixing the Image Problems of the Web using Machine Learning • Chris Heilmann


Hello. As I said: if you were looking for the other talk, bad luck, it's me. I'm also going to be here later on talking about Progressive Web Apps, but I filled in, and the organizers were nice enough to give me alcohol and kind words, and it was very effective, so I said okay, I can give another talk. I'm actually going to Pixel Camp in Portugal after this and giving something similar there, so it's a good thing to try it out on a real audience. I want to talk today about the image problem of the web: not the image of the web, but images on the web, and how we can solve these issues with machine learning and also with tooling. There are far too many things on the web that we're not optimizing right now, and the problem is that we're sitting on these wonderful machines with fat connections, we put things on the web, we think it's beautiful, and our end users basically stand there on a mobile phone with a spinner, watching their data traffic go up and their bank account go down in the background, wondering what's going on. So, I'm Chris Heilmann,
codepo8 on Twitter in case you need more pictures of kittens and hedgehogs, and some JavaScript stuff as well; I'm pretty active there. This is me looking at mobile phones and being annoyed about the state of the web on mobile phones a few years ago; luckily it got much better. I work for Microsoft on open source things like the JavaScript engine and Visual Studio Code, and I also dabble in machine learning, but I'm more of a fanboy of the machine learning team: I help them talk to humans, and humans talk to machine learning people. It's a bit like translating from programmers to other people, which is another skill that you should have. Let's go back in time a bit.
We all know this character, right? If you know him in green, you probably have an older sibling, because you were never allowed to play Mario; you always had to play Luigi. What is funny about this character (and I used to do games on the Commodore 64, the Game Boy Color and an Amiga later on) is that we always think these limitations are just how it is, but there is sense and meaning in everything about Mario. First of all, the red and blue offered the best contrast to the skin and to the game background, so they didn't pick those colors because they were the ones left over; they put them there so you could see the character all the time. Nowadays, when people go for pixel nostalgia, they don't understand that back then you never saw a pixel; we jumped through hoops backwards to make sure you didn't see pixels, because on bad TV sets everything got smooshed together, so you had to keep contrast between the different colors to keep your game character visible. The cap meant there was no need to worry about hair style, eyebrows or a forehead. Actually, originally Mario was not supposed to be a plumber; they made him a plumber because they didn't have enough pixels for the hair when he fell down a hole, so they gave him a cap and then said, okay, who wears caps? Plumbers. Cool, we've got these pipe sprites as well, we might as well use those. And the large nose and the mustache made it possible to avoid a mouth and facial expressions, because you just didn't have enough pixels for different facial expressions. This is what Mario was designed for, and it was great, because it was designed by limitations. I always loved being creative in limited environments. Back then we fought for every pixel, for every piece of information, for every byte, but that's over, because nowadays we've got these massive machines and fast connections, we've got quad-core computers in our pockets, and we just don't understand that people on the other side of the planet might not be able to see that.
Everything had reasons and meaning in the designs we did back then, whereas nowadays evolution is happening around us: we're moving away from desktop machines to laptops, and now to mobile phones. The next million users of the web will not be on any desktop or laptop; they will be on mobile devices, and the reason is infrastructure. In countries where the internet is growing, like Africa, Indonesia, Bangladesh and India, people don't have flats where a computer could be set up. People can't afford a MacBook Pro, but they can afford a mobile phone and a data connection. There aren't even cables in the ground for connectivity, but there are mobile masts everywhere, so everybody will be on mobile devices. That's what we have to think about for the near future, or actually right now: this is where the next growth will be. And the growth after that may not even have a UX anymore; it will just be chatbots and systems that people can talk to. So technology advanced, and pixels are a side product of our interactions with the web.
Most people don't draw things on the web or make graphics; they just take pictures and upload them, and they don't even caption them or explain what the picture is. They let the picture speak for itself, which of course is incredibly depressing if you're a blind person and you get these pictures without any alternative text: you don't know what's going on. Or you're an old person like me, trying to understand what Snapchat might be about, and you have no idea what's going on anymore: these are people sending selfies to each other for the last two hours, what's the meaning of this? But okay, I'm old.
The problem is that we take pictures and upload them unoptimized: the bigger, the better. The phone that I have has a 20 megapixel camera in it, so that's an 8 meg photo that I just upload in the background, because my data plan in England is good enough and I don't care. And if you look at the average website (that number is actually rather old; it's probably bigger right now, and we can take a look at it later if you want), it's 2.2 meg for a web page. That's not the whole site; it's the first thing that people see on the screen, and it's 2.2 megabytes just to say hello and welcome to our website.
This is the state that we're in right now, because we kept pushing things onto the web: you need these 12 libraries, you need these 15 JavaScript frameworks, and you upload images because they're pretty, and on a Retina iPad they need to look great, and everybody else gets the same picture. So it's 1.4 meg of images on the average web page out there. I call this aspirational obesity: we put things in because they look pretty on our high-end devices, but our end users don't necessarily get them; they're just standing there getting a loading spinner for five minutes, which is not a good experience. 1.4 megabytes of images, mostly because of the wrong file formats. People save images as PNGs with an alpha channel that don't need any alpha channel and would happily be a JPEG, or a WebP if the browser supports WebP; or you have text saved as a JPEG, and it's unreadable because of all the artifacts on it. It just drives me crazy that we don't understand which format to use for which image, but most of the time the people who upload images are maintenance coders; they're not developers, they're not designers, they're people who just use a content management system and drag and drop things in. You know, when you're a freelancer and you ask a client for a logo and you get a Word document with the logo embedded in it, and you just want to go back to goat farming and not live anymore.
We're delivering highly scaled, high-res images to everybody: we take a six thousand pixel image and display it a thousand pixels wide. I've seen that so many times, and then people wonder why browsers are slow. No automatic conversion or optimization steps: we have all that technology, we just don't use it. We just have an upload facility, and when even WordPress tells you that you can only upload two megabytes, what does the good WordPress admin do? Turn that off and say you can upload whatever you want, and then people have 20 megabyte images as their hero image instead of text content. That has become very important in web design lately, this massive hero thing, and I blame Medium: instead of reading a long article, you get this massive image before you can even scroll, and you think, what did you want to tell me? This image is not you, is it? We need to change that
to make the web fast again, because connectivity is our biggest new hurdle. For us here, the wireless is amazingly good for a conference, but most of the time you're out on the road somewhere, and your connectivity might be good one second and gone the next. It might be a Wi-Fi connection that announces itself as Wi-Fi and then won't let you connect unless you give it your credit card details, your firstborn, some blood and your home address or something like that; or sometimes you trust it and can't even connect although it shows you full bars. That's called life. The web is much bigger than our
little developer world, and growth happens outside of it. If you want to think about the next few years of the web and you want to keep your job, think about those markets that you don't think about right now, because this is where growth happens; everywhere else is on the decline. People don't actually download new apps; people don't use the web as much as they used to. The big winners are the people who stay inside Google services, inside Facebook services, and of course inside chat systems like WhatsApp.
One of the solutions that Google and Opera, for example, are really good at is cloud services and proxy browsers. Those are actually used a lot in Africa and India: they automatically strip down your images and convert them to something really pixely and ugly, because it's better that somebody gets an ugly picture than no picture at all, if your content relied on a picture. They also strip down your CSS and your JavaScript: if your JavaScript takes longer than 1.2 seconds to run on an old Android device, it actually takes your JavaScript out, so if you relied on JavaScript for your page to load, you're out of luck; people will not see anything. But there are a few things we can do instead of relying on these proxy browsers and hoping that Google fixes everything for us. So the problems with images are: huge images for everybody, unoptimized images, no alternative content, and no training or incentive to add content in content management systems. And here's our arsenal to fix that (I'm doing this a bit faster because I was told I have to leave time for Q&A, but you're clever, so it's all good): better browsers with responsive image support, which are here right now, so we don't have to worry about the older browsers anymore; automated lossless image optimization tools; file-level access to images to extract metadata; scripting solutions to offer alternative content; and cloud services with machine learning APIs for intelligent resizing and for tagging. I'm going to go through them bit by bit.
So, browsers with responsive image support. Responsive design should not be unknown to anybody anymore; it's just the sensible thing to do, because I have this phone and I look at it like that, I turn it like that, then I switch to this one, and I switch to my Xbox, to my fridge, to my dog, to my cat, wherever the internet runs nowadays. There is no screen size any longer; there's no "oh, we need 1024 pixels". The web is like water: you put it anywhere and it will fill as much as it can, and it's fine. Media queries were the first idea we had for that, in CSS and also in JavaScript with matchMedia. The problem with media queries is that if you have a CSS file with your large images, your mid-size images and your small images in it, the browser loads all of them and only shows the one that is appropriate; the data in the background still gets used, and if you're on a metered data plan, that's a bad idea.
That's why we invented the picture element and srcset. srcset came from Apple; the picture element came from people who looked at the video element and said, why don't we have a picture element? In it, you define the image in several formats and several sizes, and the browser only loads the one that actually applies and doesn't touch the others, so you don't have the problem of all the images being loaded. The support is great (this chart is again outdated; I think Safari now supports it in the newest version). caniuse.com is always your friend: if you want to try something new, type it in there, be happy, and start using it; don't complain that old browsers might die, because they have to die. There's great information on Jake Archibald's blog, "The anatomy of responsive images", where he explains what all these srcset shortcuts mean and what all the information is about. But in essence, most of the systems out there already use it. WordPress now uses it out of the box:
you just put an image in, and it creates the picture element for you. There's also a great live demo on our Windows developer site, which shows you in a real-world scenario what that looks like. The painting in it was done by the wife of one of our colleagues, and the demo only loads the image that is necessary for that size, in the right format, instead of downloading lots and lots of data in the background and then resizing it. You can play with it quite nicely to get a proper text-to-image ratio as well.
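To make the picture element concrete, a markup sketch along these lines lets the browser pick exactly one source; the file names, widths and breakpoint here are invented for illustration:

```html
<!-- Hypothetical responsive image: the browser downloads only the source
     that matches the current viewport width and format support. -->
<picture>
  <source type="image/webp"
          srcset="photo-small.webp 480w, photo-large.webp 1200w"
          sizes="(max-width: 480px) 100vw, 800px">
  <img src="photo-small.jpg"
       srcset="photo-small.jpg 480w, photo-large.jpg 1200w"
       sizes="(max-width: 480px) 100vw, 800px"
       alt="A painting of a landscape">
</picture>
```

Browsers that don't know picture or srcset simply fall back to the plain img, so nothing breaks for them.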
Now, you have automated tools for lossless image optimization, and that "lossless" is very important: lossy image optimization makes your designers really unhappy, because you put artifacts in there or take the image down to 12 colors instead of 256, and that's not fun. Lossless optimization is, a lot of the time, a packing algorithm that doesn't change the look and feel but goes to the byte level of the image and strips out the bytes that are not necessary and not needed, because Photoshop and other image editors put a lot of data into the file itself that you don't need.
ImageOptim is the big one there; if you don't use it yet, please use it. It's also available as an npm module, so you can put it in your Node solutions. It allows you to just drag images into it, and it automatically optimizes each image according to what it is: a GIF gets optimized with one optimizer, a JPEG with another (a BMP, if you use that on the web, I'll come and hurt you), and other images, TIFFs for example, also get downsampled to a format that makes much more sense. It's as easy as that: you drag it in, and it replaces the original image; it doesn't make a new image that you have to copy over or anything, it just removes everything in there that isn't needed. In this case, for example, we got a 44% win on a JPEG. It's as simple as that: before you put your images on the web, run them through a system like that, and everybody wins.
Now we have file-level access to the information in images. We always had that in things like ImageMagick or the GD library in PHP, but now we have it in JavaScript as well: we can use the EXIF data in the image itself. When you right-click an image in Windows, it shows you the EXIF data; you can access that in JavaScript and do cool stuff with it too. For example, instead of rotating a JPEG in the browser, you can read the header, which tells you what the rotation of the image is, so you already know it's going to be displayed the right way before you render it.
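As a sketch of what reading that header looks like, here is a minimal parser for the EXIF Orientation tag; real code should handle more edge cases (or use a library such as exif-js), and in the browser you would feed it the result of `file.arrayBuffer()` or a FileReader:

```javascript
// Minimal EXIF orientation reader: scans the JPEG bytes for the APP1 (Exif)
// segment and returns the Orientation tag value (1-8), or -1 if not found.
function getOrientation(buffer) {
  const view = new DataView(buffer);
  if (view.getUint16(0) !== 0xffd8) return -1; // not a JPEG
  let offset = 2;
  while (offset < view.byteLength) {
    const marker = view.getUint16(offset);
    offset += 2;
    if (marker === 0xffe1) { // APP1 segment holds the Exif data
      offset += 2; // skip segment length
      if (view.getUint32(offset) !== 0x45786966) return -1; // "Exif"
      offset += 6; // skip "Exif\0\0"
      const tiff = offset;
      const little = view.getUint16(tiff) === 0x4949; // "II" = little-endian
      const ifd = tiff + view.getUint32(tiff + 4, little);
      const entries = view.getUint16(ifd, little);
      for (let i = 0; i < entries; i++) {
        const entry = ifd + 2 + i * 12;
        if (view.getUint16(entry, little) === 0x0112) { // Orientation tag
          return view.getUint16(entry + 8, little);
        }
      }
      return -1;
    } else if ((marker & 0xff00) !== 0xff00) {
      return -1; // not a valid JPEG marker
    } else {
      offset += view.getUint16(offset); // skip this segment
    }
  }
  return -1;
}
```

With the orientation value in hand, you can apply the matching CSS transform or canvas rotation before the image is ever painted.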
In 2012, when I used to work for Yahoo, we already played with that in Flickr, and it's just amazing that we haven't thought about it since. What Flickr did when you uploaded images was a pretty cool thing: you dragged them into the browser, and the photos immediately showed up. These were all 8 megabyte pictures, and it was not a fast connection, but the photos show up quickly and then start uploading in the background. If we take a look at this zoomed in, you can see that the image shows, and then you get the little circle thing there, uploading the image in the background. This uses the EXIF data in the JPEG itself: every JPEG has a little thumbnail in its file, so you can read the first few kilobytes, display that as a thumbnail, and then load the rest. So instead of loading the whole image and then having an onload handler, you load it with a FileReader, as a stream, and display it while it's still loading. That's a great way to give your end user an interface that looks much more interactive than just "please wait".
Of course, there's EXIF data in your pictures that you might not want to give out. I created removephotodata.com, which works on a mobile phone, works offline, and doesn't have any server at all; it just works in JavaScript in your browser. You can drag an image in, it gets rid of all the EXIF data and gives you the image back for download. In case you don't want your geolocation in your image, for example, or you don't want people to know which camera it was taken with, it's probably a good idea to do that kind of thing.
The geolocation is also visible in most of the JPEGs you take nowadays with cameras, and it can tell you where the picture was taken. That has been the downfall of a few people who wanted to harass other people with pictures of parts of their anatomy, and who then got found because people realized where they lived, which is good. But if you don't want to give that data out, you should make sure it's actually not in your JPEG file anymore. A good thing to do in an interface with images is to provide fallback content: instead of just waiting for the image to load, you could, for example, show a colored background that is part of the image and have it replaced when the image has loaded. A lot of systems use that nowadays already; the blur-up technique is a big one.
You can see it here, let me start that again: you see the image being blurry and then becoming sharp. This is on Medium; Medium uses it, for example, and this is their code for it, which is pretty much nuts: a figure with a handful of divs in it, and JavaScript, and an image with "progressiveMedia" something; I don't know what's going on there. It looks good, but I don't know why they do it that way, because there is a CSS technique to do the same thing.
What you do is take a much smaller image, like the thumbnail that is embedded in the JPEG, and scale it up in CSS to a hundred percent of the width of the container, with a CSS blur filter or an SVG blur filter on it. Then, when the full image has loaded, you just turn off the filter and get rid of the small thumbnail image. That way you get the same effect without having to jump through hoops of 10,000 lines of JavaScript, and it looks good: it gives the impression that something is happening, and you cannot do anything worse than making people just wait.
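A rough version of that CSS technique could look like this; the class names and file names are invented for the sketch, and the tiny placeholder would ideally be the thumbnail embedded in the JPEG:

```html
<img class="blur-up" src="photo-tiny.jpg" data-full="photo-full.jpg"
     alt="A photo loading with the blur-up technique">
<style>
  /* the tiny image is stretched to the container width and blurred */
  .blur-up { width: 100%; filter: blur(12px); transition: filter 0.4s; }
  .blur-up.loaded { filter: none; }
</style>
<script>
  // preload the full image, swap it in, then drop the blur filter
  document.querySelectorAll(".blur-up").forEach(function (img) {
    var full = new Image();
    full.onload = function () {
      img.src = full.src;
      img.classList.add("loaded");
    };
    full.src = img.dataset.full;
  });
</script>
```

One stylesheet rule, one class toggle, and no framework needed.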
People don't like waiting, especially not on a mobile device, so this is a great way of making that work. You can also count pixels in canvas. I have full access to every image in the browser nowadays; I can't access an image on another domain, because that would be a security problem, but if I drag and drop an image into the browser, or the image is already on the same domain, I get pixel-level access to the image itself. If I put it in a canvas and read out the canvas data, I get an object with the width and the height, and then an array where every pixel is four elements: the R, G, B and A values. For example, in this case I have this little C64 text image, and I just count the pixels: it tells me there are 10,472 black ones, so that's probably the main color that I want to use here. You can use that to determine which colors are in an image; there are better ways of doing it, but this is a nice, simple way. And this is the code, so just note it down quickly; the slides are available later on as well.
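The pixel counting can be sketched as a small function over canvas pixel data; in the browser you would feed it `ctx.getImageData(0, 0, canvas.width, canvas.height)`, but it works on anything shaped like ImageData:

```javascript
// Count how often each color appears in canvas pixel data and return the
// most frequent one. The data property is a flat [r, g, b, a, r, g, b, a, ...]
// array, four entries per pixel.
function dominantColor(imageData) {
  const counts = new Map();
  const d = imageData.data;
  for (let i = 0; i < d.length; i += 4) {
    const key = d[i] + "," + d[i + 1] + "," + d[i + 2]; // ignore alpha
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  let best = null;
  let bestCount = 0;
  for (const [color, count] of counts) {
    if (count > bestCount) {
      best = color;
      bestCount = count;
    }
  }
  return { color: best, count: bestCount };
}

// Browser usage (sketch):
// const ctx = canvas.getContext("2d");
// ctx.drawImage(img, 0, 0);
// dominantColor(ctx.getImageData(0, 0, canvas.width, canvas.height));
```

That dominant color is exactly what you'd use as the fallback background while the real image loads.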
Colorify.it, for example, uses gradients as a background and finds the right color for you, and it has a lazy reveal as well, so you can load images and fade from the color to the image, and so on and so forth. And there's Color Thief, which is really cute: in this demo you click on the image, it finds the dominant color, and it finds the palette of it as well. That's cool if you have an image and want the CSS around it to use the right palette and the right kind of color according to that image; it's again a JavaScript library you can use for that. Now let's get to the nitty-gritty of what we can do with computers and images nowadays, which is where I get very excited: for example, intelligent image resizing. To make a thumbnail of this image, you would normally take
that massive image and make it 150 pixels wide, and you'd end up with a few pixels on the left that might be a woman and lots of blue pixels on the right that we don't need. So instead, we detect things. This would be the normal way, cutting out 150 by 150: it's nice, but it's not good enough. This one is much better, because we detected the face of the lady and centered the crop on her before cutting the rest out. And this one is the best, because we detected the outline of the person in the image and cut only that out. This is something you'd do by hand in Photoshop, but machines can do it quite nicely nowadays, and it makes much more sense to show something like that on your website than something that's just a blurry mess, where you don't know what's going on and you don't want to click on any thumbnail. And please, never just resize an image to get a thumbnail: the idea of a thumbnail is that it's a preview of the image both in file size and in dimensions, not only in dimensions. I see so many people loading massive pictures and showing them at 100 by 100, and when you click on them: look, it's really fast, because it's already loaded. Yes, because 20 megabytes were downloaded and you only watched one of them. There's a
JavaScript library called smartcrop.js that shows you how to do these things. It's kind of heavy on the machine, so on a desktop machine it's fine, but please don't run this kind of stuff on a mobile phone: it's not meant for that, and you don't want to fry eggs on the back of your phone just to get a nice thumbnail. You can see that in this case it found the outlines of the man and then cropped around them, and that way it found the right size: it determines what the outlines are and, depending on how close they are to each other, works out the most important part of that image.
There is a company called Cloudinary, in Israel, that uses systems like this under the hood, and a few others as well. They're really adamant right now about telling you about their stuff, but they're really lovely people; I was in Israel a few days ago and talked to them. What they do is give you a URL API, like a REST API: you go to res.cloudinary.com, reference your uploaded image, and then say, okay, give me a 16 by 9 ratio and make it 640 pixels wide, as a JPEG. So this one realized it should be sixteen by nine, cropped it to sixteen by nine, and made the image you uploaded 640 pixels wide. This is kind of cryptic, but they actually make it much easier for you with a proper SDK, and as you can see, for almost every language out there: Ruby, PHP, Python, Node.js, Java, whatever. That allows for intelligent resizing of images: when you now resize the browser, it gives you the image that uses the space best, and on the right-hand side you can see that the images show more or less of the scene but keep the people in the middle, because they center on the face. That way you can automatically art-direct your images without having to crop them by hand, because the machine learning algorithm does that and understands it for you.
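Such a transformation URL looks roughly like this; the parameter names follow Cloudinary's documented URL syntax, but the account name and file name are placeholders:

```
https://res.cloudinary.com/<your-account>/image/upload/ar_16:9,c_fill,g_auto,w_640/photo.jpg
```

Here `ar_16:9` sets the aspect ratio, `c_fill` picks the fill crop mode, `g_auto` asks the service to choose the gravity automatically (faces and other interesting regions), and `w_640` sets the width; everything else is derived on their servers.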
Imgix is another service that does that, and they're getting even better: they're not only using facial detection, which is eye, nose, mouth; they're also making a high-contrast version of your image and using that to find the most important parts. Same here: they do the outlines on the high-contrast image and then crop away the parts that don't have enough contrast, and that works as well. So what about information that isn't in the image? All of this is what you can do with the image data itself, but what if we want to know that this is a coffee mug, or that this was the current President of the United States in the picture, without having to know it ourselves? Machine learning and artificial intelligence to the rescue: robots and computers are there to plow through oodles and oodles of data without getting bored. That's the good thing about computers, and that's what we should be using them for.
Facebook, for example, has automated alt text. This is a photo a friend of mine uploaded, and it says "image may contain": there you see that it's automatically generated ("dog, outdoor, nature"), and you can see it in the alternative text on the image itself if you open the developer tools, in this case in Firefox. How do they know that? Do they have people in the basement, chained to a desk, who have to type things in? Maybe, I don't know, but I think most of the time they use computers for that; it's not Mechanical Turk anymore, which used to be the Amazon way to do these kinds of things. There's a great blog post on the Facebook code blog that explains how they've been doing this for years and years: all the images on Facebook have been analyzed, classified, detected, and segmented into different sections.
So you say, okay, I've got sheep, I've got dog, I've got man; then you find all the sheep, the dogs and the men, segment them out, and that way you have it in the database: if something looks a bit like that, it's probably a sheep from behind. They do that with all the data they already have in Facebook images, and now they've finally given us access to it at a programmatic level, so we can use it for our own implementations as well. So it's not like they're just evil and mine our data; they're giving it out too, which is pretty good. And Google
has been doing that on Google Photos for quite a while as well. I showed this the other day in Germany: in my photos I never type any German, I only type English, but you can click on "selfies" in Google Photos and it automatically finds the pictures that are selfies, without you ever having typed in that this was a selfie. This one was at Smashing Conf, another conference, and it's basically me mid-talk doing these kinds of things. It also finds locations for you: I search for Tel Aviv, for example, and it doesn't only use the JPEG data of Tel Aviv; this one is Heathrow, on my way to the flight to Tel Aviv, so I don't really know how they did that, but it is the right photo, and the pictures of these emojis were all taken in Tel Aviv as well. I can then search for "Hund", which is dog in German; I've never entered that, ever, but I have uploaded pictures of dogs. And for "Katze", for cat, it detects my family's dog as a cat, which is true, because he behaves like one. So sometimes it's not that good at these things, but it's pretty amazing that you get all this cool information in there without having to type it.
The data behind that comes from databases that have been used for years and years to classify and tag images. There's ImageNet, for example, which is 14 million images right now, and it gives you a database to compare your images against and find the right answers: this is a cat, this is a dog, and so on and so forth. And Google just last week released the Open Images dataset, over 9 million URLs of images that are tagged and classified for you. You can use that CSV (it's on GitHub, with the metadata available for download) and run it against your own learning services to understand what your images might have in them. For this picture, for example, it has balcony, stairs, facade, iron, and so on; it's not just "that's a spoon and that's a fork". There's lots of highly detailed information in there, which you could then, for example, run through a translation service to find the Danish for "dog". Well, a Danish dog is one of those big ones, but that's a different story.
They also run these tags through a language compiler, and we do that as well with a few of our services. The image captioning is open-sourced in TensorFlow, and you can use it; they use it mostly for Google Photos, but also for the upcoming Allo, a chat client they're doing. In this case they have human captions from the training set, like "a man riding a wave on top of a surfboard", and the automatic captioner finds three different images matching that. So they take whole sentences from that dataset, rather than just having a tag saying "surfboard man wave", which is not human-readable and not beautiful. They also do syntax detection on these captions, find the nouns and the attributes, and mix and match them to make better captions for other things afterwards.
They also take that together with the image data, so instead of saying "a train with a Union Jack on the side", it says "a blue and yellow train", because it also detected how many pixels are blue and how many are yellow; and it says "two brown bears" instead of just "two bears". What they use it for is Allo: if you upload an image, you get these automated tags, so if you don't want to type something in, you can just tap one of them, which is pretty cool. But I find it really bizarre: doesn't it mean, in the end, that we as humans just become a transportation service for two bots to talk to each other? I'd rather have you type something in, even mistyped, and make it human, than just give me a canned robot answer. It's really odd, but people seem to be too lazy to type things in, so
they want that fine we have something
We have something like that as well, called CaptionBot, which uses three of our services. All of these services are available: Google's TensorFlow, Facebook's equivalent, Amazon has a few things in there with their Alexa systems or pure Alexa skills, and we have the Microsoft Cognitive Services that you can play with, where you get 5,000 hits a month and can pay for more later on. This one is an image-upload service you can try out at CaptionBot.ai: it analyzes the image and says, I think, "a young man jumping in the air on a skateboard". You see, we don't have "man, skateboard, young" there; we have a whole sentence, because we also ran it through a language analysis tool in machine learning to give you a sentence at the end. Now, detecting humans is a very important thing as well.
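Since these services are plain REST APIs, calling one is just a POST with the image (or its URL) and a subscription key. Here is a minimal sketch that builds such a request without sending it; the endpoint URL and the header name are assumptions modeled on Azure-style APIs, so check the current service documentation before using them.

```python
import json
import urllib.request

# Assumed endpoint for illustration only; not the real service URL.
ENDPOINT = "https://example.cognitiveservices.azure.com/vision/describe"

def build_describe_request(image_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the service to
    describe the image at image_url in a full sentence."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": api_key,  # assumed header name
        },
        method="POST",
    )

req = build_describe_request("https://example.com/skateboard.jpg", "YOUR_KEY")
print(req.get_method())  # POST
```

Sending it with `urllib.request.urlopen(req)` would return the JSON caption, given a valid key and endpoint.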
One of our services does that for you: it realizes this is a 28-year-old man, a man in water, swimming. It also tells you, if you scroll down (I don't have it animated this time, and I don't have it live here right now), the colors in the image, and it realizes whether it's a racy photo or an adult-content photo, so you can check that automatically before anything gets published. The other service we have automatically detects child pornography: in case you run an open system where you allow anyone to upload anything, you don't want that to be abused by the most horrible people on the Internet, so you can run uploads through that service first, and it automatically flags and deletes images that have already been recognized as totally illegal content. That way we protect both the people these pictures were taken of and you from prosecution, because we can actually find out who has been uploading them. For example, the lady down here in the bikini will be flagged as racy but not as adult, and this one will find "train" and "train station" and all kinds of things; it's a city line. So the images are there to find information in, but mostly, once we detect a face, we also guess the age and give you the gender. Once we have a face we also give it an ID in your data set, so you can try that out for yourself, for example for verification or login systems, or for detecting whether the same person is in two different images and then automatically clustering them into different folders.
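The racy/adult check comes back as plain scores, so gating an upload is a couple of threshold comparisons. A minimal sketch follows; the field names and cut-off values are assumptions for illustration, not the real service schema.

```python
# Sketch of a pre-upload moderation gate over an image-analysis result.
# "adultScore"/"racyScore" and the thresholds are assumed for illustration.
def moderate(analysis: dict) -> str:
    adult = analysis.get("adultScore", 0.0)
    racy = analysis.get("racyScore", 0.0)
    if adult >= 0.8:
        return "reject"   # adult content: block the upload outright
    if racy >= 0.5:
        return "review"   # racy but not adult: flag for a human
    return "accept"

# The bikini photo from the talk: racy, but not adult.
print(moderate({"adultScore": 0.1, "racyScore": 0.7}))  # review
```

The known-illegal-content service works differently (matching against a database of already-recognized images), but the upload pipeline hook is the same: check before you publish.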
We also have emotion detection: we detect, for example, that the man here is kind of happy, but also shows a bit of fear, and the kid shows a bit of fear, a bit of neutral, and a bit of surprise, his mouth being open. Sadly enough I didn't bring it with me, but normally at our booth I have this demo where you have to show all the different emotions and then you can win a prize, which is pretty pointless to try in Finland, but from time to time it's fun to see what computers think our different emotions and states of emotion are.
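An emotion result is typically a set of scores per face, so "kind of happy with a bit of fear" is just the ranking of those scores. A toy sketch, with the shape of the score dictionary assumed for illustration:

```python
# Rank a face's emotion scores and keep everything above a noise floor,
# so "0.7 happiness, 0.2 fear" reads as happy with a bit of fear.
def describe_emotions(scores: dict, floor: float = 0.1) -> list:
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, value in ranked if value >= floor]

print(describe_emotions({"happiness": 0.7, "fear": 0.2, "neutral": 0.05}))
# ['happiness', 'fear']
```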
So you can detect the faces: it's basically just a REST API that you can throw an image at; you send the JPEG and you get the JSON back, with the left pupil, the right pupil, the age, and the head pose, which angle the face is at. So when you do face login, for example, you don't just do it with one shot; you ask the person to move, so you can see it's a 3D face and not somebody holding up a picture of you to log into your computer. You can verify a face once we know it: is that the same person? No, it's obviously not the same person. And you can cluster faces into different groups automatically: these are men, these are women, these are in-betweens, these are "I don't know today" kinds of things.
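The anti-spoofing idea here, asking the user to move so you can tell a 3D face from a held-up photo, boils down to checking that the head pose actually changes across frames. A minimal sketch, assuming each per-frame detection carries a yaw angle in degrees (a made-up field name, not the real schema):

```python
# Liveness sketch: a flat photo held up to the camera keeps a nearly
# constant head pose, while a real person asked to turn their head
# produces a clear change in yaw across frames.
def looks_live(poses: list, min_yaw_change: float = 15.0) -> bool:
    yaws = [pose["yaw"] for pose in poses]
    return max(yaws) - min(yaws) >= min_yaw_change

# A person turning their head vs. a static photo:
print(looks_live([{"yaw": -2.0}, {"yaw": 12.0}, {"yaw": 21.0}]))  # True
print(looks_live([{"yaw": 0.5}, {"yaw": 1.0}, {"yaw": 0.8}]))     # False
```

Production systems combine several such cues, but the pose angles in the face JSON are what make even this simple check possible.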
And the great thing is that, putting these all together, you can really empower people. That's where I want to show you a quick video that a colleague of mine has made; it's pretty stunning, so let's quickly watch it together.
I'm Saqib Shaikh. I lost my sight when I was seven, and shortly after that I went to a school for the blind. That's where I was introduced to talking computers, and that really opened up a whole new world of opportunities. I joined Microsoft ten years ago as a software engineer. I love making things which improve people's lives, and one of the things I've always dreamt of since I was at university was this idea of something that could tell you at any moment what's going on around you. "I think it's a man jumping in the air doing a trick on a skateboard." I teamed up with like-minded engineers to make an app which lets you know who and what is around you. It's based on top of the Microsoft intelligence APIs, which makes it so much easier to make this kind of thing. The app runs on smartphones, but also on the Pivothead smart glasses. When you're talking to a bigger group, sometimes you can talk and talk and there's no response, and you think: is everyone listening really well, or are they half asleep? And then I hear: "I see two faces: a 40-year-old man with a beard looking surprised, a 20-year-old woman looking happy." The app can describe the general age and gender of the people around me and what their expressions are, which is incredible. One of the things that's most useful about the app is the ability to read out text. I can use the app on my phone to take a picture of the menu, and it guides me on how to take the correct photo: "move camera to the bottom right and away from the document." Then it recognizes the text and reads me the headings: "I see appetizers, salads, paninis, pizzas, pastas." Years ago this was science fiction; I never thought it would be something you could actually do. But artificial intelligence is improving at an ever-faster rate, and I'm really excited to see where we can take it. As engineers we're always standing on the shoulders of giants, building on top of what went before, and in this case we've taken years of research from Microsoft Research to pull this off. "I think it's a young girl throwing an orange frisbee in the park." For me it's about taking that far-off dream and building it one step at a time. I think this is just the beginning. How cool is that?
I mean, it just fascinates me that he's an engineer himself, writing this. I sat next to a blind PHP engineer for years; he was much faster at coding than me, and I was just confused about that, but it's so much more insightful when you actually encounter that kind of attitude. What I love about machine learning is that people with disabilities are the superhumans to test against: if it works for them, then it works for the rest of us even better. I've been doing accessibility for years and years, trying to make people understand that disability is not the end of things; it's actually an opportunity for everybody. With inclusive design ideas and this kind of thinking, we have a great opportunity to make things understandable for everybody who, for example, can't see something, isn't able to understand it, or can't take a picture of it.
of something if you’ve seen for example
Google Translate on a phone as well the
app you can just take a picture you can
you can turn your camera on and see a
street sign and it translates it life
for you from the camera and I mean how
friggin cool is that when you’re in
Russia and you don’t know what the name
of that street is you only have the
English name for example and these
things are all possible because we have
these massive amounts of data and what I
What I love about this example as well is that it's all open: he just used the open APIs from Microsoft; he didn't have any internal access that gave him something extra (we didn't build it for him, sadly enough); he built it for himself with systems that are open, and he's now going to release it, on iPhone first, I think. I find this pretty amazing when you compare it to the other services that are out there.
I was just at an accessibility conference where a commercial company was showing the same thing: "we've got glasses that can detect people and tell you when they're in the room", and it was 4,500 euros for those glasses and that whole solution. This can run on any smartphone right now; you don't need those extra glasses to get the same functionality. And that's what I want you to think about when it comes to this machine learning and images stuff: the APIs are out there, and there are trillions of photos that have already been indexed for you, so cross-reference your own data and make your images more accessible that way. That's all I had for now, so thank you very much.