
GOTO 2015 • Modern Fraud Prevention using Deep Learning • Phil Winder


Hello everyone, welcome. This is "Modern Fraud Prevention using Deep Learning". That title was submitted quite a long time ago, so I'd say the talk is now probably a bit more about machine learning in general. We had a good talk earlier in the day introducing some of the concepts behind machine learning, and I'm hoping to build on them. This talk is going to be a bit more technical. There's no maths, which you'll be glad to hear, and there's also no code; I've tried to explain myself using diagrams and pictures wherever I can. But it is a much more technical talk, so hopefully you can get your teeth into it. We've got the usual slides at the front saying please rate and engage.

My name's Phil. I'm with Trifork, but Trifork Leeds, so we're quite distinct from the Danish mothership. In my professional life I'm actually a software engineer; machine learning is a bit more of a hobby. I'm currently working on an Apache Mesos framework for Elasticsearch. If you'd like to talk more about any of the subjects I'm about to discuss, please see me or one of my colleagues listed at the bottom there. I'm going to skip the marketing slides because you all know Trifork.

The talk is split into three or four topics. The final one, on architectures, is more about how we would do this in production, in real life. It's interesting, but it's not really the core of my talk, so I'm going to go through the first three sections, and if we have time we might do the fourth; I'll probably end up speaking for too long and drop that section. First I'm going to introduce the reasons why we want some new tools and techniques to apply to fraud, and try to make the case to the business users as to why you should pick up on some of these ideas and start to run with them. I'm then going to introduce the topic of machine learning. You've probably had quite a bit of experience already, but if you haven't, that's the section that really explains what's going on and why. I've also got quite a lot of demos. Some of the demos are quite simple and very general, just to explain the concepts, but the rest are all focused on fraud prevention, focused on finance, and specifically on mortgages.
OK, so let's crack on. In order to do any of this work we need to persuade some people to give us some money, and there's no better way to get people to give us money than if there's other money at risk. In the UK, fraud is defined as an act of deception intended for personal gain or to cause a loss to another party. All of these facts and figures are specific to the UK, but they're applicable to pretty much every country in the world: anybody trying to do wrong, to do harm for their own financial gain, is committing fraud.

We've got UK mortgage fraud listed there. In 2014 about 1.2 million properties were bought and sold in the UK, and 83 in every 10,000 of those applications were fraudulent, so not quite 1%: 0.83%. When you say fraud in that context it's not necessarily people being hugely devious. It goes from the small scale, where somebody is maybe telling a few fibs about their employment history or how much they earn, all the way up to huge international fraud. In 2013 there was a story of two guys who had invented a whole series of companies: they invented estate agents, invented surveyors, invented property businesses and builders, and they had supposedly bought a huge tract of land on which they were going to build lots of new houses. They invented, or stole, the identities of other people to take out mortgages on those respective houses, so there were tens to hundreds of mortgage applications all going in for houses that hadn't been built yet. As it turned out, they just took that money, paid off the original debt on the land they owned, and ran off. They completely invented a village, took out loads of mortgages based on it, and then disappeared. The total cost finally came to about 53 million pounds. They did eventually get caught, but they very nearly got away with it, because the mortgage company was so embarrassed about what had happened that it almost never came to light. So it does get to quite a large scale, and mortgage fraud equates to approximately 1 billion pounds' worth of fraudulent applications. It's a huge number, but interestingly it's not actually the worst case of fraud in the UK.

The worst is actually current account fraud. Traditionally what people would do is steal somebody's information, open a standard current account of some sort with a traditional bank, which you can do quite easily in the UK, and then use the overdraft or other facilities to withdraw some money and run off. That actually constitutes the most fraud in the UK, but we're talking a little bit about mortgages today.

And finally we've got UK retail fraud. Much of the business in the UK is actually made up of small to medium-sized enterprises; the big players make up a significant part of the market, but not a huge part. Small to medium-sized businesses are estimated to lose eighteen billion pounds every year to fraudulent transactions. That's when somebody goes online, buys some clothes or food or shopping of some kind on a credit card, and then maybe cancels the credit card as soon as they've placed the order, so the people on the retail side end up shipping all of this stuff only to find that the person doesn't exist, or the card was stolen, or something like that. That amounts to a huge amount as well.
Another reason why businesses might want to look at some of these ideas is legislation. At one end of the spectrum there are people actually doing wrong to your business that you might want to protect yourself against, but there are also legal requirements that need to be put in place in order to comply. In 2017 there is new anti-money-laundering legislation coming in within the EU, so it applies to all EU countries. It extends the money-laundering rules that are already in place, but the main change is that the out-of-scope limit has dropped to a thousand euros; previously it was fifteen thousand euros. This applies to businesses that are handling financial transactions, so it applies to banks obviously, financial institutions, credit agencies and the like. It also applies to legal services and estate agency services, and it applies to gambling services. Basically anybody that's handling and moving money around has to comply with this legislation. What it says is that for any transaction over a thousand euros, they need to prove to the authorities that they're doing their due diligence: to prove that the person is (a) not being fraudulent and (b) not using the money for nefarious means like terrorism. And finally they're required to submit their information to a central registry. There are obviously concerns there, and it's a bit unclear how that's actually going to be implemented. So there are direct financial reasons why you might want to do this, and there are also legal reasons.
So how do we do it at the moment? Well, if a traditional company went to a software house and asked for some software to do this, they would probably come up with some combination of these four general ideas.

First, origination-based techniques. Most countries have a law that requires financial services to prove they're talking to the real person; that's what origination is. One thing I get really annoyed about is banks in the UK: they've got this awful technique of using automated phone systems to try and prove you are who you say you are. You go through a whole series of "please type in your ID number, please type in your address, please type in your password, please do this, please do that", and that takes about three and a half minutes, and then as soon as you finally speak to a real person, which is all you wanted to do in the first place, they ask all the same questions again. It turns out they do this because these businesses aren't quite sure that the automated method really is proof enough, so the real people go through it all again. It does my head in. And some less security-conscious organisations, such as insurance agencies and others not necessarily as interested in protecting security, use some really quite dodgy methods. I've had cases where people have asked me just for my date of birth, or just for my postcode, or something like that, and those are completely insecure: your date of birth is basically a password you were given at birth, you can't change it, it's fixed, and you have to live with it. So it's the worst password that could ever exist.
The next group of technologies are rules-based. These are static rules, usually provided by analysts, saying that no transaction must be bigger than X, or you can't have more than so many transactions within a certain period of time, that kind of thing. They're OK, and they catch a reasonable amount of fraud; it's usually the accidental types, and the not-so-intelligent fraudsters who try something silly like that. But they also catch all the good guys as well: when you're abroad your card's always declined the first time because they think it's fraudulent, or you try to buy a car from a guy who only takes cash and you can't pull 1,500 pounds out of the cash machine because it's against their static rules.
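To make that idea concrete, here is a minimal sketch of what such a static rule engine might look like; the thresholds and field names are made up for illustration, not taken from any real system.

```python
from datetime import timedelta

# Hypothetical static rules an analyst might configure.
MAX_SINGLE_AMOUNT = 1500          # e.g. no withdrawal above 1,500 pounds
MAX_TXNS_PER_DAY = 10             # no more than 10 transactions per day

def check_transaction(txn, recent_txns):
    """Return the names of any rules this transaction triggers.

    txn is a dict like {"amount": 1600, "timestamp": datetime(...)};
    recent_txns is the customer's transaction history.
    """
    alerts = []
    if txn["amount"] > MAX_SINGLE_AMOUNT:
        alerts.append("amount_over_limit")

    window_start = txn["timestamp"] - timedelta(days=1)
    last_day = [t for t in recent_txns if t["timestamp"] >= window_start]
    if len(last_day) + 1 > MAX_TXNS_PER_DAY:
        alerts.append("too_many_transactions")

    return alerts
```

The weakness described above is visible here: every rule is a hard threshold, so a legitimate customer withdrawing cash for a car purchase trips exactly the same alert as a fraudster.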
Then there are credit checks. Lots of agencies will gladly accept your money to provide you with a number, and that's it. These numbers are supposed to represent the creditworthiness, or the risk, that a person presents to your business. There's certainly an argument for using them; how accurate they are is another question.

And finally, aggregation and monitoring. This is a more reactive type of solution, where analysts are provided with the data and they perform some query or ask a question and act based on that. For example, analysts might find a pattern, say one cash machine that gave out a large amount of money, and go and check it out. So those are the types of things that exist in the wild at the moment. Now I'm going to start talking about machine learning and how we can use it to improve some of those technologies, and try to remove some of the bias, the redundancy or the error out of them.
OK, so following on from the excellent presentation this morning (I've forgotten the speaker's first name, Ms Pitt, sorry if you're here) where she was talking about how we learn, I also have a couple of slides on that, but they're a bit more basic. I'd like to introduce my daughter here. She's 18 months old and she's currently going through this process of learning, and it's really fascinating to watch how she does it, because there are lots of parallels between this and the state-of-the-art machine learning algorithms at the moment. If we can understand how we learn, it actually helps us to write better algorithms, and it helps you to understand the algorithms as well.

This is my daughter with her mother, my wife, making some yummy rice krispie chocolate square things. In the top picture there she's doing exactly what mum told her: please take the rice krispies and put them in the baskets and then we can eat them later on. But somewhere along the line she decided to perform a test. She decided: if I put this thing in my mouth, is it going to be good or is it going to be bad? So she put it in her mouth, and it was good, and she completely ignored any instructions from then on, because she'd learned that eating chocolate with rice krispies was a good thing. That's a very simple example of how children learn and how algorithms learn in general: you provide them with some tests, with some input, and then they evaluate that input and decide on some outcome.

It takes time, however. She's 18 months old and she still gets a lot wrong: she's struggling to put sentences together, when she walks she falls flat on her face, she picks up a spatula and misses her mouth and hits her eye. So it does take time for this to happen, and this applies to algorithms as well: it takes time to learn.
We've got this great game that she loves, which is index cards, and this is an example of how she gets things wrong. I mean, she's very good; I don't want to give you the impression that I'm a bad father saying she's rubbish, but in some cases she does get it wrong. The first example on the left there is a door, but she thinks it's a house. She thinks it's a house because it's got four walls and it's got these features in the middle which are like squares, which kind of look like windows; what she hasn't learned yet is that a house actually needs a triangle on the top. So this is an example of a misuse of features: the features are there, but she's misusing them and coming to the wrong conclusion. The second one she calls a chicken, because she doesn't quite understand the concept of a bird. I think she struggles to understand classes of things. She's quite happy to learn that that thing is definitely a bird, and that thing is definitely a teddy, and that thing is definitely mummy, and that thing is daddy, but she struggles with classes, so that's a chicken. That's just an example of a misclassification. And then finally we've got the third picture, and apparently that's a tiger. When I show her this card she kind of looks at me and goes, I'm not sure what it is, and then I look at the card and go, I'm not sure what it is either. Sometimes she goes for a cat, sometimes a tiger, and sometimes I don't even know what it is; it looks like a cat that's been run over, basically. That's a great example of plain bad data. In real life you will get data like that, and there's a big cleaning step required to stop that bad data from leading you to the wrong result.

And just to prove that it's not just her age, I've got an example for all of you. Take a look at this picture; I'm just going to watch you for a second. For all the programmers out there, this is like the human equivalent of a stack overflow. What you start doing is trying to focus in on the eyes, but then you realise the eyes are in a different place, so you jump across, and then you realise the mouth is in the wrong place, so you jump again, and you're up and down and up and down, and if you stare at it long enough you start to feel sick. All this is proving is that you've learnt some specific things over time. You have decades' worth of experience telling you what a face should look like, and when it doesn't look like that, you don't quite know how to process it. And we can get it wrong: humans are completely fallible.
OK, so moving on to the more technical topics. Machine learning comprises four-ish distinct components, which are all trying to do slightly different things. The first is dimensionality reduction. When we think of data it has a number of dimensions, and by a dimension I basically mean a single point of information. If you imagine a 10 by 10 grayscale picture, that has a hundred dimensions: a hundred pixels, each representing a distinct piece of data. The problem is that, while with images it's OK, for many other types of data it's really hard to visualise what's going on, so you've got to compress that space down into two or three dimensions in order to actually see it. That's the act of dimensionality reduction. We've got clustering, where we're trying to assign data to a certain class; quite often we know what class it should belong to, or at least how many classes there are, and clustering is the process of grouping things together into distinct classes. We've got classification, which is linked to clustering but is more about asking exactly where to put the line to say that's class A and that's class B. And finally regression, which is trying to predict a value based upon previous inputs.
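Since the talk itself contains no code, here is a small illustrative sketch of my own, mapping those four task types onto scikit-learn calls with made-up toy data:

```python
import numpy as np
from sklearn.decomposition import PCA          # dimensionality reduction
from sklearn.cluster import KMeans             # clustering
from sklearn.linear_model import LogisticRegression, LinearRegression  # classification / regression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))                 # 200 samples, 100 "pixels" each
y_class = (X[:, 0] > 0).astype(int)             # a made-up binary label
y_value = X[:, 0] * 2.0 + rng.normal(size=200)  # a made-up continuous target

# Dimensionality reduction: squash 100 dimensions down to 2 so we can plot it.
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: group the data into 2 classes without using any labels.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)

# Classification: learn where to draw the line between class A and class B.
classifier = LogisticRegression(max_iter=1000).fit(X, y_class)

# Regression: predict a continuous value from the inputs.
regressor = LinearRegression().fit(X, y_value)
```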
We've also got different types of learning. Training is the key thing that has really enabled deep learning to come to the forefront: the new training techniques that have been developed are so much more powerful than they were in the past. Training can be split into supervised and unsupervised learning. Supervised learning is where you have an expected result, so the data is labelled: you say that this raw data is supposed to belong to class A, this is supposed to be the number one, or this person is fraudulent. The algorithm is then trained, the parameters are tuned to try to produce that same result, and the measure of performance compares the true result against the predicted result. When you then use this in real life, with new data coming in, you use those pre-learnt weights to predict an output. For unsupervised learning you've got no labelled results, so you don't know exactly what class the data is supposed to belong to. You need to decide what will provide you with a measure of how well your algorithm has trained; some of them decide whether data points are close together or far away, so there's a measure of distance between data, and there may be other criteria as well. You can also provide your own, what you might call customised or personalised, cost functions to decide whether your output should be labelled as class 1 or class 2, if something is particularly important to you. But in the real world most data is semi-supervised: you usually start off with some labelled data and a lot more that is unlabelled, so you can combine the two. Maybe you use the labelled data to start to bring out some of the clusters, and then apply the unlabelled data to really fill in the pattern a bit more.
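As a rough illustration of that semi-supervised idea (my own sketch, not from the talk), scikit-learn lets you mark unlabelled samples with -1 and propagate the few known labels out to them:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelPropagation

# Toy data: 300 points in 3 clusters, but we only keep labels for 15 of them.
X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)
y_partial = np.full_like(y_true, -1)            # -1 means "unlabelled"
rng = np.random.default_rng(0)
known = rng.choice(len(y_true), size=15, replace=False)
y_partial[known] = y_true[known]

# The handful of labels seeds the clusters; the unlabelled points fill in the pattern.
model = LabelPropagation().fit(X, y_partial)
print("accuracy on all points:", (model.transduction_ == y_true).mean())
```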
So let's talk about some specific algorithms. I'm going to talk about two; every guy's got his own favourite algorithm. The first one is called a decision tree. There are various different types of decision tree, but we're going to stick to the simple one for now. They can be used for classification and regression, and the idea is that they predict the target value of a class, or a value, based upon some very simple decision rules: is it less than 10 or bigger than 10, is it labelled A or labelled B? The example we've got there on the right is quite morbid, actually: it's a decision tree that's been learned from the data in the Titanic manifests, and it predicts whether you would have survived if you were on the Titanic. The first question it asks is: is the sex male? If yes, it goes down one side of the tree, on the left; if no, it goes down the right side. So if you were female you had a pretty good chance, 0.73, a 73% chance of surviving, and that branch represents 36% of the entire population on the Titanic. Whereas if you were male and above 9.5 years old, you had a fairly big chance of dying, unfortunately: 61% of all males above 9.5 died. You can go down the tree and make a decision based upon these rules, and the idea of the algorithm is to train these parameters, these rules, these decision points, to optimally make the right decision.

So it's conceptually quite simple, and it can handle categorical data, which is great because some algorithms can't. Decision trees specifically can overfit quite badly, but there are lots of methods for using decision trees in different ways that prevent the overfitting, so don't worry about that too much. Decision trees are usually one of the simplest approaches, and sometimes effective enough to solve a problem.
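As an illustrative sketch of my own (not the talk's code) showing how such a tree is trained and how its decision rules can be read back out, here is a minimal example with scikit-learn; the Titanic-style columns are assumptions, not taken from the slide:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical Titanic-style data: sex (0 = female, 1 = male), age, and survival.
df = pd.DataFrame({
    "is_male":  [1, 1, 0, 0, 1, 0, 1, 0, 1, 0],
    "age":      [22, 8, 30, 4, 45, 27, 62, 19, 11, 50],
    "survived": [0, 1, 1, 1, 0, 1, 0, 1, 0, 1],
})

tree = DecisionTreeClassifier(max_depth=2)      # limit depth to keep the rules readable
tree.fit(df[["is_male", "age"]], df["survived"])

# Print the learned decision points, e.g. splits on is_male and then on age.
print(export_text(tree, feature_names=["is_male", "age"]))
```

Limiting the depth is one simple way of fighting the overfitting mentioned above; ensembles such as random forests, which come up later in the talk, are another.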
The next algorithm, which is surrounded by a lot of hype at the moment, is deep learning. Deep learning is really good because, if you remember those classes of algorithm from the start, it actually does all of them: the dimensionality reduction, the classification, the regression and the clustering. It can do all of it; it's the holy grail of algorithms, and no other algorithm can do all the same things. The idea is that it's trying to model the learning process in our brain. Basically it aims to model the neurons and the synapses in your brain to perform similar sorts of tasks. It's simplified somewhat, but that's the general idea. The hope is that if we can produce a model of our brain, then we can write algorithms to perform things that our brain does quite easily, like recognition and classification.

So, the pros and cons. Again, it's very versatile and can be used for lots of different tasks. The key improvement is that it begins to remove the requirement for feature engineering. With all of the other algorithms, your algorithm will live or die based upon what features you give it as input; you need to work really hard to say this is the most important feature, I'm going to keep that and use it, but those ones are completely redundant, I'm going to remove them, and that takes a significant amount of time. Deep learning has the ability, internally during the training stage, to either completely remove parameters or completely keep them, purely based upon how well it fits the data, how well the training process goes. So it removes the bias that comes from removing or adding data that you're not sure should be there. The main con, well I suppose there are a couple of cons: the biggest one is that it can be hard to visualise. As soon as you start getting into neural networks that are quite deep, it can be hard to visualise and conceptualise. I'm hopefully going to try and prove that wrong in a little bit, but that's problem number one. Problem number two is that it can be quite computationally expensive, but that's true for lots of these algorithms, really.
So how do they actually work? Well, they work primarily by trying to conceptualise things. There's this idea that neural networks act like a hierarchy of concepts, and the whole goal really is to take those images, or your data, and produce a concept: something that accurately describes what was provided at the input. We've got a couple of the concepts on the left there: a street, an animal and a person. You can see that the bottom two, the person and the animal, are actually linked by another concept: they're both animals, it's just that one of them is human. The great thing about this layering of concepts is that you can start to tag things that are similar but not quite the same, based upon your training data.

To be more specific, this is an example of how you would go about conceptualising an image. Each pixel within the image, that's the dashed lines there, would be passed into the input of our deep learning network, and it would start to derive concepts around those pixels. The first layer might decide that there's part of a tyre, or part of a rim, or an end plate, or something like that: usually very small, discrete, local things within the image. The next layer might start to build on that and form a concept of a whole tyre, or a front wing, or a rear wing, and then finally we get to the classification, which in this case is an F1 car. But you can imagine that if you then showed the algorithm a normal car, it could reuse some of those concepts: they still have wheels, they still have cockpits or bodywork; they probably don't have wings, I don't know, maybe in Leeds, I'm not sure about Denmark. You can reuse some of these concepts, and that shows the applicability not just to problems it's already seen, but also to future problems it hasn't seen.

Just to finish this section off, some machine learning, or deep learning, in the news. The one I really like, because it's accessible to anybody, is the new Google Translate app that takes pictures of signs or text in a different language and translates that text. The really cool USP of the whole thing is that it actually takes the image and replaces the text in the image with the correct text in your language. So here we've got a Russian sign and it's replaced it with the English; actually it says "access to city", but according to my friend who speaks Russian it really means "exit to village". It's not quite as grandiose if Google showed us a sign saying "exit to village", so that's probably why they changed it. And then we've got the images at the bottom, which show a new chip developed by IBM. It's been a few years in the making, but effectively it's a deep learning, neural-network-type infrastructure inside a chip. Obviously you're used to cores; imagine the cores parallelised massively, so instead of having one core we've got tens of thousands, in this case actually a million: there are a million neurons in this chip, so it's able to do a million parallel tasks all at the same time. When we go through some of the examples in a minute we're going to be talking about image sizes like 10 by 10, 100 input pixels, that go down to maybe 2 dimensions on the output. That's nothing in comparison to what this could do, and this is in hardware as well, so it's super fast, super low power, and should produce some really interesting applications.
OK, so just to solidify how deep learning works, I'm going to take you through an example, which is a description of some handwritten numbers. The idea of this task is to recognise handwritten digits and to classify them as a number from 0 to 9. It's a really classic machine learning example, but it's great to use here because it's very easy to understand: we're just trying to recognise what a number is. The first step in any data analysis job is to have a look at the data, and the first thing we notice is that if you look at that top-left number there, I'm not completely sure whether it's a 5 or a 3. This immediately brings problems, because this data is labelled. Each number is an example; you can see that it's been inverted, from somebody writing with a pen on white paper, then reduced to a fixed pixel size and centred as well. And the first thing we see is that we're already not sure whether that one is a 3 or a 5, and somebody has gone through and labelled this data as a 3 or a 5, but I'm not convinced it's actually correct. So we're potentially giving our algorithm dodgy data already. Bear in mind whenever you're training on data that your labels might not be right in the first place, because they're usually produced by humans.
What we then do with each example is feed it into an input layer. I'm trying to stay away from the term neural network, although I've mentioned it a couple of times; it's been around since the 80s, and it sounds complicated, but it's really not. All a neural network is, is a set of nodes where some data goes in, with links to the next subset of nodes, and those links all have weights. It's as simple as that: all we do is alter the weights within the network in order to perform a task. So I'll try and refrain from using the terminology. Our input layer is usually the same size as the data: here we've got maybe 10 by 10 pixels, so we've got 100 inputs, one input for each pixel. We then pass that data through to what's known as a hidden layer, and we call it a hidden layer basically because it's not an input or an output, it's something in the middle that isn't directly observable. The layers are connected by weights, and during the training process those weights could be completely removed by setting them to zero, or completely kept by setting them to one; that's all the training process is doing.

What's really great at this point is that those weights combine in the next layer, so the weights that have been learned for one particular neuron in the hidden layer can actually be treated as a feature. This is the beginning of a concept. Given that one neuron, that one item in the hidden layer, it has certain weights on each of the input pixels, and if we were to make that the output layer, we could imagine that if it was the output for the number one, the weights would represent a shape that looks something like the number one. Generally you have multiple hidden layers, so you're trying to get the algorithm to learn small steps, small increments of concept. What we can actually do is ask, for that one hidden layer, what does the input have to look like in order to fully activate that one neuron, and only that one neuron? This is an example of that hidden feature layer here, and it might look a bit abstract, but you can just about start to make out these kind of ghostly images of numbers, and that's because it's starting to learn some of these concepts. If you were to use a number of hidden layers, and not try to learn the number all in one go, it might come up with features that are like edges: maybe it could learn the edge of the stroke of a 7, or start to learn some of the curves of a 9, and these are the hidden features that live in the middle of these networks.

Then finally we produce an output layer, which usually has as many nodes as the number of possible classifications we want to make. For our output layer we would have 10, for 0 to 9, and each one of those nodes would represent a number. At the output layer, if we were to actually put one of these examples in, you'd never get 100%. We were talking earlier about how these models aren't deterministic; they are deterministic in the sense that they have fixed weights, so you can follow the path of those weights through the data, but we're never quite sure, going back to that previous example, whether it's a 5 or a 3. The algorithm will probably decide: I'm 50 percent sure it's a 5, but there's a 40% chance it could be a 3. The classification is made by picking the highest of those numbers, so in this case we'd say that 5 is the classification for this example, because it had the highest value at the output.
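A minimal sketch of that pipeline (my own illustration; the talk itself contains no code), using the small 8x8 digit images that ship with scikit-learn rather than the 10x10 images on the slide:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()                       # 8x8 images flattened to 64 inputs
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# One hidden layer of 32 neurons between the 64 inputs and the 10 outputs (0-9).
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
net.fit(X_train, y_train)

# The output layer never says "100% a five"; it gives a score per class,
# and the classification is simply the highest one.
probs = net.predict_proba(X_test[:1])[0]
print({digit: round(p, 2) for digit, p in enumerate(probs)})
print("predicted digit:", probs.argmax())
```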
But what's really cool as well is that, rather than telling it to classify the objects by only having 10 outputs, we can produce the same number of outputs as inputs and ask the algorithm: please try and reconstruct the image based upon your hidden concepts and representations. So what we do here is, given a certain input, reproduce that input at the output, and then do a comparison to see how well it performed. This is an example of what a reconstruction actually looks like, and if I flick backwards and forwards between the real input and the learned concept of it, you can see that the learned concepts are like a drunk, blurred version of the real number. That's because they've learned what the most likely look is for that particular number. What's really interesting is that in the real data we weren't sure whether that example was a 3 or a 5, but if you look at the blurred reconstruction it actually looks a little bit more like a five. It has probably been labelled as a five, so the algorithm has learnt those features as a five, and when you reconstruct it, it looks more like a five.

And then finally, we talked about dimensionality reduction. What we can do is take that high-dimensional output, in this case ten discrete classes from zero to nine, and flatten it. We don't have ten dimensions in which to plot all our data; we can't plot the 50% of the five, the 30% of the four, the 20% of the three and so on, all on one graph, because we don't have that many dimensions. So we flatten all of that into two dimensions, which is what this process is here, and what it shows you is how well the data are clustering together. If I stand very close to my screen I can see that the sevens at the bottom are quite well clustered, the eights are OK in the top left, but we've also got some strange features. Take the five and three example: you can see the fives, in orange in the middle, are pretty well mixed with the threes, and that's because there must be quite a lot of examples that look like a five or look like a three. They're quite well mixed, and that means that to actually perform the classification the algorithm is going to have to work really hard to pull them apart. So this is what you would generally do on the output: visualise the data in such a way that we as humans can understand it, and that could be in 2D or in 3D.
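One common way of doing that flattening, shown here as a rough sketch of my own rather than the exact method used in the talk, is t-SNE from scikit-learn:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()

# Squash the 64-dimensional digit images down to 2 dimensions for plotting.
embedding = TSNE(n_components=2, random_state=0).fit_transform(digits.data)

# Colour each point by its true digit; overlapping colours (e.g. 3s and 5s)
# show which classes the model will find hard to pull apart.
plt.scatter(embedding[:, 0], embedding[:, 1], c=digits.target, cmap="tab10", s=5)
plt.colorbar(label="digit")
plt.show()
```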
OK, so hopefully that section has introduced you to deep learning and some of the ideas and terminology, so when we come to the financial demos they should be much easier to understand.

The first example is a traditional example using a rules-based approach, and in this case we've been a little bit fancy and used a graph database. Typically graph databases aren't used as much as we'd like, but they perform really well in a fraud-based scenario. To quickly recap, if you don't know: a graph database is another NoSQL database, but its power is really in the description of the data. The data can only ever be either a node or a relationship. A node is a thing, a noun, whereas a relationship is a link, a verb, that connects two concepts together. The key selling point is that sometimes you've got data that is just better described in a graph-like structure. For example, when we're talking about fraud and finance, you've got the concepts of people and accounts, and those people and accounts are all linked to different things: they're linked to an address, linked to a current account, and so on. We've got the traditional social media use case here: Bob is friends with Jane, a chair is contained within a room, Jane bought a book, and so on. But the real power is that once you've modelled it in this way, you can perform complex queries that you wouldn't be able to do in a traditional relational database. Going back to the social media example, if you wanted to ask "who is friends with my friends?", you'd have to do some crazy join in SQL to get that to work; with a graph database you can just hop through the graph, which makes it really fast.

In a fraud situation we might model our data something like this: an account holder in the middle, with relationships to phone numbers, national insurance numbers, things like that, and then we can perform queries on that if we'd like to. But when you start viewing that in detail, and actually looking at how these connections link things together, interesting patterns start to come out, especially if you're visualising it this way; it's much easier to visualise data like this than in a table, for example. In this example we've got three account holders in red (I hope they're red; yep, they're red), and they're linked in various different ways: all three of them share the same address. So who could be dodgy? I actually had a person in another talk, excuse me, who I was suggesting to that three people sharing the same address could be dodgy, and she was like, no, no, when thousands of people are sharing the same address, then it's dodgy; three is fine, don't worry about it. OK. But we could set up a rule there to count how many people are using the same address, and you could do that in a traditional database too. Where the power really comes in is when you start linking these things together and searching for larger rings and groups within the data. Imagine that no two people directly share the same national insurance number, which is illegal in the UK, but maybe there's a third party which links these national insurance numbers together. You start to form these rings within the data, which are kind of unnatural: there shouldn't really be rings in the data, and graph databases are really good at viewing and spotting these rings.
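As a rough sketch of what such a check might look like against a Neo4j-style graph (the node labels, relationship types and credentials here are hypothetical, not taken from the talk's demo), using the official Python driver:

```python
from neo4j import GraphDatabase

# Hypothetical schema: (:AccountHolder)-[:LIVES_AT]->(:Address)
#                      (:AccountHolder)-[:HAS_NI_NUMBER]->(:NINumber)
SHARED_ADDRESS_QUERY = """
MATCH (p:AccountHolder)-[:LIVES_AT]->(a:Address)
WITH a, collect(p.name) AS holders
WHERE size(holders) > 2
RETURN a.postcode AS postcode, holders
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for record in session.run(SHARED_ADDRESS_QUERY):
        print(record["postcode"], record["holders"])
driver.close()
```

The ring detection described above would be a longer, variable-length path query in the same style; the point is that in a graph these hops stay cheap, whereas in SQL each hop is another join.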
So that's the kind of technology that exists in the wild today, if we were asked to perform a job like this. But what we're really interested in is bringing some machine learning techniques to some of these ideas.
The first idea I had was quite a typical one, really, and that's why I did it, because it was quite easy to do. Basically, if we could use vocal fingerprints for origination, it would solve the main problems: it would save the user a significant amount of time, and the user experience would be hugely improved by not having to wait on the phone for 20 minutes just because some stupid automated system took you to the wrong place. If we can use the person's voice as a form of authentication, of origination, we'll be able to save time, save machines, and save the effort of the people on the other end of the phone. To do this, we'd have to record the customer's voice, then pre-process the data in some way to clean it up and put it into a format that can be fed into an algorithm; in this case we would train a deep learning model, but it could be any algorithm. Then we'd store that fingerprint for future verification. In the online scenario, once you've been set up, you'd come along and we'd record your voice again, maybe against a preset phrase, maybe against a new phrase, and then compare that result against the fingerprint, and that would prove whether the person really is who they say they are.

So this is the pre-processing stage in action. This is a bit of signal processing which converts the time signature of the audio file into the frequency domain. What you're seeing there is a plot of the frequency components versus time: red is strong and the green-blue colour is weak. You can see the gaps in between the data, which are where they paused between saying the words.
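That time-to-frequency step can be sketched in a few lines (my own illustration; the file name is hypothetical):

```python
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile

rate, audio = wavfile.read("speaker_phrase.wav")      # hypothetical recording
if audio.ndim > 1:
    audio = audio.mean(axis=1)                        # mix stereo down to mono

# Frequency components versus time: strong energy shows up as bright bands,
# silence between words shows up as the dark gaps described above.
freqs, times, power = signal.spectrogram(audio, fs=rate)
plt.pcolormesh(times, freqs, power, shading="auto")
plt.xlabel("time (s)")
plt.ylabel("frequency (Hz)")
plt.show()
```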
This is some example data that I used for training. These are three examples of three people saying the same phrase; don't ask me what the phrase actually means, I don't know. You can tell for yourselves that the first two voices sounded a little bit different and the last example completely different, and what we're trying to do is make the deep learning network treat them as the same. Once we've put the data into our deep learning model, done the training and produced an output, our output in this case distinguishes between these three different people, so you have three outputs. Then, again, we've compressed that, squashed it onto the screen into two dimensions, and this is a plot that shows how close all of those voices were to each other. We've got a number of points in there, and the different colours, Bob, Steve and Dave, correspond to the three different people giving the examples; each individual point is a specific phrase that they said, and we had ten different phrases each. You can see that all of these examples are clustering together quite well. So if we then took the same people but a different spoken example, one not seen in training, how would it perform on new data? The top line in the results is the raw output of those three neurons for that file, and it's saying that one of the neurons is at 0.98 and the other two are at about 0.01 each. So it's saying: pretty sure, 98 percent sure, that was definitely Bob; and then a 97 percent chance that one was Steve, and 96 percent that one was Dave. So that was that example: quite a simple example, in the sense that it only used a very small data set, but it's instructive and it points towards things we could do in the future given much more data. Every phone call we make these days there's always a "we are recording your voice for verification and training purposes", so there must be huge, vast databases of people's voices out there.
people’s voices out there ok so next is
ample decision trees so this is an
example of decision tree that we showed
earlier on and this is predicting
mortgage default so amazingly two banks
– – sorry two mortgage providers in the
u.s. went bust as usual of course and
were bailed out by the US taxpayer so we
owned by the US government so Freddie
but Freddie Mac and Fannie Mae and as
part of their I don’t know as part of
their reprisal basically a slap on the
wrist the the government forced them to
release lots of their data to the public
and amazingly they they publicized a
whole data set of mortgage applications
and also historical accounts of what
happened to those mortgage applications
so you can say that they did told us
whether that person then defaulted in
the future so the task here is given
some given some oh dear I’m running over
time off to speed up given some data is
it possible to predict whether that
person’s going to default so the first
the first problem is the whole data
cleaning problem like we saw
the previous talk it’s the vast majority
of time to spend cleaning data
I'm going to skip over that. If we were to flatten all of the cleaned data into an image before putting it through the algorithm, this is roughly what it looks like: very intermingled and mixed, and you can't quite understand what's going on. A decision tree learns a set of rules, and based upon the outcome of those rules it says either yes, the person defaulted, or no, they didn't. We had approximately 20,000 samples in total, with a 50-50 split, a random forest classifier, which is a type of decision tree ensemble that doesn't overfit as much, and only 11 input features. The main problem here is that I don't actually think we've got enough data to do a really good job, but we'll see what we can do. One great thing about decision trees is that they give you a measure of importance for all of the variables. Here we've got the variables that were fed into the algorithm along the bottom, and the plot shows the respective importance of those variables on the left-hand side.
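A minimal sketch of that setup (mine, not the talk's actual code; the file and column names are invented stand-ins for the 11 inputs and the default label):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# df is assumed to be the cleaned loan table with 11 feature columns
# (e.g. "credit_score", "hpi_at_origination", ...) and a "defaulted" label.
df = pd.read_csv("cleaned_loans.csv")                     # hypothetical file
features = df.drop(columns=["defaulted"])
X_train, X_test, y_train, y_test = train_test_split(
    features, df["defaulted"], test_size=0.25, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
print("held-out accuracy:", forest.score(X_test, y_test))

# The importance plot described above comes straight from this attribute.
importances = pd.Series(forest.feature_importances_, index=features.columns)
print(importances.sort_values(ascending=False))
```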
You can see that the credit score is actually in second place. I'm not sure the credit reference agencies would be too happy that the credit score alone could only explain about 0.25, so 25%, of the importance; not a great result for them. The most important measure was actually the HPI at origination, which is the house price index at origination for that local area. This is saying that for a person who took out a mortgage in a particular local area, whether they're going to default depends heavily on the prices within that area. And that's fairly typical in the US: you can see vast tracts of places like Detroit where, as soon as some of the jobs left, everybody lost their jobs, the whole area's house prices crashed, and then people couldn't afford to move because they couldn't sell. That's why that feature is so important, so it's an interesting result.
And then the final example; I'm having to move rather quickly here because I've only got two minutes left. Is it possible to take that data and see whether there's something strange going on within it? Basically this is an unlabelled example: we're not telling it what to learn. How do we do that? Well, there's a deep learning technique called an autoencoder, which takes the inputs and restricts the number of hidden neurons to only a few concepts. It's saying: you've really got to pick and choose what data you use, and generate some concepts that are really quite strict; then we try to reproduce the input at the output again, and we compare the output against the input as a measure of how well we've done. With those restrictions in the middle, maybe only two neurons, a kind of yes and no, is it possible to reconstruct the data? So we do that. There's the same data as before, well, a different random sample, so it might look slightly different. We've got an input layer, a number of hidden layers that compress the data down into smaller and smaller numbers of neurons, and then we reconstruct back up to the size of the input layer and do a comparison to see how well we did. What we can do then is plot, in 2D or 3D, one of those hidden layers, to actually view those concepts and what it has learnt.
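Here is a minimal sketch of that kind of autoencoder in Keras (my own illustration, assuming the 11 mortgage features have already been scaled into a numeric matrix; the file name is hypothetical). The two-neuron bottleneck is what gets plotted:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 11                                  # the 11 mortgage inputs
X = np.load("cleaned_loans.npy")                 # hypothetical scaled feature matrix

inputs = keras.Input(shape=(n_features,))
h = layers.Dense(8, activation="relu")(inputs)
bottleneck = layers.Dense(2, name="bottleneck")(h)   # squeeze everything into 2 concepts
h = layers.Dense(8, activation="relu")(bottleneck)
outputs = layers.Dense(n_features)(h)                # try to reconstruct the input

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")    # reconstruction error is the measure
autoencoder.fit(X, X, epochs=50, batch_size=64, verbose=0)

# The 2-D bottleneck activations are what gets scattered on the slide.
encoder = keras.Model(inputs, bottleneck)
latent = encoder.predict(X)                          # shape (n_samples, 2), ready to plot
```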
Finally, this is the result of that process. On the left-hand side we've got a 2D representation, and you can start to see that there's actually some structure within the data. Most generally, you can see that the people that defaulted are grouped on the left-hand side of that graph, and the people that didn't default on the right-hand side. And within that, if you look on the right-hand side, there are a couple of orange dots, saying that the vast majority of people in that region didn't default, but one or two did. Now an analyst might start to ask why. It could be something quite innocent, maybe the person lost his high-powered job, or went to prison, something like that, but it's indicative that something else is going on, and this is where the analyst would come in and start investigating that data. These data are completely unlabelled and the algorithm has absolutely no idea what they mean; it still takes a human to do some analysis and investigation to figure out what has happened, but these kinds of tools lead the analysts in the right direction, as opposed to just taking a random sample.

And then finally, on the right-hand side, we've got a 3D representation of the same data, and this is where it becomes really powerful. Imagine if you could take that graph, look into it, move it and turn it around, and start to see clusters in 3D space; that's when it starts to become immersive. It takes a certain amount of time for any analyst to analyse data, but given enough time they will learn to see patterns within that data, which will help them to investigate things they haven't seen before. And I think I'd better stop there, because I've completely run out of time. So thank you very much for listening.