
Parsing XML – Go Lang Practical Programming Tutorial p.11


What's going on, everybody, welcome to part 11 of the Golang tutorial series. In this part, what we're going to be doing is learning how we can actually parse this XML document. Just in case, for the inevitability when the Washington Post sitemap index changes or something else goes wrong, because it almost certainly will, especially on a long enough timeline, this is the structure of it, and at the end I'll show you guys how you can convert that. Just know that that's the structure, and you can either use your own sitemap index that you find somewhere else, or you can convert this one. If you want to be able to do that, I'll put a link to the text-based version in the tutorial, so you can still follow along even if for whatever reason you can't use the exact same one that we're using. But yeah, let's go ahead and get started. So the
way that we're going to do this is we're going to use one more package, and that's the encoding/xml package. We're going to use that to unmarshal into a structure that mirrors the XML structure. We could do this ourselves totally from scratch without using encoding/xml, but that would be really tedious, whereas this is already built to accept it; we just need to give it the structure of the data that we're trying to decode. So the
first thing that I'm going to do is clean this up: we're no longer going to use string body or print that line out. Now what we need to do is define the structure of this XML document. First I'm going to do a type SitemapIndex, and that's going to be a struct. Inside it, at the end of the day, what do we want? We want capital-L Locations as the field, and it's going to be a slice of the Location type, which doesn't yet exist. Then we describe, for when we go to unmarshal it, the tag that it's under, so that's `xml:"sitemap"`: xml, a colon, and then sitemap in double quotes (don't forget those). The other thing you don't want to forget, which is a little less obvious in my opinion, is that you must capitalize these field names. If you don't capitalize them, they won't be exported, so when you go to unmarshal, the xml package sees them as internal to your package and can't set them. You won't get any values from it, and that's really annoying. I got stuck on that for way too long.
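As a minimal sketch of the capitalization point above: the two struct types below are hypothetical (they are not part of the tutorial's program) and differ only in the case of the field name. Under that assumption, decoding the same XML into each should fill the exported field and leave the unexported one empty, without any error to warn you.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// exportedLoc uses a capitalized field name, so encoding/xml can set it.
type exportedLoc struct {
	Loc string `xml:"loc"`
}

// unexportedLoc uses a lowercase field name. The field is private to this
// package, so encoding/xml silently skips it, tag or no tag.
type unexportedLoc struct {
	loc string `xml:"loc"`
}

// parseBoth decodes the same XML into both struct shapes and returns
// what each one ended up holding.
func parseBoth(data []byte) (exported, unexported string, err error) {
	var e exportedLoc
	if err = xml.Unmarshal(data, &e); err != nil {
		return "", "", err
	}
	var u unexportedLoc
	if err = xml.Unmarshal(data, &u); err != nil {
		return "", "", err
	}
	return e.Loc, u.loc, nil
}

func main() {
	// Placeholder URL, not taken from the real sitemap.
	data := []byte(`<sitemap><loc>https://example.com/a.xml</loc></sitemap>`)
	exported, unexported, err := parseBoth(data)
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	fmt.Printf("exported field:   %q\n", exported)
	fmt.Printf("unexported field: %q\n", unexported)
}
```

The missing value with no error is what makes this mistake easy to lose time on.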
That was annoying. So anyway, Locations is of the Location type, and what's happening here is it's going to be, in this case, a slice. Let me just run through slices and arrays really quickly. Anything that is square brackets with a number in it and then a type, whatever that type happens to be, that's an array. Anything that doesn't have a number in the brackets, and then a type, that's a slice. They're pretty much the same thing; the main difference is that an array is of a fixed size. You could also have a 5×5, for example: [5][5]int is a 5×5 integer array, whereas []int is just an integer slice of some length. In our case here we've got Locations, and it's a slice of Location types. We don't really know what those are yet.
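To make the array-versus-slice distinction concrete, here is a small sketch. The helper name `sizes` and the variable names are made up for illustration; the only point is that a number in the brackets gives a fixed-size array, while empty brackets give a slice that can grow with `append`.

```go
package main

import "fmt"

// sizes returns the element counts of an array, a 5x5 array, and a slice
// after one append, to show which of them can grow.
func sizes() (arrayLen, gridCells, sliceLen int) {
	var arr [5]int         // array: the length 5 is part of the type
	var grid [5][5]int     // 5x5 array: an array of five [5]int arrays
	nums := []int{1, 2, 3} // slice: no length in the brackets

	nums = append(nums, 4) // a slice can grow; an array cannot
	return len(arr), len(grid) * len(grid[0]), len(nums)
}

func main() {
	a, g, s := sizes()
	fmt.Println(a, g, s)
}
```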
And so we need to define those. While we're talking about it, let's go ahead and do that: type Location struct, and here it's going to be the location. Again, don't forget it must be capital-L Loc, and that's a string type. Then where is it located? That's going to be xml under the loc tag, `xml:"loc"`. Obviously that must be lowercase, because the tag itself is lowercase. Okay, now what we can do is come down here and do var s, and s is going to be a SitemapIndex type, and now we can unmarshal into that. We're going to do xml.Unmarshal, with a capital U, and then what do we want to unmarshal? That's going to be bytes. And where do we want to unmarshal it? Into the memory address of s, so &s. Now that we've done that, let's go ahead and see what we're looking at. We should be able to fmt.Println s.Locations, because that's going to be our slice of data. Let's go ahead and save that, run it with go run, and see how we've done.
So what we get here is pretty much what we expected, and if you're not new to programming you probably have some flags going off, but anyway, here are all the URLs. We're very close to what we wanted, but it looks odd. We can see the square brackets here, which denote a list or array or something, and yes, that's what we wanted, but then we have these curly braces. The reason we have these curly braces is that each element is still a Location struct; it's not a string yet.
So the SitemapIndex type, yes, it's got a Location slice, and yes, the location itself is a string, but we actually need a String method that applies to the Location type. We've already talked about methods, so this is relatively simple. In this case, with a String method, what are we trying to do? Are we trying to modify anything within the struct, or are we just trying to get some values out of it? We're just trying to get some values out of it, so we can use a value receiver. Let's go ahead and write func, then l for a Location type (that looks like an underscore l, which looks kind of weird in Sublime, but anyway), then String with a capital S, returning the string type, and it's just going to return fmt.Sprintf of l.Loc. Save that, and unless I'm mistaken, let's just rerun it real quick. Right, okay, so now that we've given it a String method, it actually prints as strings. Lo and behold, we actually have string URLs.
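A sketch of the String method described above, with one deliberate change that is mine and not from the video: the value is passed to Sprintf through an explicit "%s" verb instead of being used as the format string itself, since a URL containing a percent sign (such as %20) would otherwise be read as formatting directives. The `render` helper and the URLs are made up for illustration.

```go
package main

import "fmt"

// Location holds the text of one <loc> element.
type Location struct {
	Loc string `xml:"loc"`
}

// String uses a value receiver because it only reads from l.
// Having this method is what lets fmt print a Location as plain text.
func (l Location) String() string {
	return fmt.Sprintf("%s", l.Loc)
}

// render formats a slice of locations the same way fmt.Println would,
// minus the trailing newline.
func render(locs []Location) string {
	return fmt.Sprint(locs)
}

func main() {
	locs := []Location{
		{Loc: "https://example.com/sitemap-1.xml"},
		{Loc: "https://example.com/a%20b.xml"}, // percent sign survives
	}
	fmt.Println(render(locs))
}
```

Since Loc is already a string, returning l.Loc directly would work as well; Sprintf is kept here to stay close to the walkthrough.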
Also, let me just pull up Sprintf here. There you go. It does what it says: it formats according to a format specifier and returns the resulting string. You're going to use that pretty much every time you have a String method; if you want to convert some sort of struct to a string, this is the way you're going to do it. To be honest, I've not really seen any other reason you would use Sprintf. I'm sure there are more, and I've not been in Golang for a really long time, but that appears to me to be the main use. Okay, so now
that we've made it that far, we've got a slice, though I'm probably going to call it a list a few times, because that's what it looks like to me: a list of stuff. But it's definitely a slice; there are no commas there, so I guess we could say it's not a list. Anyway, what we need to do now is iterate through these values, get those URLs, and then visit those URLs, and because those are sitemaps, get the URLs and maybe titles or something from those sitemaps, and so on. That's what we're doing in the next tutorials; obviously we need to learn how to actually loop over this list first, so we're going to be talking about loops next. The other thing I want to show you guys real quick for the end of the tutorial:
if for whatever reason you can't access the Washington Post sitemap and you still want to follow along, here is what you could do. You could say var washPostXML equals a slice of bytes, []byte, and then it's going to be a multi-line string in backticks, and paste. Boom, done. Let's go ahead and move that underneath the import, just to keep it tidy.
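A rough sketch of that offline approach follows. The variable name washPostXML comes from the walkthrough; the XML inside it is a shortened placeholder with made-up example.com URLs, not the real Washington Post data, and `loadEmbedded` is a hypothetical helper added so the result can be checked.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// washPostXML holds the sitemap index inline as a raw (backtick) string
// converted to a byte slice, so no HTTP request is needed.
// The entries below are placeholders.
var washPostXML = []byte(`
<sitemapindex>
   <sitemap>
      <loc>https://example.com/sitemaps/politics.xml</loc>
   </sitemap>
   <sitemap>
      <loc>https://example.com/sitemaps/opinions.xml</loc>
   </sitemap>
</sitemapindex>`)

type SitemapIndex struct {
	Locations []Location `xml:"sitemap"`
}

type Location struct {
	Loc string `xml:"loc"`
}

// loadEmbedded parses the inline XML instead of a downloaded body.
func loadEmbedded() ([]Location, error) {
	var s SitemapIndex
	if err := xml.Unmarshal(washPostXML, &s); err != nil {
		return nil, err
	}
	return s.Locations, nil
}

func main() {
	locs, err := loadEmbedded()
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	for _, l := range locs {
		fmt.Println(l.Loc)
	}
}
```

With the XML inline, the net/http and io/ioutil imports are no longer used, and Go refuses to compile with unused imports, so they have to come out.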
And then it's basically already a byte slice, so you could say bytes equals washPost... I don't think that's what we called it. washPostXML, yeah. Get rid of this, and unmarshal bytes. We've probably got some import that we don't need, but let's just run it really quickly. Undefined bytes, and ioutil. bytes equals... oh, that needs to be colon-equals, and then it was ioutil that we didn't need, so I can just remove that real quick. Bring this back up, run again. Oh, come on, just please work this time, I don't have time for this; I just wanted to show you guys real quickly. Okay, and then it's probably going to get angry at us for still importing net/http. There we go. Okay, so obviously it's short, since I used the shorter XML, but that's how you can still follow along if and when the sitemap goes away. Also, it'd be kind of
nice because you can come in here and maybe add your own new tags and play around with it. I encourage you to try your own sitemap index and try to figure out how to build the structs and all that, because that's not the most intuitive thing ever, in my opinion. But anyway, that's all for now. In the next tutorial we'll talk about looping, because we want to be able to loop over that list. If you have questions, comments, concerns, whatever, feel free to leave them below; otherwise, I will see you in the next Go tutorial.