Accessing the internet – Go Lang Practical Programming Tutorial p.10


What’s going on everybody, welcome to part 10 of the Go language tutorial series. In this part, what we’re gonna be doing is working on the first step of our news aggregator web app: in order to aggregate news, we first need to be able to access that news. To do that, we have to be able to read information from the internet, which is actually a pretty common task. So that’s what we’re gonna be covering here: how do we actually just pull down data from the internet?

I’m going to be pulling data from the Washington Post sitemap index, but you could use any website you want to use; I’m just going to be using that one. I’ll put a link to that in the description, either to that or to the text-based write-up, which will have that link. If I forget to do that, someone holler at me.

But you really can use any website. So with that, let’s get started. I’m gonna go ahead and kind of clean up: I’m gonna leave the main function, and we can leave fmt and also net/http, because we’re gonna be using both of those. Everything for this one, at least, we can contain within the main function, so I’m just going to tab over.

The first thing we want to do is get information from the internet. We kind of talked about what’s gonna happen here with the web app, because we’re on both sides of this equation. If we want to pull information from a website, we first have to make a request, and then we get a response. In general that response is gonna have a whole bunch of information, but it’ll also have the stuff that your browser uses to render the website on your screen, so it’s gonna have that source code. Generally that’s gonna be in bytes, so we’re gonna need to convert it to a string so we can actually use it how we’d like to. From there, at least in our case, we’ve got quite a bit of parsing and formatting to do, but for the purposes of this tutorial we’re just gonna try to pull down that source code. So let’s go ahead and do that.

Let’s see, we’re gonna use fmt, we’re gonna use net/http, and then we also need ioutil, so that’s gonna be io/ioutil (io, slash, ioutil).

First we’re gonna make our request. The request is going to return two things: a response, and then an error, which might be empty if nothing went wrong. For now we’re just going to use the underscore for the error (and don’t forget the colon equals, :=). You use the underscore any time you have to receive a value that you don’t intend to use. The underscore basically says: okay, this is a throwaway, something needs to be unpacked here, but I don’t plan to use it. That matters because in Go, if you define a variable and then you don’t use it, you’re gonna get a compile error when you run the program.

Anyway, it’s http.Get, capital G, again because it’s exported. We’re gonna pass in the link; I’m just gonna copy and paste it, but it’s washingtonpost.com/news-sitemap-index.xml. Just for the record, it’s a sitemap that contains links to all the categorized sitemaps, so politics and opinions, and then tech and local and sports, all that stuff.

So yeah, this is not a shout-out specifically to the Washington Post. I actually don’t really read the Washington Post; I just like it because it’s got a sitemap that leads to yet another sitemap, and that poses an interesting task for us. There are quite a few websites like that, but this is just the one I’m going to use here.

Also, especially for the future tutorials, this might change. Historically, when I’ve done tutorials using any other website, not just for parsing but for anything like an API, they’ve always changed, or at least almost always changed. So just be prepared that this might not be the same as when I’m covering it now.

So anyways, we do a GET request. Basically that just means, hey, we want to get some data from you, as opposed to a POST request, which is, hey, we’re sending you some data.

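The request step described above can be sketched as follows. This is a minimal illustration and not the code typed in the video: it checks the error returned by `http.Get` instead of discarding it with the underscore, and it talks to a local stand-in server so it runs without network access. The helper names `newStandInServer` and `statusOf` are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newStandInServer starts a local server that plays the role of the remote
// website. It is a hypothetical helper so the sketch runs offline.
func newStandInServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/xml")
		fmt.Fprint(w, "<sitemapindex></sitemapindex>")
	}))
}

// statusOf issues a GET request and reports the HTTP status code.
func statusOf(url string) (int, error) {
	// http.Get returns two values: the response and an error.
	// The video discards the error with `_`; here it is checked instead.
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	// Release the connection once we are done with the response.
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	srv := newStandInServer()
	defer srv.Close()

	code, err := statusOf(srv.URL)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("status:", code)
}
```

To try it against a real site, pass that site's full URL to `statusOf` in place of `srv.URL`.
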
That’s gonna end up giving us the response, and within that response there’s going to be the body. So we’re gonna say bytes, and then an underscore, since we don’t care about the next return value and just need to unpack it, := ioutil.ReadAll, and we’re gonna read all from the response’s Body. That’s gonna be in bytes, as the name suggests, and we need to convert it to a string, so we’re gonna say string_body := string(bytes). Then we’re gonna call Body.Close() on the response, capital C, so we free up the resources that were used to make the request.

With that, let’s go ahead and run it. I think I’m still running my web server, so I’ll break that and rerun. Let’s see, we got an error: imported and not used. Oh, we imported fmt but then wound up not using it. Eventually we are going to use it, but yeah, see, that’s kind of annoying. I know I’m gonna use it; I’m just up to this point in my code, I want to make sure it runs, and then I’ll use fmt. But don’t force me to do this. And now string_body declared and not used. Okay. Oh, and here’s why I wanted fmt. So fine, whatever: fmt.Println(string_body), and then let’s go ahead and import fmt again. All right, that was really enjoyable. Let’s run that one more time; hopefully that’s the end of our errors.

All right, and the output, at least in my case, is the sitemap, so that’s kind of what we expected. From here you would parse things. You could parse HTML, and at some point we might talk about parsing HTML in Go specifically, but this is XML, so we’re actually going to talk about parsing XML specifically. Eventually we’ll get to the point where we can parse out just the URLs, and then we can go visit those URLs, which are themselves sitemaps, and grab some more information. That way we can aggregate news by term or whatever.

So that’s it for this tutorial. Pretty basic, but yeah, that’s how you’re gonna grab the source code of some page off the internet. If you have any questions, comments, concerns, whatever, feel free to leave them below. Otherwise, I will see you in the next tutorial.