Network Files in Python

The open() function gives us access only to files in the local file system. To access network files (through protocols such as FTP or HTTP), we need something else: the urlopen() function.

urlopen('url')

It lives in the urllib library, but the call differs between Python versions. In Python 2.x, we fetch a network resource like this:

var1 = urllib.urlopen('ftp://path_to_the_file/file.txt')

In Python 3.x, urlopen() is in the urllib.request module:

var1 = urllib.request.urlopen('ftp://path_to_the_file/file.txt')
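
If the same script has to run on both versions, one common approach is to try the Python 3 import first and fall back to the Python 2 one. This is only a minimal sketch of that idea; nothing here opens a connection.

```python
try:
    # Python 3: urlopen lives in the urllib.request submodule.
    from urllib.request import urlopen
except ImportError:
    # Python 2: urlopen is a top-level function of the urllib module.
    from urllib import urlopen

# Show which module the name was actually imported from.
print(urlopen.__module__)
```

After this, the rest of the script can call urlopen('url') without caring which version it runs on.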

>>> import urllib.request
>>> p = urllib.request.urlopen('http://www.wp.pl/')
>>> print(p.read(5000))
b'<!DOCTYPE html>\n<html lang="pl">\n<head>\n\t<meta charset="utf-8" />\n\t<titl
e>Wirtualna Polska - www.wp.pl</title>\n\t<meta name="Expires" content="0">\n\t<
meta http-equiv="Pragma" content="no-cache">\n\t<meta http-equiv="Cache-Control"
 content="no-cache">\n\t<meta http-equiv="X-XRDS-Location" content="http://serwe
r.openid.wp.pl/?xrds=1">\n\t<meta name="author" content="Wirtualna Polska" />\n\
t<meta name="keywords" content="wp.pl,wp,Wirtualna Polska,Wirtualna,Polska,Katal
og,Katalog WWW,Firmy,Encyklopedia,Pogoda,Wiadomosci,Program,Telewizja,Sklep,Kawi
arenka,MP3" />\n\t<meta name="description" content="Pierwszy horyzontalny portal
 internetowy w Polsce. Skuteczne medium reklamowe. Bogactwo serwisow informacyjn
ych i finansowych. Centrum wyszukiwania, komunikacji i rozrywki: wiadomosci, wys
zukiwarki, poczta, webpark, czat, komunikator, SMS, randki, kartki, krzy\xc5\xbc

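This prints the first 5000 bytes of the page (the output above is truncated). If that is too much, we can read less. Note that each read() continues from where the previous one stopped, so the calls below return the next parts of the page, not its beginning: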

>>> print(p.read(50))
b"=='016')){PB('extra16',PWAd);}if((PWAk=='017')){PB"
>>> print(p.read(100))
b"('extra17',PWAd);}if((PWAk=='018')){PB('extra18',PWAd);}if((PWAk=='019')){PB('
extra19',PWAd);}if((PW"
>>>

Or more.
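
To see how successive read() calls walk through a resource, here is a small sketch that can be run without a network connection. It assumes a temporary local file as a stand-in for the remote page; the file contents and variable names are made up for the illustration.

```python
import os
import tempfile
import urllib.request
from pathlib import Path

# Hypothetical sample data standing in for a remote resource.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"0123456789abcdefghij")
sample_path = Path(tmp.name)

with urllib.request.urlopen(sample_path.as_uri()) as response:
    first = response.read(5)     # the first 5 bytes
    second = response.read(10)   # the next 10 bytes, not the first 10
    rest = response.read()       # no argument: everything that is left
    after_end = response.read()  # nothing left, so an empty bytes object

print(first, second, rest, after_end)

os.remove(sample_path)  # clean up the temporary file
```

The response behaves like a file opened for reading: there is a current position, and every read() moves it forward.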

It's that easy. Remember that we read the data directly from the file on the server; we don't have to copy the file to the local file system first. With urllib we can also open files in the local file system, but only for reading. The URL then looks like this:

file:///path_to_the_file/file.txt

For example, let's open a file on drive D:

>>> s = urllib.request.urlopen("file:///D:/dereka.txt")
>>> print(s.read())
b''
>>> s.close()
>>>
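
Writing file:// URLs by hand is error-prone, especially with Windows drive letters. One option is to let pathlib build the URL from an ordinary path. The sketch below assumes a temporary file created just for the demonstration; its name and contents are not from the example above.

```python
import os
import tempfile
import urllib.request
from pathlib import Path

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"hello from a local file\n")
local_path = Path(tmp.name)

# as_uri() turns an absolute path into a file:// URL,
# e.g. D:\dereka.txt would become file:///D:/dereka.txt on Windows.
file_url = local_path.as_uri()

with urllib.request.urlopen(file_url) as s:
    data = s.read()  # bytes, exactly as for a network resource

print(file_url)
print(data)

os.remove(local_path)  # clean up the temporary file
```

Note that as_uri() only works for absolute paths; a relative path raises ValueError.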

urlopen() returns a file-like object, and its read() method gives us raw bytes: a bytes object in Python 3.x and a plain (byte) str in Python 2.x. This is because urlopen() cannot determine the encoding of the byte stream it receives from the server. The bytes are decoded to a string later, once the program knows their encoding.

So we can do something like this:

>>> tyt = p.read(300)
>>> tyt
b"((PWAk=='028')){PB('extra28',PWAd);}if((PWAk=='029')){PB('extra29',PWAd);}if((
PWAk=='031')){PB('logo',PWAd);}if((PWAk=='034')){PB('megabox',PWAd);}if((PWAk=='
037')){PB('megasky',PWAd);}if((PWAk=='040')){PB('extra40',PWAd);}}catch(e){PWAje
l('jsfile_p-NPB',e);}}function NJB(PWAk){return JB(PWAk);}func"
>>> ode = tyt.decode("utf-8")
>>> print(ode)
((PWAk=='028')){PB('extra28',PWAd);}if((PWAk=='029')){PB('extra29',PWAd);}if((PW
Ak=='031')){PB('logo',PWAd);}if((PWAk=='034')){PB('megabox',PWAd);}if((PWAk=='03
7')){PB('megasky',PWAd);}if((PWAk=='040')){PB('extra40',PWAd);}}catch(e){PWAjel(
'jsfile_p-NPB',e);}}function NJB(PWAk){return JB(PWAk);}func
>>>

Raw bytes are sometimes useful, but for human-readable text we need to decode them. The encoding is UTF-8 in most cases, and we can check it in the output above: the page declares it in its <meta charset="utf-8" /> tag. Knowing the encoding, we can decode the bytes into readable text like this:

>>> with urllib.request.urlopen('http://www.wp.pl') as l:
...     print(l.read(150).decode('utf-8'))
...
<!DOCTYPE html>
<html lang="pl">
<head>
        <meta charset="utf-8" />
        <title>Wirtualna Polska - www.wp.pl</title>
        <meta name="Expires" content="0">
        <me
>>>

Or this way:

>>> k = urllib.request.urlopen('http://www.wp.pl/')
>>> print(k.read(150).decode('utf-8'))
<!DOCTYPE html>
<html lang="pl">
<head>
        <meta charset="utf-8" />
        <title>Wirtualna Polska - www.wp.pl</title>
        <meta name="Expires" content="0">
        <me
>>>

We can choose whichever suits us better. The with form has one advantage: it closes the connection automatically when the block ends.
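
Instead of hard-coding 'utf-8', we can also ask the response which charset the server declared in its Content-Type header and fall back to a default only when none is given. The sketch below is one possible way to do that; read_text and default_encoding are names invented for this example, and the demonstration uses a temporary local file (which declares no charset, so the fallback is used) to stay runnable offline.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def read_text(url, default_encoding="utf-8"):
    """Fetch a URL and return its content decoded to str.

    Hypothetical helper: uses the charset from the Content-Type header
    when the server sends one, otherwise falls back to default_encoding.
    """
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        # Returns None when the header carries no charset parameter.
        charset = response.headers.get_content_charset()
    return raw.decode(charset or default_encoding)


# Demonstration on a local file containing a non-ASCII character.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("krzyż".encode("utf-8"))
sample_path = Path(tmp.name)

text = read_text(sample_path.as_uri())
print(text)

os.remove(sample_path)  # clean up the temporary file
```

For an HTML page, the header and the <meta charset> tag may disagree or the header may omit the charset, so treat the fallback as a guess rather than a guarantee.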
