Network Files in Python
The open() function gives us access to files in the local file system only. When we want to access files over the network (through protocols such as FTP or HTTP), we need something else: the urlopen() function.
urlopen('url')
It lives in the urllib library, but its exact location depends on the Python version. In Python 2.x, we fetch a network resource like this:
var1 = urllib.urlopen('ftp://path_to_the_file/file.txt')
In Python 3.x, the function has moved to the urllib.request module:
var1 = urllib.request.urlopen('ftp://path_to_the_file/file.txt')
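If one script has to run under both versions, a common pattern is to try the Python 3 import first and fall back to the Python 2 location. This is only a minimal sketch of that idea, not something the rest of this post depends on:

```python
# Import urlopen from wherever the running interpreter keeps it.
try:
    from urllib.request import urlopen  # Python 3.x location
except ImportError:
    from urllib import urlopen  # Python 2.x fallback

# From here on the call looks the same in both versions, for example:
# var1 = urlopen('ftp://path_to_the_file/file.txt')
```

The examples that follow use Python 3.x and the full name urllib.request.urlopen().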
>>> import urllib.request
>>> p = urllib.request.urlopen('http://www.wp.pl/')
>>> print(p.read(5000))
b'<!DOCTYPE html>\n<html lang="pl">\n<head>\n\t<meta charset="utf-8" />\n\t<titl
e>Wirtualna Polska – http://www.wp.pl</title>\n\t<meta name="Expires" content="0">\n\t<
meta http-equiv="Pragma" content="no-cache">\n\t<meta http-equiv="Cache-Control"
content="no-cache">\n\t<meta http-equiv="X-XRDS-Location" content="http://serwe
r.openid.wp.pl/?xrds=1">\n\t<meta name="author" content="Wirtualna Polska" />\n\
t<meta name="keywords" content="wp.pl,wp,Wirtualna Polska,Wirtualna,Polska,Katal
og,Katalog WWW,Firmy,Encyklopedia,Pogoda,Wiadomosci,Program,Telewizja,Sklep,Kawi
arenka,MP3" />\n\t<meta name="description" content="Pierwszy horyzontalny portal
internetowy w Polsce. Skuteczne medium reklamowe. Bogactwo serwisow informacyjn
ych i finansowych. Centrum wyszukiwania, komunikacji i rozrywki: wiadomosci, wys
zukiwarki, poczta, webpark, czat, komunikator, SMS, randki, kartki, krzy\xc5\xbc
…
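This gives us the first 5000 bytes of the page (the output above is truncated). If that is too much, we can read less. Note that each call to read() continues from where the previous one stopped: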
>>> print(p.read(50))
b"=='016')){PB('extra16',PWAd);}if((PWAk=='017')){PB"
>>> print(p.read(100))
b"('extra17',PWAd);}if((PWAk=='018')){PB('extra18',PWAd);}if((PWAk=='019')){PB('
extra19',PWAd);}if((PW"
>>>
Or more.
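Reading a fixed number of bytes at a time can be wrapped in a small loop. The sketch below is an illustration only: read_chunks is a hypothetical helper name, and io.BytesIO stands in for the object returned by urlopen() so that the example runs without a network connection (both expose the same read() method).

```python
import io


def read_chunks(stream, chunk_size):
    """Yield successive chunks from a file-like object until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # read() returns b'' once there is no more data
            break
        yield chunk


# io.BytesIO is a stand-in for the response object returned by urlopen().
fake_response = io.BytesIO(b"<!DOCTYPE html><html></html>")
chunks = list(read_chunks(fake_response, 10))
print(chunks)
```

Because every read() picks up where the previous one ended, joining the chunks gives back the complete data.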
It is easy. Remember that we read the data directly from the file on the server; we are not forced to copy the file to the local file system first. With urllib we can also open files in the local file system, but only for reading. In that case the URL looks like this:
file:///path_to_the_file/file.txt
For example, let us open a file on the D: drive:
>>> s = urllib.request.urlopen("file:///D:/dereka.txt")
>>> print(s.read())
b''
>>> s.close()
>>>
In Python 3.x, the object returned by urlopen() gives us a bytes object when we call read(), not a string, because urlopen() has no way to determine the encoding of the byte stream it receives from the server. (In Python 2.x, read() returns a str, which is a byte string as well.) The bytes have to be decoded into a string later, once the program knows their encoding.
So we can do something like this:
>>> tyt = p.read(300)
>>> tyt
b"((PWAk=='028')){PB('extra28',PWAd);}if((PWAk=='029')){PB('extra29',PWAd);}if((
PWAk=='031')){PB('logo',PWAd);}if((PWAk=='034')){PB('megabox',PWAd);}if((PWAk=='
037')){PB('megasky',PWAd);}if((PWAk=='040')){PB('extra40',PWAd);}}catch(e){PWAje
l('jsfile_p-NPB',e);}}function NJB(PWAk){return JB(PWAk);}func"
>>> ode = tyt.decode("utf-8")
>>> print(ode)
((PWAk=='028')){PB('extra28',PWAd);}if((PWAk=='029')){PB('extra29',PWAd);}if((PW
Ak=='031')){PB('logo',PWAd);}if((PWAk=='034')){PB('megabox',PWAd);}if((PWAk=='03
7')){PB('megasky',PWAd);}if((PWAk=='040')){PB('extra40',PWAd);}}catch(e){PWAjel(
'jsfile_p-NPB',e);}}function NJB(PWAk){return JB(PWAk);}func
>>>
Raw bytes are sometimes useful as they are. Usually, however, we want text that a human can read, so we need to decode the bytes we received. In most cases the encoding will be utf-8, and we can check it in the page itself: the meta charset tag in the first output above declares utf-8. Knowing the encoding, we can decode the bytes into readable text, like this:
>>> with urllib.request.urlopen('http://www.wp.pl') as l:
...     print(l.read(150).decode('utf-8'))
...
<!DOCTYPE html>
<html lang="pl">
<head>
<meta charset="utf-8" />
<title>Wirtualna Polska – http://www.wp.pl</title>
<meta name="Expires" content="0">
<me
>>>
Or like this:
>>> k = urllib.request.urlopen('http://www.wp.pl/')
>>> print(k.read(150).decode('utf-8'))
<!DOCTYPE html>
<html lang="pl">
<head>
<meta charset="utf-8" />
<title>Wirtualna Polska – http://www.wp.pl</title>
<meta name="Expires" content="0">
<me
>>>
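We can choose whichever suits us better. The with form has the advantage that the connection is closed automatically when the block ends.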
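The encoding does not always have to be guessed or read from the page by eye. When a server declares a charset in its Content-Type header, the response headers expose it through get_content_charset(). The sketch below assumes Python 3; fetch_text is a hypothetical helper name, and utf-8 is used as the fallback only because it is the most common case, not because it is guaranteed to be right. A local file is used so that the example runs offline; a local file declares no charset, so the fallback is what gets applied here.

```python
import tempfile
import urllib.request
from pathlib import Path


def fetch_text(url, default_encoding="utf-8"):
    """Read a URL and decode it with the declared charset, if there is one."""
    with urllib.request.urlopen(url) as response:
        # get_content_charset() returns None when no charset is declared.
        charset = response.headers.get_content_charset() or default_encoding
        return response.read().decode(charset)


# Offline demonstration: write a UTF-8 file and read it back as text.
with tempfile.TemporaryDirectory() as tmp_dir:
    page = Path(tmp_dir) / "page.html"
    page.write_bytes("<title>krzyż</title>".encode("utf-8"))
    text = fetch_text(page.resolve().as_uri())

print(text)
```

For an HTTP URL the call is the same, for example fetch_text('http://www.wp.pl/'), provided the server sends a charset or the fallback happens to match.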