urllib Module in Python

urllib is an important module for anyone who works with web frameworks and other web technologies. It provides a unified client interface for several URL schemes, such as HTTP, HTTPS, and FTP (the gopher support of older Python versions is gone in Python 3). It picks the right protocol handler automatically, based on the scheme of the URL passed to it.
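
As a rough illustration of that scheme-based dispatch, the sketch below opens a data: URL and a file: URL through the same call, so it runs without network access. The helper name fetch and the sample contents are made up for this example.

```python
import pathlib
import tempfile
import urllib.request


def fetch(url):
    """Open any URL that urllib.request has a handler for and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


# A data: URL carries its payload inline, so nothing is downloaded.
data_bytes = fetch("data:text/plain;charset=utf-8,hello%20urllib")

# A file: URL goes through a different handler, chosen from the scheme alone.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = pathlib.Path(tmp_dir, "sample.txt")  # hypothetical sample file
    sample.write_text("hello file", encoding="utf-8")
    file_bytes = fetch(sample.as_uri())

print(data_bytes)
print(file_bytes)
```

The same fetch helper should also work for an http:// or https:// URL, since only the scheme decides which handler is used.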

>>> import urllib
>>> dir(urllib)
['__builtins__', '__cached__', '__doc__', '__file__', '__initializing__',
 '__loader__', '__name__', '__package__', '__path__', 'error', 'parse',
 'request', 'response']

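In Python 3, urllib is a package rather than a single module. The listing above shows four of its submodules:

urllib.request

urllib.parse

urllib.error

urllib.response

The package also ships urllib.robotparser, which does not appear in the listing. A plain import urllib does not always load the submodules, so import the one you need explicitly (for example, import urllib.request) before using it. Let's look at each of the four in turn.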

urllib.request
>>> dir(urllib.request)
['AbstractBasicAuthHandler', 'AbstractDigestAuthHandler', 'AbstractHTTPHandler',
 'BaseHandler', 'CacheFTPHandler', 'ContentTooShortError', 'FTPHandler',
 'FancyURLopener', 'FileHandler', 'HTTPBasicAuthHandler', 'HTTPCookieProcessor',
 'HTTPDefaultErrorHandler', 'HTTPDigestAuthHandler', 'HTTPError',
 'HTTPErrorProcessor', 'HTTPHandler', 'HTTPPasswordMgr',
 'HTTPPasswordMgrWithDefaultRealm', 'HTTPRedirectHandler', 'HTTPSHandler',
 'MAXFTPCACHE', 'OpenerDirector', 'ProxyBasicAuthHandler',
 'ProxyDigestAuthHandler', 'ProxyHandler', 'Request', 'URLError', 'URLopener',
 'UnknownHandler', '__all__', '__builtins__', '__cached__', '__doc__',
 '__file__', '__initializing__', '__loader__', '__name__', '__package__',
 '__version__', '_cut_port_re', '_ftperrors', '_have_ssl', '_localhost',
 '_noheaders', '_opener', '_parse_proxy', '_proxy_bypass_macosx_sysconf',
 '_randombytes', '_safe_gethostbyname', '_thishost', '_url_tempfiles',
 'addclosehook', 'addinfourl', 'base64', 'bisect', 'build_opener',
 'collections', 'contextlib', 'email', 'ftpcache', 'ftperrors', 'ftpwrapper',
 'getproxies', 'getproxies_environment', 'getproxies_registry', 'hashlib',
 'http', 'install_opener', 'io', 'localhost', 'noheaders', 'os',
 'parse_http_list', 'parse_keqv_list', 'pathname2url', 'posixpath',
 'proxy_bypass', 'proxy_bypass_environment', 'proxy_bypass_registry', 'quote',
 're', 'request_host', 'socket', 'splitattr', 'splithost', 'splitpasswd',
 'splitport', 'splitquery', 'splittag', 'splittype', 'splituser', 'splitvalue',
 'ssl', 'sys', 'tempfile', 'thishost', 'time', 'to_bytes', 'unquote', 'unwrap',
 'url2pathname', 'urlcleanup', 'urljoin', 'urlopen', 'urlparse', 'urlretrieve',
 'urlsplit', 'urlunparse', 'warnings']
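
Of the names listed above, Request and urlopen are probably the ones used most often. Here is a minimal sketch of building a request without sending it. The host example.com, the form fields, and the User-Agent string are placeholders invented for this example.

```python
import urllib.parse
import urllib.request

# Placeholder form data; urlencode returns str, and Request expects bytes.
form = urllib.parse.urlencode({"q": "urllib", "page": 1}).encode("ascii")

request = urllib.request.Request(
    "https://example.com/search",  # placeholder URL
    data=form,
    headers={"User-Agent": "urllib-demo/0.1"},  # placeholder header value
)

method = request.get_method()  # "POST", because a data payload was supplied
host = request.host
# Header names are stored capitalized, so look the header up as "User-agent".
user_agent = request.get_header("User-agent")

print(method, host, user_agent)
```

Passing the object to urllib.request.urlopen(request, timeout=10) would actually send it. That step is left out here so that the sketch runs offline.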

urllib.parse
>>> dir(urllib.parse)
['DefragResult', 'DefragResultBytes', 'MAX_CACHE_SIZE', 'ParseResult',
 'ParseResultBytes', 'Quoter', 'ResultBase', 'SplitResult', 'SplitResultBytes',
 '_ALWAYS_SAFE', '_ALWAYS_SAFE_BYTES', '_DefragResultBase',
 '_NetlocResultMixinBase', '_NetlocResultMixinBytes', '_NetlocResultMixinStr',
 '_ParseResultBase', '_ResultMixinBytes', '_ResultMixinStr', '_SplitResultBase',
 '__all__', '__builtins__', '__cached__', '__doc__', '__file__',
 '__initializing__', '__loader__', '__name__', '__package__', '_coerce_args',
 '_decode_args', '_encode_result', '_hostprog', '_implicit_encoding',
 '_implicit_errors', '_noop', '_nportprog', '_parse_cache', '_passwdprog',
 '_portprog', '_queryprog', '_safe_quoters', '_splitnetloc', '_splitparams',
 '_tagprog', '_typeprog', '_userprog', '_valueprog', 'clear_cache',
 'collections', 'namedtuple', 'non_hierarchical', 'parse_qs', 'parse_qsl',
 'quote', 'quote_from_bytes', 'quote_plus', 'scheme_chars', 'splitattr',
 'splithost', 'splitnport', 'splitpasswd', 'splitport', 'splitquery',
 'splittag', 'splittype', 'splituser', 'splitvalue', 'sys', 'to_bytes',
 'unquote', 'unquote_plus', 'unquote_to_bytes', 'unwrap', 'urldefrag',
 'urlencode', 'urljoin', 'urlparse', 'urlsplit', 'urlunparse', 'urlunsplit',
 'uses_fragment', 'uses_netloc', 'uses_params', 'uses_query', 'uses_relative']
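
To give a feel for a few of these functions, here is a short sketch that takes a URL apart, decodes its query string, and builds new pieces. The URLs and field values are invented for this example.

```python
from urllib.parse import parse_qs, quote, urlencode, urljoin, urlparse

# Placeholder URL with a port, a repeated query key, and a fragment.
url = "https://example.com:8080/docs/index.html?lang=en&tag=a&tag=b#intro"

parts = urlparse(url)          # named tuple: scheme, netloc, path, params, query, fragment
query = parse_qs(parts.query)  # each key maps to a list, since keys may repeat

encoded = urlencode({"name": "Ada Lovelace", "lang": "en"})  # dict -> query string
safe_path = quote("/files/my report.txt")                    # percent-encode unsafe characters
joined = urljoin("https://example.com/docs/index.html", "../img/logo.png")

print(parts.scheme, parts.hostname, parts.port, parts.path, parts.fragment)
print(query)
print(encoded)
print(safe_path)
print(joined)
```

Note that urlencode writes a space as +, while quote writes it as %20. Use quote_plus if a single value needs the + form.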

urllib.error
>>> dir(urllib.error)
['ContentTooShortError', 'HTTPError', 'URLError', '__all__', '__builtins__',
 '__cached__', '__doc__', '__file__', '__initializing__', '__loader__',
 '__name__', '__package__', 'urllib']
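
The exception classes listed here are what urllib.request raises when something goes wrong. One possible way to handle them is sketched below. The helper try_open and the deliberately unsupported scheme are made up for this example, which therefore fails fast without touching the network.

```python
import urllib.error
import urllib.request


def try_open(url):
    """Return (ok, detail) instead of letting urllib exceptions propagate."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return True, response.read()
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 500.
        return False, f"HTTP {exc.code}: {exc.reason}"
    except urllib.error.URLError as exc:
        # No usable answer: unknown scheme, DNS failure, refused connection, ...
        return False, f"URL error: {exc.reason}"


# An unsupported scheme raises URLError before any connection is attempted.
ok, detail = try_open("nosuchscheme://example.invalid/")
print(ok, detail)
```

HTTPError is a subclass of URLError, so its except clause has to come first. Otherwise the more general clause would catch both cases.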

urllib.response
>>> dir(urllib.response)
['__builtins__', '__cached__', '__doc__', '__file__', '__initializing__',
 '__loader__', '__name__', '__package__', 'addbase', 'addclosehook', 'addinfo',
 'addinfourl']
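
These classes wrap a file-like object together with response metadata. User code rarely creates them directly, but building one by hand shows what a response object carries. The body, header, URL, and status code below are invented for this sketch.

```python
import io
from email.message import Message
from urllib.response import addinfourl

# Placeholder headers; urllib itself uses an email-style message object here.
headers = Message()
headers["Content-Type"] = "text/plain; charset=utf-8"

# Wrap an in-memory byte stream as if it were a fetched response.
with addinfourl(io.BytesIO(b"hello"), headers, "https://example.com/", code=200) as response:
    body = response.read()                          # delegated to the wrapped stream
    content_type = response.headers["Content-Type"]
    final_url = response.url
    status = response.code

print(body, content_type, final_url, status)
```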
