CSV Files in Python

CSV Files in Python

Comma Separated Values (CSV) is the most common import and export format used for spreadsheets and databases. A csv file contains a number of rows, and each contains a number of columns, usually separated by commas.

>>> import csv
>>> dir(csv)
[‘Dialect’, ‘DictReader’, ‘DictWriter’, ‘Error’, ‘QUOTE_ALL’, ‘QUOTE_MINIMAL’, ‘
QUOTE_NONE’, ‘QUOTE_NONNUMERIC’, ‘Sniffer’, ‘StringIO’, ‘_Dialect’, ‘__all__’, ‘
__builtins__’, ‘__cached__’, ‘__doc__’, ‘__file__’, ‘__initializing__’, ‘__loade
r__’, ‘__name__’, ‘__package__’, ‘__version__’, ‘excel’, ‘excel_tab’, ‘field_siz
e_limit’, ‘get_dialect’, ‘list_dialects’, ‘re’, ‘reader’, ‘register_dialect’, ‘u
nix_dialect’, ‘unregister_dialect’, ‘writer’]

Reading the csv file is easy. The reader function will take one line in our file and make a list with all columns of the line. It’s the simplest form of reader:

>>> t = csv.reader(open(“D:/ale.csv”, mode = “r”, encoding = “utf8”))
>>> for i in t:
…     print(i)

[‘Line content’, ‘ position’, ‘ nameTeko’, ‘ 1’, ‘ OlekMeko’, ‘ 2’, ‘ BolekWeko’
, ‘ 3’, ‘ Lolek’]
>>>

It works, but we need more. We have the csv file:

Title, Name, Date
“Ana”, Olo, 1997
“Wana”, Lolo, 1998
“Mana”, Kolo, 2000
“Lana”, Molo, 1989

We can read this csv file:

>>> t = open(“D:/eta.csv”, “r”, encoding = “utf8”)
>>> u = csv.reader(t)
>>> for i in u:
…     print(i)

[‘Title’, ‘ Name’, ‘ Date’]
[‘Ana’, ‘ Olo’, ‘ 1997’]
[‘Wana’, ‘ Lolo’, ‘ 1998’]
[‘Mana’, ‘ Kolo’, ‘ 2000’]
[‘Lana’, ‘ Molo’, ‘ 1989’]
>>>
>>> t.close()

It’s the same way like before, but the syntax is different. We can also do that in this way:

with open(“D:/eta.csv”, “r”, encoding = “utf8”) as t:

u = csv.reader(t)

for i in u:

print(i)

We have the result:

[‘Title’, ‘ Name’, ‘ Date’]
[‘Ana’, ‘ Olo’, ‘ 1997’]
[‘Wana’, ‘ Lolo’, ‘ 1998’]
[‘Mana’, ‘ Kolo’, ‘ 2000’]
[‘Lana’, ‘ Molo’, ‘ 1989’]

Let’s read again:

t = open(“D:/eta.csv”, “r”, encoding = “utf8”)

u = csv.reader(t)

Title = []

Name = []

Date = []

for i in u:

title, name, date = i

Title.append(title)

Name.append(name)

Date.append(date)

We can then loop over the reader object to process one row at a time. First, however, we should lists to store the columns in our file.

Now let’s see the result:

>>> Title
[‘Title’, ‘Ana’, ‘Wana’, ‘Mana’, ‘Lana’]
>>> Name
[‘ Name’, ‘ Olo’, ‘ Lolo’, ‘ Kolo’, ‘ Molo’]
>>> Date
[‘ Date’, ‘ 1997’, ‘ 1998’, ‘ 2000’, ‘ 1989’]
>>>

A more compact way of achieving the same result:

t = open(“D:/eta.csv”, “r”, encoding = “utf8”)

u = csv.reader(t)

Title, Name, Date= zip(*u)

The zip(* ) call changes the list of rows into the list of columns.

Now let’s see the result:

>>> Title
(‘Title’, ‘Ana’, ‘Wana’, ‘Mana’, ‘Lana’)
>>> Name
(‘ Name’, ‘ Olo’, ‘ Lolo’, ‘ Kolo’, ‘ Molo’)
>>> Date
(‘ Date’, ‘ 1997’, ‘ 1998’, ‘ 2000’, ‘ 1989’)
>>>

We can write to the csv file too. We use writer() to create an object for writing, and we iterate over the rows in our file, using writerow() to print them.

Let’s Python’s help:

writer(…)
csv_writer = csv.writer(fileobj [, dialect=’excel’] [optional keyword args])

for row in sequence:

csv_writer.writerow(row)

or

csv_writer = csv.writer(fileobj [, dialect=’excel’] [optional keyword args])

csv_writer.writerows(rows)

The “fileobj” argument can be any object that supports the file API.

Great! Now we can try with real examples.

When we see the error:

typeerror ‘str’ does not support the buffer interface csv

We  should include encoding during opening our file:

>>> t = open(“D:/eta.csv”, “w”, encoding=”utf8″)

So let’s try to write to the file. First we should create the  writer object:

>>> t = open(“D:/eta.csv”, “w”, encoding = “utf8”)
>>> import csv
>>> u = csv.writer(t)
>>> u
<_csv.writer object at 0x00C126F0>
>>>

We have the writer object. Let’s write something to our file:

>>> t = open(“D:/eta.csv”, “w”, encoding=”utf8″)
>>> import csv
>>> u = csv.writer(t)
>>> Title = [“Gana”]
>>> Name = [“Gono”]
>>> Date = [“1999”]
>>> for i in zip(Title, Name, Date):
…     u.writerow(i)

16
>>> t.close()
>>>

The  file  content:

Gana,Gono,1999

But the new content replaced the old one.

>>> t = open(“D:/eta.csv”, “a”, encoding=”utf8″)
>>> import csv
>>> u = csv.writer(t)
>>> Title = [“Gana”]
>>> Name = [“Gono”]
>>> Date = [“1999”]
>>> for i in zip(Title, Name, Date):
…     u.writerow(i)

16
>>> t.close()
>>>

The file content:

Gana,Gono,1999

The same situation.

one row of data at a time

>>> t = open(“D:/eta.csv”, “w”, encoding=”utf8″)
>>> import csv
>>> u = csv.writer(t)
>>> Title = [“Gana”]
>>> Name = [“Gono”]
>>> Date = [“1999”]
>>> v = zip(Title, Name, Data)
>>> u.writerows(v)
>>> t.close()
>>>

all rows of data at once

CSV technology isn’t standardized, and there are some differences among vendors. Python can handle them. For example, we  can separate  the date within a row with \t instead of a comma.

>>> t = open(“D:/eta.csv”, “w”, encoding=”utf8″)
>>> u = csv.writer(t, delimiter=’\t’)

Several format options can be set here.  The group of these properties is called a dialect. The default dialog is “excel”. And it is used to export data to spreadsheets. Dialects supports parsing using different parameters. And we can see which dialects Python has:

>>> csv.list_dialects()
[‘excel-tab’, ‘unix’, ‘excel’]
>>>

We can see the current maximum field size allowed by the parser. If any new limit is given, this becomes the new limit.

>>> csv.field_size_limit(our_limit)
131072
>>>

We can see the re tool for finding:

>>> dir(csv.re)
[‘A’, ‘ASCII’, ‘DEBUG’, ‘DOTALL’, ‘I’, ‘IGNORECASE’, ‘L’, ‘LOCALE’, ‘M’, ‘MULTIL
INE’, ‘S’, ‘Scanner’, ‘T’, ‘TEMPLATE’, ‘U’, ‘UNICODE’, ‘VERBOSE’, ‘X’, ‘__all__’
, ‘__builtins__’, ‘__cached__’, ‘__doc__’, ‘__file__’, ‘__initializing__’, ‘__lo
ader__’, ‘__name__’, ‘__package__’, ‘__version__’, ‘_alphanum_bytes’, ‘_alphanum
_str’, ‘_compile’, ‘_compile_repl’, ‘_expand’, ‘_pattern_type’, ‘_pickle’, ‘_sub
x’, ‘compile’, ‘copyreg’, ‘error’, ‘escape’, ‘findall’, ‘finditer’, ‘functools’,
 ‘match’, ‘purge’, ‘search’, ‘split’, ‘sre_compile’, ‘sre_parse’, ‘sub’, ‘subn’,
 ‘sys’, ‘template’]
>>>

The csv module includes quoting options:

csv.QUOTE_ALL

We can quote everything (regardless of the typ we have)

csv.QUOTE_MINIMAL

We  can quote fieldas with special characters

csv.QUOTE_NONNUMERIC

We can quote all the fields that aren’t integers or floats (numerals)

csv.QUOTE_NON

We can quote anything for output

For example, we have:

u = open(“D:/zeta.csv”, “w”, encoding = “utf8”)

v = csv.writer (u, delimiter = “\t”, quotechar = ‘”‘, qutoting = csv.QUOTE_ALL)

 

Leave a comment