Python

Working With Excel Spreadsheets using Python

Microsoft Excel is a popular application for Windows that is used to work with spreadsheets, tables and charts. Spreadsheets are used extensively in the business world to record budgets, create reports, manage inventory or perform calculations.

Working on Excel tasks in your day to day work can be boring, mind-numbing and even daunting especially if you have to go through thousands of rows of data to make small edits or to compare figures. Python can be used to automate working with Excel documents and in this blog post, I will show you how.

The first thing to do, is to download a python library called OpenPyXL. OpenPyXL can be installed easily using pip. To learn more about how to install 3rd party python packages, read this.

pip install openpyxl

Verify that it has been successfully installed by starting the python interpreter and running:
import openpyxl
If the import does not print any errors it means you have successfully installed OpenPyXL.

Opening Excel Files

Excel files are made up of numbered rows and columns with letters starting at A that can contain either numbers or text. Rows and columns in a single grid make up a work sheet or sheet for short. Multiple work sheets make up a workbook. In other words, an Excel spreadsheet is a workbook.

To open an Excel workbook, we need to import the OpenPyXL module and use its load_workbook() function. For the purpose of this article, let’s assume we want to open a file named example.xlsx.

import openpyxl
excel_doc = openpyxl.load_workbook('example.xlsx')

Here, we called the openpyxl.load_workbook() function and passed in the name of the excel document we want it to open. The function returns a Workbook object that represents the actual Excel file.

Getting the sheets

The next step in retrieving data from an Excel file is to specify which sheet we are interested in. To find out the names of the worksheets in the workbook, call the get_sheet_names() method.

>>> import openpyxl
>>> excel_doc = openpyxl.load_workbook('example.xlsx')
>>> excel_doc.get_sheet_names()
[u'Sheet1', u'Sheet2', u'Sheet3']

We have three sheets in this document, namely Sheet1, Sheet2 and Sheet3. The next step is to select a sheet to get data from.

>>> import openpyxl
>>> excel_doc = openpyxl.load_workbook('example.xlsx')
>>> excel_doc.get_sheet_names()
[u'Sheet1', u'Sheet2', u'Sheet3']
work_sheet1 = excel_doc.get_sheet_by_name('Sheet1')
<Worksheet "Sheet1">

Passing 'Sheet1' to the get_sheet_by_name() method creates a Worksheet object that represents Sheet1 in the Excel file. The worksheet object allows you to get data from the the individual cells or from a range of them.

To get the value of a single cell, such as the cell in the first row and first column(cell A1) do the following:

>>> work_sheet1['A1'].value

To access many cells, use slicing in the same way you would do it in Excel:

>>> work_sheet1['A1':'C3']

An alternative way of retrieving cell values is to use the sheet’s cell() method. The cell() method allows you to specify the cell’s column and row as integer keyword arguments. This is useful when working with excel files that have a lot of columns and rows.

>>> sheet.cell(row=1, column=2).value
u'Apples'

How to get data from a range of cells

To read the data from a range of cells, you can use the methods above in a for loop. Let’s suppose you want to get the contents of a single column.

>>> for i in range(1, 8):
	print(sheet.cell(row=i, column=2).value)

	
Apples
Cherries
Pears
Oranges
Apples
Bananas
Strawberries

The code above will go down column 2, starting at row 1 and print the values of each cell in column 7 until it gets to row 7.

Writing data to an Excel file

It is possible to use OpenPyXL to change or modify the spreadsheet data. For example, the code below shows how to change the title of the active worksheet.

>>> active_worksheet = excel_doc.active
>>> active_worksheet.title
u'Sheet1'
active_worksheet.title = "End Of Year Report"
>>> active_worksheet.title
u'End Of Year Report'
>>> active_worksheet['A1'].value = "2017"
>>> active_worksheet['A1'].value
u'2017'

Changing sheet titles or cell data is simple to do, it is much like assigning data to a normal Python data structure.

OpenPyXL allows you to do more than read and write data to excel documents. Other OpenPyXL features I did not discuss here include changing font styles in cells, adding formulas, charts and even freezing panes.

For more information on OpenPyXL, read it’s documentation here or read Chapter 12 of Al Sweigart’s Automate The Boring Stuff With Python book.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s