There are many Python packages to load them: csv is pretty well managed with the included csv module or numpy or pandas. xlrd can read xls files with formatting and xlsx without. openpyxl can read and write xlsx files. But I felt I needed a higher level package that could handle the layout inside a sheet.
Here's Sheetparser, a Python module whose idea is to describe the layout of the sheet
with spatial patterns, allowing for changes in actual
position of the elements.
Let's take an example: the sheet below is made of 2 tables and some lines, separated by empty lines:
If the size of the tables change between 2 versions of the file, it can become quite tedious to write the code to read the data. With sheetpattern it's a simple as:
from sheetpattern import * sheet = load_workbook(filename)['Sheet2'] pattern = Sheet('sheet', Rows, Table, Empty, Table, Empty, Line, Line) context = PythonObjectContext() pattern.match_range(sheet, context) assert context.table.data[0][0] == 'a11'
And it's easy to make that code tolerant to small layout changes. I think the library works well and changed the way I load and write Excel sheets programmatically.
It was also a good exercise to learn how to properly publish a package:
It was also a good exercise to learn how to properly publish a package:
- Check style with pep8 and pychecker
- Documentation on readthedocs
- Code on github
- and uploading to pypi
No comments:
Post a Comment