Increase the Speed of Reading XLSX Files
A few month ago, I wrote an article about some PHPExcel performance tweaks. Sadly we encountered some nasty memory exhaustion issues while reading big XLSX files with PHPExcel. The issues does not occur directly in PHPExcel, furthermore the current PHP process is killed in the SimpleXML/LibXml library execution silently, without any exception or error notification.
Why using the SimpleXML library, when reading XLSX files? This is because XLSX files are build up as XML files internally, which are compressed by a ZIP algorithm.
The basic issue we encountered was that PHPExcel (beside its amount of memory optimization settings) loads the XLSX decompressed content directly by the simple_xml_load* method, which is a huge load on the system resources. It consumes the requested file completly and do not use any kind of stream buffering.
I experimented on the PHP XMLReader library, which provides the ability of reading over XML structures with a streaming mechanism. The downside is the complex usage of this library. So I looked up some libraries which already provide a support for reading XLSX files based on the XMLReader library.
One of the libraries I found and liked was the Spreadsheet-Parser of Akeneo Labs. With a simple API like this, the usage is very easy:
// fetch workbook, including references to all sheets
foreach ($workbook->createRowIterator($myWorksheetIndex) as $rowIndex => $values) {
var_dump($rowIndex, $values);
}
At least here a measurement of reading the same XLSX file with 100k of content rows:
PHPExcel: Duration: 233,65 s; Memory: 811,75 mb
XMLReader: Duration: 54,83 s; Memory: 17,25 mb
As you can see, the memory consumption is extremly low and the processing time is a lot fast too.