Processing CSV files in a memory-efficient way
A little while ago I had to dig deeper into performance-optimized usage of PHPExcel. Our users upload files such as Excel or CSV with a lot of data to process. Initially we used the PHPExcel instance without tuning its default configuration, which led to heavy memory problems even on relatively small files. So I had to avoid reading the entire file content into a buffer at once (as file_get_contents() does).
While researching, mainly how to optimize our usage of PHPExcel, I came across a tiny library I have grown really fond of. It is called Goodby/CSV. Both tools have solid documentation that covers the basics and the usage.
Goodby/CSV is highly memory efficient and describes itself as extendable, although I did not verify the second part. Goodby/CSV makes use of PHP's closure feature (introduced in PHP 5.3): you define an anonymous function as a callback that is invoked for each row read from the file.
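To illustrate the callback-per-row idea, here is a minimal sketch in plain PHP. It is not Goodby/CSV's API: the helper name `eachCsvRow()` and the file name `upload.csv` are made up for this example, and it assumes PHP 7 or later for the type hints. It only shows the general pattern of streaming a file with `fgetcsv()` and handing each row to a closure, so that a single row is held in memory at a time.

```php
<?php
// Hypothetical helper (not part of Goodby/CSV): streams a CSV file and
// passes each parsed row to $onRow. Returns the number of rows handled.
function eachCsvRow(string $path, callable $onRow, string $delimiter = ','): int
{
    if (!is_readable($path)) {
        throw new RuntimeException("Cannot read $path");
    }

    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $count = 0;
    try {
        // fgetcsv() reads and parses exactly one record per call,
        // so the whole file is never loaded into a buffer.
        while (($row = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv() reports a blank line as [null]
            }
            $onRow($row);
            $count++;
        }
    } finally {
        fclose($handle);
    }

    return $count;
}
```

A call could then look like `eachCsvRow('upload.csv', function (array $row) use (&$total) { $total += (float) $row[1]; });`, where the closure captures `$total` by reference to accumulate a value across rows.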
As mighty as PHPExcel is, it carries a lot of overhead when reading files, especially CSV files.
I ran a small timing test reading a CSV file (1988 entries, file size about 1.9 MB). Here are the results:
Run | PHPExcel duration | PHPExcel mem. usage | Goodby/CSV duration | Goodby/CSV mem. usage |
---|---|---|---|---|
1 | 14.99 s | 51.76 MB | 0.25 s | 21.00 MB |
2 | 14.91 s | 51.76 MB | 0.25 s | 21.00 MB |
3 | 15.27 s | 51.76 MB | 0.25 s | 21.00 MB |
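For anyone who wants to repeat such a comparison, the following is one possible way to collect duration and peak memory in PHP. It is a sketch, not the script used for the numbers above; the helper name `measure()` and the placeholder workload are assumptions made for illustration.

```php
<?php
// Hypothetical helper: runs $task once and reports wall-clock time
// and the peak memory of the PHP process so far.
function measure(callable $task): array
{
    $start = microtime(true);
    $task();

    return [
        'seconds' => microtime(true) - $start,
        'peak_mb' => memory_get_peak_usage(true) / 1048576, // bytes to MB
    ];
}

$result = measure(function () {
    // Placeholder workload; replace with the CSV import under test.
    $buffer = str_repeat('x', 1024 * 1024);
    unset($buffer);
});

printf("%.2fs | %.2fMB\n", $result['seconds'], $result['peak_mb']);
```

Because `memory_get_peak_usage()` reports the peak for the whole process, each library should be measured in its own script run; otherwise the second measurement inherits the peak of the first.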
So there it is: in this test Goodby/CSV read the CSV file about 60 times faster than PHPExcel while using less than half the memory. The library also supports easy CSV file writing. It is licensed under the MIT License, so there should be no problems using the library in your application.