Think About It: PHP/PostgreSQL Bulk Performance (Part 3)
This article is the last of a three-part series and describes how we optimized the persistence process of bulk data in our code in combination with PostgreSQL. Make sure you covered the first article about how we tweaked PHPExcel to run faster while reading Excel and CSV files and the second article about how we optimized our data processing and reached performance improvements tweaking our code.
Our performance optimization sprint contained the improvement of read file data, processing and persist it. While the file data is relatively small referred to the file size, the amount of data sets can vary between 5.000 and more then 40.000 entities on an average, but may be a lot more in some cases.
In the second article of this series we ended with an example like this:
/* @var $dataProductSet \Foo\Bar\File\Data\Products[] */
$dataProductSet = array(...);
$indexedProducts = $this->getProductsIndexedBySku(
$this->productManager->getProducts()
);
$productsToStore = array();
foreach ($dataProductSet as $dataProduct) {
/* @var $dataProduct \Foo\Bar\File\Data\Products */
if (isset(indexedProducts[$dataProductSet->getSku()])) {
// Alread existing product
$product = indexedProducts[$dataProductSet->getSku()];
// ...
} else {
// New product
// $product = ...
// ...
}
// ...
array_push($productsToStore, $product);
}
The example covers the processing of the incoming data, but not the persistence of it. Based on the example above, the common way to save the data collection would be something like this:
foreach ($productsToStore as $product) {
// saveEntity does an Insert on new entites
// and an Update on already existing entities
$databaseHandler->saveEntity($product);
}
This code creates an Insert or Update statement for each entity, regardless with or without active Transaction. We extended our Database Handler implementation with some bulk compatible CRUD methods. An example scheme for the bulk update query looks like this:
QUERY STATMENT:
update test as t set
column_a = c.column_a
from (values
('123', 1),
('345', 2)
) as c(column_b, column_a)
where c.column_b = t.column_b;
We've done some similar compositions for the insert and delete bulk queries. Now the example can look as simple as this:
$databaseHandler->saveAll($products);
This code creates only one single data storage query statement. After using this bulk compatible data handling, we tweaked the performance by over 200% of the database query execution time.