OR-Mapper and the ‘many queries’ problem

Applications using an OR-Mapper like Hibernate or OpenJPA are reckoned as to be slow. This observation is true, since there is an overhead compared to using hand-crafted and optimized SQL-Queries and Database structure. But the overhead has not to be as big as it is often observed in applications with entities. There is a lot of potential optimizing such a system. Unfortunately you won’t be able to circumvent quarelling with the underlying database, although a persistence layer is all about abstraction. Not bringing into mind what happens in your persistence layer will lead to really bad performance. There are some ‘main concerns’ using an OR-Mapper:

  • the n+1 queries problem. This problem is discussed in the post fetch strategy subselect for hibernate.
  • slow queries. This issue will be discussed in a subsequent post. In this post I will only show how to identify slow queries. The analysis and improvement of a query is database dependent and not as easy as using JDBC, since the OR-Mapper defines some queries by itself (like loading an association) and the possibilities of influencing this process is very limited. In addition HQL or JPQL queries are translated into SQL, so that the influence is restricted, too. Last but not least for speeding up queries it is often essential to change your database scheme by adding indices for example.
  • many queries. The use of an OR-Mapper delude to ignore, that every call to an entity may lead to one or more database calls. This results in many unnecessary queries. The calls may be hard to find in a big project. So this post will show, how to identify these issues using PostgreSQL as the underlying database.

The following code snippet shows a common cause for the ‘many queries’ issue:

// Retrieve an entity
EntA a = entitymanager.find(EntA.class, 1);
double countCs = 0, countCweights = 0;
// retrieve one-to-many association
for(EntB b : a.getBs()) {
  // retrieve a one-to-many association for each b
  for(EntC c: b.getCs()) {
    // count #c
    countCs++;
    // add weights
    countCweights += c.getWeight();
  }
}
if(countCs > 0) {
  // evaluate arithmetic average
  return countCweights/countCs;
}
// prevent division by zero
else {
  return 0.0d;
}

The persistence layer has to state a query for each EntB in a.getBs() in order to retrieve the b.getCs() if the collection is fetched lazily (which is the default configuration). In this example the entities a and a.getBs() are not used and therefore shouldn’t be retrieved. Even the b.getCs() should not be retrieved, since JPQL offers a method for calculating the average of a value. An alternative would be:

Query query = entitymanager.createQuery("SELECT AVG(c.weight) FROM EntA AS a, IN(a.bs) b, " +
    "IN(b.cs) c WHERE a.id = :id");
query.setParameter("id", 1);
return (double)query.getSingleResult();

This time the persistence provider will only retrieve the average value and pose only one query. The example is bold and the problematic code snippets in your code, where lineary or quadratic amount of statements are requested, won’t be as easy to find. So you may use a tool for analyzing the posed SQL queries: http://pgfouine.projects.postgresql.org/ (this tool has been dug-up by Oliver Bauer)

In order to get it working you need to adjust the following parameters in your postgresql.conf:

#log_line_prefix = ‘%m’
#lc_messages = ‘en_US’
log_line_prefix = ‘%t [%p]: [%l-1] ‘
log_filename = ‘postgresql-%Y-%m-%d_%H%M%S.log’
log_destination = ‘stderr’
log_rotation_size = ‘0’
logging_collector = on
log_directory = ‘/tmp’
log_connections = off
log_disconnections = off
log_duration = on
client_min_messages = info
log_min_messages = info
log_min_duration_statement = 0
log_statement = ‘all’
log_connections = off
log_disconnections = off

In addition add the following property to the configuration of your persistence unit in your persistence.xml:

<!-- ... -->
<persistence-unit>
  <!-- ... -->
  <properties>
    <!-- ... -->
    <property name="hibernate.use_sql_comments" value="true" />
  </properties>
</persistence-unit>

This property encourages hibernate to add sql comments to the db statements explaining its semantics. This may be /* load one-to-many EntA.bs */ or /* SELECT a FROM EntA AS a WHERE a.id = :id */. Before invoking the method you want to inspect, pipe the changes of the log into a new file and analyze it with pgfouine:

tail -f /tmp/postgresql-2010-10-01_122746.log > pglog.log
php pgfouine.php -file pglog.log -logtype stderr > pganalysis.htm

The HTML-file pganalysis.htm contains tables with information among others about the most frequent queries and the slowest queries. Similar queries differing in the query values only are normalized to one query type. The hibernate sql comments help to refer the queries to calls in your code. Queries that are posed multiple times of the cardinality of an association are candidates for optimization. Take into account, that you may have to divide the count by the batch size for fetching associations.

Furthermore a closer look to a SQL statement may reveal that there are too many LEFT OUTER JOINs for retrieving associated objects. You can suppress this by explicitly annotating it with @Column(fetch=FetchType.LAZY) in your entity class.

Leave a Reply

Your email address will not be published. Required fields are marked *