SELECT ruser0_.oid AS col_0_0_, ruser0_2_.fullobject AS col_1_0_ FROM m_user ruser0_ INNER JOIN m_focus ruser0_1_ ON ruser0_.oid = ruser0_1_.oid INNER JOIN m_object ruser0_2_ ON ruser0_.oid = ruser0_2_.oid LEFT OUTER JOIN m_assignment assignment1_ ON ruser0_.oid = assignment1_.owner_oid AND ( assignment1_.assignmentowner =:1 ) WHERE assignment1_.validfrom >:2 AND assignment1_.validfrom <=:3 OR assignment1_.validto >:4 AND assignment1_.validto <=:5 order by nlssort(ruser0_.oid,'NLS_SORT=BINARY_AI') asc fetch first :6 rows only
SQL Repository - problematic queries
This document identifes problematic queries existing in 4.0-4.2 versions and ideas what to do with them. The queries may not be critical in smaller deployments, but proved to be in large scale ones.
Partitioned validity scanner task
Symptoms: extremely high disk IO
Known fact: JOIN to
m_user.oidcan be replaced with
m_object.oid. (Hibernate in Action)
Questions: What is the count of returned rows when the problem occures?
validtoare indexed and the indexes should be used. Both
(date > x and date ⇐ y)should result in
INDEX RANGE SCAN. This can be good depending on the range of the scan.
Both clauses are joined with
ORwhich internally performs
UNION-ALLwith the two
ORDER BYon experimental tables (without any joins, just oid + two timestamps) shows much higher "Total cost" than both filters. E.g., with 2-3k of rows order cost is 72, filter cost is 7+3 with over 100k total rows. The tenfold factor stays roughly the same with 13k rows of result.
Also, actual select uses function in ORDER, the question is whether function based index could help.
Assignment table is very big (tens of millions rows and more).
In the new repo, this should be more like (PostgreSQL style):
SELECT u.oid, u.fullobject FROM m_user u WHERE EXISTS ( SELECT * FROM m_assignment a WHERE a.owner_oid = u.oid -- "join", this should be super efficient AND a.assignmentowner = ? -- discriminator, perhaps not necessary for user AND (a.validfrom > ? AND a.validfrom <= ? OR a.validto > ? AND validto <= ? )) ORDER BY u.oid LIMIT ?
It’s good it’s limited, this in general is one of the biggest performance boosts. Count queries can’t benefit from this and can take disproportionately longer for the same query.
Assignments conditions are in subquery now, this also leaves the outer query with the right number of rows.
No abstract table joins needed, all information is in a single table.
Related count search
The same problem as above, but obviously without order this time.
SELECT COUNT(ruser0_.oid) AS col_0_0_ FROM m_user ruser0_ INNER JOIN m_focus ruser0_1_ ON ruser0_.oid = ruser0_1_.oid INNER JOIN m_object ruser0_2_ ON ruser0_.oid = ruser0_2_.oid LEFT OUTER JOIN m_assignment assignment1_ ON ruser0_.oid = assignment1_.owner_oid AND ( assignment1_.assignmentowner =:1 ) WHERE assignment1_.validfrom >:2 AND assignment1_.validfrom <=:3 OR assignment1_.validto >:4 AND assignment1_.validto <=:5
Called very often.
Cause of slowness seems to be the sheer amount of shadows and their string extensions.
This may be improved by partitioning by
resourceref_targetoid, which otherwise have very low variability for a normal (b-tree) index.
m_object_ext_string.owner_oidis coverd by PK index.
SELECT rshadow0_.oid AS col_0_0_, rshadow0_1_.fullobject AS col_1_0_ FROM m_shadow rshadow0_ INNER JOIN m_object rshadow0_1_ ON rshadow0_.oid = rshadow0_1_.oid LEFT OUTER JOIN m_object_ext_string strings1_ ON rshadow0_.oid = strings1_.owner_oid -- covered by PK, owner_oid is first there AND ( strings1_.ownertype =:1 AND strings1_.item_id =:2 ) WHERE strings1_.stringvalue =:3 AND rshadow0_.objectclass =:4 AND rshadow0_.resourceref_targetoid =:5 AND ( rshadow0_.resourceref_relation IN ( :6, :7, :8 ) ) AND ( rshadow0_.dead =:9 OR rshadow0_.dead IS NULL )