Oracle long running query Performance Tunning

In this post, I'll be explaining the long-running concurrent program issue we have faced in one of our instance. The issue was with Oracle Remove Person Program ran for three and half hours to delete one person record. When the customer reported the issue, I checked the AWR report and found that one particular SQL  was running for nearly three hours. Since the SQL dictates more than 74% of the runtime of the concurrent program I will be discussing how I tunned the query to make it run in 16 mins from nearly 3 hours.

 As a usual step, I have checked for the alternate hash plan using the queries in the link. But there are no other hash plans available in the database for the particular SQL id.
So my next hope was on automatic SQL tuning advisor and even it didn't have any action plan for the particular sql_id. 
SQL ID     : bxjnvubgufs6m
SQL Text   : UPDATE HZ_CUST_SITE_USES_ALL SET
             LOCATION=NVL2(LOCATION,PER_DRT_UDF.GENERATE_UNIQUE_STRING(ROWID,'
             HZ_CUST_SITE_USES_ALL','LOCATION',:B1 ),LOCATION) WHERE
             SITE_USE_ID IN (SELECT SITE_USE_ID FROM HZ_CUST_SITE_USES_ALL
             CSU,HZ_CUST_ACCT_SITES_ALL CAS WHERE CSU.CUST_ACCT_SITE_ID =
             CAS.CUST_ACCT_SITE_ID AND ( EXISTS (SELECT 1 FROM HZ_PARTY_SITES
             PS WHERE PS.PARTY_SITE_ID = CAS.PARTY_SITE_ID AND (PARTY_ID =
             :B1 OR (PARTY_ID IN (SELECT PARTY_ID FROM HZ_RELATIONSHIPS WHERE
             SUBJECT_ID = :B1 ))) ) OR ( EXISTS (SELECT 1 FROM
             HZ_CUST_ACCOUNTS CA WHERE CA.PARTY_ID = :B1 AND
             CA.CUST_ACCOUNT_ID = CAS.CUST_ACCOUNT_ID) )))
Bind Variables :
 2 -  (NUMBER):18218
 3 -  (NUMBER):18218
 4 -  (NUMBER):18218
-------------------------------------------------------------------------------
ADDITIONAL INFORMATION SECTION
-------------------------------------------------------------------------------
- The optimizer could not merge the view at line ID 4 of the execution plan.

Since SQL tuning advisor didn't help me in finding the better plans for the sql_id now I am on my own to find the solution to this problem.

I have taken the explain_plan of the sql_id to analyze further. Please find the explain plan of the query below. The font size of explain_plan has been made small to fit everything on the screen for better understanding.

-----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | | | | 119M(100)| |
| 1 | UPDATE | HZ_CUST_SITE_USES_ALL | | | | | |
| 2 | NESTED LOOPS | | 2 | 108 | | 119M (1)| 01:17:43 |
| 3 | NESTED LOOPS | | 2 | 108 | | 119M (1)| 01:17:43 |
| 4 | VIEW | VW_NSO_1 | 2 | 26 | | 119M (1)| 01:17:43 |
| 5 | SORT UNIQUE | | 2 | 60 | | | |
| 6 | FILTER | | | | | | |
| 7 | HASH JOIN | | 39M| 1134M| 567M| 229K (3)| 00:00:09 |
| 8 | TABLE ACCESS FULL | HZ_CUST_ACCT_SITES_ALL | 19M| 340M| | 51685 (3)| 00:00:03 |
| 9 | TABLE ACCESS FULL | HZ_CUST_SITE_USES_ALL | 39M| 453M| | 144K (3)| 00:00:06 |
| 10 | FILTER | | | | | | |
| 11 | TABLE ACCESS BY INDEX ROWID | HZ_PARTY_SITES | 1 | 12 | | 3 (0)| 00:00:01 |
| 12 | INDEX UNIQUE SCAN | HZ_PARTY_SITES_U1 | 1 | | | 2 (0)| 00:00:01 |
| 13 | TABLE ACCESS BY INDEX ROWID BATCHED| HZ_RELATIONSHIPS | 1 | 12 | | 5 (0)| 00:00:01 |
| 14 | INDEX RANGE SCAN | HZ_RELATIONSHIPS_N3 | 2 | | | 3 (0)| 00:00:01 |
| 15 | TABLE ACCESS BY INDEX ROWID | HZ_CUST_ACCOUNTS | 1 | 12 | | 3 (0)| 00:00:01 |
| 16 | INDEX UNIQUE SCAN | HZ_CUST_ACCOUNTS_U1 | 1 | | | 2 (0)| 00:00:01 |
| 17 | INDEX UNIQUE SCAN | HZ_CUST_SITE_USES_U1 | 1 | | | 1 (0)| 00:00:01 |
| 18 | TABLE ACCESS BY INDEX ROWID | HZ_CUST_SITE_USES_ALL | 1 | 41 | | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------------

By looking at the above explain plan I thought that the issue is due to the poor use of the index for the tables HZ_CUST_ACCT_SITES_ALL and HZ_CUST_SITE_USES_ALL. So I started creating indexes on the columns of tables HZ_CUST_ACCT_SITES_ALL and HZ_CUST_SITE_USES_ALL. No matter what index I create on whichever column Optimizer isn't picking the Index. But Index created is used for the later part of the query in line 17.

 As the next action, I have added the hint to force the optimizer to pick the index but after forcing the optimizer the cost of the query just increased instead of decreasing.
After losing all the possible hope of tuning the query, I have explored the query's parallel execution. We did see performance improvement for the parallel execution, auto parallelism increased the performance of the query by 2.3, and Parallelism(10) increased the performance of the query by 3.4 times. But it was not approved to implement.

For the next few days of my life, I was just thinking of other possible ways to somehow get the hash_plan that was used before this issue. I ran the same program in all the environments to find any alternate good hash_plans so that I can somehow replicate them in the environment where we are having the performance issue. But it was running for more than 3 hours in all the environments.

Since every path, I have chosen to fix the issue was a dead-end with no hope of fixing the issue. I have asked for help from one of my seniors for suggestions. He had asked me to run the sqldb360 for the particular sql_id and send the report to him. I have run the query and send the report to him.
After sending the report I just exploring the sqldb360 report and the senior person who asked for help came back told me there is no way to fix this query. I have continued exploring the sqldb360 hoping even if I don't fix this issue know how to analyze sqldb360 might help in the future. To my surprise, I found the below section in the sqldb360 report.


5b. SQL Monitor Reports

  1. SQL Monitor Reports zip
It was like a ray of light at the end of the tunnel. After downloading the zip I have opened the first HTML page in the zip and it took some time to load as it using Adobe flash load the page. It was a Database Active Report and I started looking and found activity % columns. The Activity% column shows the amount of time/resource taken to complete the particular part of the SQL out of total. Please find the screenshot for reference.

Database Active report can be accessed in OEM Performance > SQL Monitoring.
After removing some unwanted columns Please find the report data below.


=============================================================================================================================
| Id |                  Operation                   |          Name          |  Rows   | Cost  |Execs |   Rows   | Activity |
|    |                                              |                        | (Estim) |       |      | (Actual) |   (%)    |
=============================================================================================================================
|  0 | UPDATE STATEMENT                             |                        |         |       |    1 |        0 |     3.71 |
|  1 |   UPDATE                                     | HZ_CUST_SITE_USES_ALL  |         |       |    1 |          |          |
|  2 |    NESTED LOOPS                              |                        |       2 |  119M |    1 |          |          |
|  3 |     NESTED LOOPS                             |                        |       2 |  119M |    1 |          |          |
|  4 |      VIEW                                    | VW_NSO_1               |       2 |  119M |    1 |          |          |
|  5 |       SORT UNIQUE                            |                        |       2 |       |    1 |          |          |
|  6 |        FILTER                                |                        |         |       |    1 |        0 |     1.78 |
|  7 |         HASH JOIN                            |                        |     40M |  230K |    1 |      40M |     3.47 |
|  8 |          TABLE ACCESS FULL                   | HZ_CUST_ACCT_SITES_ALL |     20M | 51685 |    1 |      20M |     0.09 |
|  9 |          TABLE ACCESS FULL                   | HZ_CUST_SITE_USES_ALL  |     40M |  144K |    1 |      40M |     0.91 |
|    |                                              |                        |         |       |      |          |          |
| 10 |         FILTER                               |                        |         |       |  20M |        0 |     2.64 |
| 11 | TABLE ACCESS BY INDEX ROWID | HZ_PARTY_SITES | 1 | 3 | 20M | 20M | 36.44 |
| | | | | | | | | | 12 | INDEX UNIQUE SCAN | HZ_PARTY_SITES_U1 | 1 | 2 | 20M | 20M | 2.00 | | | | | | | | | | | 13 | TABLE ACCESS BY INDEX ROWID BATCHED | HZ_RELATIONSHIPS | 1 | 5 | 20M | 0 | 1.31 | | 14 | INDEX RANGE SCAN | HZ_RELATIONSHIPS_N3 | 2 | 3 | 20M | 0 | 1.06 | | | | | | | | | |
| 15 | TABLE ACCESS BY INDEX ROWID | HZ_CUST_ACCOUNTS | 1 | 3 | 20M | 0 | 45.11 |
| | | | | | | | | | 16 | INDEX UNIQUE SCAN | HZ_CUST_ACCOUNTS_U1 | 1 | 2 | 20M | 20M | 1.44 | | | | | | | | | | | 17 | INDEX UNIQUE SCAN | HZ_CUST_SITE_USES_U1 | 1 | 1 | | | | | 18 | TABLE ACCESS BY INDEX ROWID | HZ_CUST_SITE_USES_ALL | 1 | 2 | | | | =============================================================================================================================

As you can see from the above cost of accessing the table HZ_CUST_ACCOUNTS and HZ_PARTY_SITES was exponentially less compared to the total cost of the query but both these tables access is responsible for nearly 81% of the total runtime. Since I found the issue I have created two indexes to fix the issue. Please find those indexes below.

create index ar.ww_hz_party_sites on ar.hz_party_sites (party_site_id,party_id);
create index ar.ww_HZ_CUST_ACCOUNTS on ar.HZ_CUST_ACCOUNTS (cust_account_id,party_id);

After creating the above two indexes the query got completed in 16mins. After creating these two indexes we got huge performance improvement which is better than parallel execution itself.
If have missed anything or if you have a suggestion for the action plan explained here. Please feel free to update in the comment section.

Comments

  1. Nice blog 👍 keeping posting
    You have tried all ways
    If was there in your place I will first try to generate level 12 Tkprof to find where exactly SQL is spending time/consistent Reads are high and will check filtering condition and data distribution on columns
    If there is good selectivity for index will create it and check it will help or not
    Post change again will enable level 12 Tkprof will compare both pre and post Tkprof to validate performance gain

    ReplyDelete
    Replies
    1. Thanks for suggestion, I did generated the level 12 tkprof but didn't spend much time on analyzing it. Will explore more about it.

      Delete
    2. It depends actually if know exactly how to read Tkprof and what to check quickly will not take much time and it will help us manual tuning skills as well
      Because in some cases we will not get any recommendations while using automatic tuning

      Delete
  2. good one. next time I spend more time in tkprof too.

    ReplyDelete

Post a Comment