Thursday, May 15, 2014

SQL Query Tuning Tips

SQL Query Tuning Tips

Here are some very simple yet powerful SQL tips to remember.

  • Avoid using the following:
  •  Boolean operators >, <, >=, <=, is null,is not null.
  •  Not in, !=
  •  Like '%pattern', not exists
  •  Calculations on unindexed columns or (use union instead)
  •  Having (use a WHERE clause instead)
  •  Do use the following:
• Enable aliases to prefix all columns
• Place indexed columns higher in the WHERE clause
• Use SQL Joins instead of using sub-queries
• Make the table with the least number of rows the driving table by making it first in the FROM clause
Other important points for SQL Tuning:
• Establish a tuning environment that reflects your production  database.
• Establish performance expectations before you begin
• Always Design and develop with performance in mind
• Create Indexes to support selective WHERE clauses and join conditions.
• Use concatenated indexes where appropriate
• Consider indexing more than you think you should, to avoid table lookups.
• Pick the best join method.
• Nested loops joins are best for indexed joins of subsets.
• Hash joins are usually the best choice for "big" joins.
• Pick the best join order.
• Pick the best "Driving" table.
• Eliminate rows as early as possible in the join order
• Use bind variables. Bind variables are key to application scalability.
• Use Oracle hints where appropriate.
• Compare performance between alternative syntax for your SQL statement.
• Consider utilizing PL/ SQL to overcome difficult SQL tuning issues.
• Consider using third party tools to make the job of SQL tuning easier.
• Never do a calculation on an Indexed column (e.g., WHERE salary*5 > :myvalue).
• Whenever possible, use the UNION statement instead of OR conditions.
• Avoid the use of NOT IN or HAVING in the WHERE clause. Instead, use the NOT EXISTS clause.
• Always specify numeric values in numeric form and character values in character form (e.g., WHERE
emp_number = 565, WHERE emp_name = ‘Jones’).
• Avoid specifying NULL in an indexed column.
• Avoid the LIKE parameter if = will suffice. Using any Oracle function will invalidate the index, causing a full-table scan.
• Never mix data types in Oracle queries, as it will invalidate the index. If the column is numeric,
remember not to use quotes (e.g., salary = 50000). For char index columns, always use single quotes (e.g., name = ‘BURLESON’).
• Avoid using subqueries when a JOIN will do the job.
• Use the Oracle “decode” function to minimize the number of times a table has to be selected.
• If your query will return more than 20 percent of the rows in the table, a full-table scan may be better than an index scan.
• Always use table aliases when referencing columns.
• To turn off an index you do not want to use (only with a cost-based optimizer), concatenate a null string to the index column name (e.g., name||') or add zero to a numeric column name (e.g., salary+0).
With the rule-based optimizer, this allows you to manually choose the most selective index to service your query. These rules may seem simplistic but following them in a diligent manner will generally relieve more than half of the SQL tuning problems that are experienced:
Remember that Oracle’s rule-based optimizer looks at the order of table names in the FROM clause to determine the driving table. Always make sure that the last table specified in the FROM clause is the table that will return the smallest number of rows. In other words, specify multiple tables with the largest result set table specified first in the FROM clause.
SQL Query Tuning Tips
Efficient SQL statements play a very important role in maintaining the database. They not only improve the performance of the database but also help in reducing the network traffic. The following can be stated as some of the tips for tuning SQL statements used to retrieve data from the database:
1. Know your application and business data well: Before writing our SQL statements, we need to familiarize ourselves with different business data sources. We must be aware of the data volume and its distribution in the database along with thorough understanding of the business data model – such as relationships among business entities.
2. Test your queries with realistic data: After writing an SQL statement, we must test it in an environment which reflects the production database. The behavior of the SQL statement depends a lot on the environment wherein it is tested.
3. Write identical SQL statements in your applications: Use bind variables, stored procedures and packages whenever possible. The benefits of using identical SQL statements include reduced memory use on the database server and faster execution, as parsing is not done.
4. Use indexes carefully on the table: Be sure of creating the necessary indexes on the table used in the SQL statement. But creating too many of them may result in performance degradation of the server. Try to follow the below rules in creating indexes on a table:
1. Create indexes on the columns which are frequently used in the WHERE clause of the application SQL or the queries given by the end users.
2. Index those columns which are frequently used in joining tables in SQL statements.
3. Use only index columns that select small percent of rows with the same value.
4. Do not index the columns which are used with functions and operators in the WHERE clause of the SQL statement.
5. Do not index those columns which get modified frequently as the index maintenance will become an issue during execution of DML statements on the table.
6. Unique indexes are better than non-unique indexes due to better selectivity. Create unique indexes on PK columns and non-unique indexes on FK columns and the columns frequently used in the WHERE clause.
7. Create index so that the column used in the WHERE clause make up a leading portion of the index.
5. Make an indexed path available to SQL: Sometimes even after creating the indexes, they won’t get used in the execution of the query. One of the different ways of making the indexed path available to the optimizer is to use SQL hints.
6. Use Explain Plan and TKPROF wherever possible: These are the tools that come along with Oracle server and help in understanding the execution of our SQL statements. Explain Plan – helps us to know the access path the optimizer opted. TKPROF – shows the actual performance statistics of the SQL statement.
7. Understand the Optimizer: SQL statement can be executed using rule-based optimizer or cost-based optimizer. In older applications, Oracle has used rule-based approach whereas in newer applications, oracle is promoting more of cost-based approach. If you opt to use cost-based approach, then it is necessary to run ANALYZE schema regularly as doing that stores the recent database statistics in the data dictionary views which are used by the cost-based optimizer. The SQL query can tuned only if cost based approach is opted for.
8. Think globally when acting locally: Keep in mind that the modifications you do to improve the
performance of one SQL statement affects other SQL statements used by applications or other users.
9. The WHERE clause is crucial: The following can be said to be some of the cases where the WHERE
clause will not use index even if it is created on a column. The index is created on column COL1 but it
wont be used in the below illustrated examples.
1. COL1 < COL2
2. COL1 > COL2
3. COL1 <= COL2
4. COL1 >= COL2
5. COL1 is null – The index does not store null values hence when queries for null values index won’t get
used
6. COL1 is not null – The index does not store null values hence when queries for null values index won’t
get used.
7. COL1 not in (value1, value2...)
8. COL1 != expression1
9. COL1 like ‘%pattern1’ – index is not used in this case as the leading edge of the index is suppressed;
but whereas if queried as COL1 like ‘pattern1%’, then index is used as they would result in a bounded
range index scan
10. Not exists (subquery)
11. Expression1 = expression2 – All expressions, functions or calculations involving indexes column
would prohibit using the index on that column.
10. Use WHERE clause instead of HAVING clause to filter the records : Using an indexed column in
HAVING clause of the query to filter records will result in full table scan and avoid the usage on the index
on that column. Instead first filter the records using WHERE clause to filter the records which makes use
of the index on the column. Below queries illustrate the usage of index, if an index is created on DEPTID
column of EMP table.
1. Select dept_id, sum(salary) from emp group by dept_id having dept_id = 100;
2. Select dept_id, sum(salary) from emp where  dept_id = 100 group by dept_id;
In case a) Query will not use the index created on dept_id column whereas case b) query makes use of
it.
11. Specify the leading index columns: In case of a compound index, it is used in an SQL query only if the
leading column of it is used. Below queries are used to illustrate the same with an assumption of a
compound index (PART_NUM, PRODUCT_ID) existing on the table PARTS.
1. Select * from parts where part_num = 2;
2. Select * from parts where product_id = 1002;
Case a) would use the index as the column used in the query WHERE clause is a leading column of the
compound index; whereas case b) query would not use the index. If incase of case b), you want to make
use of index we can re-write the query in the following way:
Select * from party where part_num > 0 and product_id = 1002;
12. Evaluate index scan v/s full table scans: If selecting more than 15 percent of the records from a table, it is better to use/force a full table scan on the table for such query. When using an index does more harm than good to the query, we can use techniques the following to suppress the usage of
index.
1. Select /*+ full(a)*/ * from emp where salary = 5000;
A full hint forces the full table scan on the table and avoids the usage of index on salary column.
2. Select * from emp where salary + 0 = 5000;
The salary column in WHERE clause is modified to an expression which results in avoiding the usage of index on the column.
3. Select * from emp where salary = ‘5000’;
An implicit conversion on the indexed column results in avoiding the index usage.
4. Select * from emp where ss# ||’  ‘|| = ‘111-222-333’;
When the percentage of table rows accessed is 15 percent or less, an index scan will work better because it results in multiple logical reads per row accessed, whereas a full table scan can read all the rows in a block in one logical read. Thus, the performance of full table scan is better when accessing a large percentage of rows from a table.
            To illustrate this point, say the ANALYZE command is issued against the EMP table and all its indexes. Oracle generates the following statistics in the data dictionary table USER_TABLES and
USER_INDEXES:
          Table Statistics:
NUM_ROWS = 1000
BLOCKS = 100
Index Statistics:
BLEVEL = 2
AVG_LEAF_BLOCKS_PER_KEY = 1
AVG_DATA_BLOCKS_PER_KEY = 1
            Based on these statistics, the following would be the logical reads (block accessed) for different types of scans:        
  Use of index to return one row = 3
(BLEVEL+(AVG_LEAF_BLOCKS_PER_KEY - 1) + AVG_DATA_PER_KEY)
Use of index to return all rows = 3000
                                        (NUM_ROWS * Blocks accessed to return one row using index)
 Full table scan = 100(BLOCKS)
13. Use an ORDER BY clause for index scan: Oracle optimizer uses an index scan if the ORDER BY clause is used on an indexed column. The query would retrieve ROWID from index and access the table using the ROWID.
14. Know thy data: You need to know the data and its distribution in your database server in order to understand the usage of indexes on the columns and their execution. You need to keep in mind that the SQL query performance varies as the database grows and data distributions changes.
15. Know when to use large-table scans: A full table scan provides better performance when compared to an index scan on a very small table or a very large table. An index scan on a very large table may require scanning of many indexes and table blocks. When these blocks are brought to database buffer cache, they are kept as long as possible. But as these are not needed for other queries, the hit ratio of the database buffer declines resulting in performance degradation. The blocks read by a full table scan are removed from database buffer cache much earlier does not hamper the performance as an index scan on a very large table.
16.   Minimize the table access (passes): Usually by reducing the number of table passes in a SQL statement improves the performance.
     Here’s an example to illustrate the same:
            The STUDENT table has four columns named NAME, STATUS, PARENT_INCOME, and SELF_INCOME. The name is the primary key. The values of the STATUS column are 0 for independent students and 1 for dependent students.
            The following query returns the name and income for dependent as well as independent
students. It results in two passes through the STUDENT table, creates a temporary table for processing, and initiates a sort to eliminate duplicates:
          SELECT NAME, PARENT_INCOME
FROM STUDENT
WHERE STATUS = 1
UNION
SELECT NAME, SELF_INCOME
FROM STUDENT
WHERE STATUS = 0;
The same request can be fulfilled by this query, which results in only one pass through the table:
          SELECT NAME, PARENT_INCOME*STATUS + SELF_INCOME (1-STATUS)
FROM STUDENT;
17. Join tables in the proper order: The order in which the tables are joined in a multiple table join is crucial. Perform the most restrictive search first in order to filter out the maximum number of rows and only a few records are processed in further levels. You need to make sure that the driving table (master table in a master-detail join) returns the least number of rows. For cost-based optimizer, the order in which the tables appear in WHERE clause is irrelevant as it picks the best execution plan on its own. We can use the ORDERED hint to control the order in which
the tables are accessed in the execution plan.
18. Use index-only searches whenever possible: If possible, write the queries that use index-only searches. The optimizer will have to search only index and not the full table to satisfy the SQL resulting in improving the performance. For example, a compound index exists on L_NAME and F_NAME columns
in EMP table.
1. Select f_name from emp where l_name = ‘Smith’;
2. Select f_name, salary from emp where l_name = ‘Smith’;
In Case a) query, the optimizer performs an index-only search whereas in case b) query, the optimizer performs a table search in order to get the data for a column – salary which is not a part of the compound index. Hence we need to observe the column list considered in the SELECT clause as well while writing a query.
19. Redundancy is good: Provide as much information as possible in the WHERE clause of the query. If in
a WHERE clause of a query, there is COL1 = Col2 and Col1 = 10 then the optimizer infers that COL2 = 10.
But at if there is a condition like COL1 = COL2 and COL2 = COL3, then the optimizer does not infer it as
COL1 = COL3.
20. Keep it simple, stupid: Make your SQL statements as simple as possible as simple statements yield better performance than a single complex SQL statement. The lower the Explain Plan Cost of an SQL, the better the performance of the query. Hence we need to try to tune our queries to reduce the cost as much as possible.
21. You can reach the same destination in different ways: In many cases, more than one SQL can provide with the same desired results. Each query uses a different plan and performs differently. For example,
the use of MINUS operator may perform faster than the use of NOT IN or NOT EXISTS predicates. If an
SQL involves an OR in the WHERE clause, we can re-write the same by using a UNION. Hence we need to evaluate all the possible queries for a requirement and choose the best access path query.
22. Use the special columns: Make use of the special columns like ROWID and ROWNUM in your queries. ROWID searches are faster than searches performed on any columns. A ROWID is not a constant in the database; hence do not hard-coded the value in your SQLs and applications.
23. Use explicit cursors over implicit cursors: Implicit cursors are opened by Oracle for DELETE, UPDATE,
SELECT and INSERT and use an extra fetch. Explicit cursors are opened by developers using DECLARE,
PEN, FETCH and CLOSE cursor statements.
24. Explore and take advantage of the oracle parallel query option: Using the parallel query option, you can execute SQL in parallel for faster performance. This parallel query option can be used only in SMP and MPP systems containing multiple disk drivers. The Oracle database has to be configured for this feature to be used and SQLs specific for that feature should be written.
25. Reduce the network traffic and increase throughput: Using array processing and PL/SQL blocks can achieve better performance and reduce the network traffic. Array processing allows a single SQL to process multiple rows. Using arrays in an insert, we can insert 1000 rows in a table. Significance performance gain can be achieved in client/servers and batch systems using this technique. Multiple
SQL statements can cause heavy network traffic. However, if the SQL statements are in a  single PL/SQL block, the entire block can be sent to Oracle server, processed there and results returned to the application running on the client.

No comments: