Tuesday, September 28, 2010

ORACLE PARALLEL DML

ORACLE PARALLEL DML
There is a myth that if the table is not partitioned, then parallel DML will not work. This is no
longer true. Several years ago I had updated 35 million rows table (size 40GB) using parallel
DML option in 8i. The parallel update ran 6X faster than procedural update using PL/SQL.

In this paper, you will learn how to update a BIG Table using parallel DML option.

All the tests given below are conducted in 10gR2.

First, I created a table, T1 using ALL_OBJECTS. The table has 10,000 rows.

One of the quickest way to find out the effectiveness (throughput) of parallelism in your system
is to run a query (SELECT) using parallel option, and measure the throughput by querying
V$PQ_TQSTAT table.
(Note: V$PQ_TQSTAT will be renamed to V$PX_TQSTAT in 11g).
alter session set db_file_multiblock_read_count = 128;

select /*+ FULL(t1) parallel(t1,8) */ * from tamil.t1 ;

select process, num_rows, bytes from v$pq_tqstat order by process;

PROCESS NUM_ROWS BYTES
---------- ---------- ----------
P000 1300 664452
P001 1300 664551
P002 1300 664551
P003 1300 664551
P004 1200 613428
P005 1200 613428
P006 1200 613428
P007 1200 613428
QC 10000 5111817
-- Query coordinator
If you see the number of rows evenly distributed among the parallel slave processes, then it
indicates that your system is well configured for parallel processing.
Case # 1 - Simple update on NON-PARTITIONED Table
I have built a table, T1 that has 3.11 Million rows.

SQL> desc t1
Name Null? Type
-------------------------- -------- ------------------
OWNER NOT NULL VARCHAR2(30)
OBJECT_NAME NOT NULL VARCHAR2(30)
SUBOBJECT_NAME VARCHAR2(30)
OBJECT_ID NOT NULL NUMBER
DATA_OBJECT_ID NUMBER
OBJECT_TYPE VARCHAR2(19)
CREATED NOT NULL DATE
LAST_DDL_TIME NOT NULL DATE
TIMESTAMP VARCHAR2(19)
STATUS VARCHAR2(7)
TEMPORARY VARCHAR2(1)
GENERATED VARCHAR2(1)
SECONDARY VARCHAR2(1)

SQL> select count(*) from t1;

COUNT(*)
----------
3111872

SQL> select table_name, num_rows, blocks, empty_blocks,
last_analyzed from user_tables
where table_name= 'T1' ;

TABLE_NAME NUM_ROWS BLOCKS EMPTY_BLOCKS LAST_ANALYZ
------------ ---------- ---------- ------------ -----------
T1 3116183 43085 0 05-MAR-2007
[I used DBMS_STATS.GATHER_TABLE_STATS procedure to gather table statistics with estimate
percent NULL(meaning 100 percent), but it reported more number of rows as you see in the
USER_TABLES].
Let me do the first test.

SQL> alter session set db_file_multiblock_read_count = 128;

Session altered.

Elapsed: 00:00:00.01

SQL> update tamil.t1 set object_name = lower(object_name);
3111872 rows updated.

Elapsed: 00:09:34.33
SQL> commit;

Commit complete.

Elapsed: 00:00:00.13

A single process ran for 9 Min and 34 seconds.
Case # 2 Parallel Update on NON-PARTITIONED TABLE
I have shutdown the instance and restared.

SQL> alter session enable parallel dml ;

Session altered.

Elapsed: 00:00:00.00

SQL> alter session force parallel dml parallel 8;

Session altered.

Elapsed: 00:00:00.01

SQL> alter session set db_file_multiblock_read_count = 128;

Session altered.

Elapsed: 00:00:00.00

SQL> update /*+ full(t1) parallel(t1,8) */
2 tamil.t1 set object_name = lower(object_name);
3111872 rows updated.

Elapsed: 00:00:28.03
SQL> commit;

Commit complete.

Elapsed: 00:00:01.12
Conclusion:
The parallel update took just 28 seconds versus 9 min 34 seconds in a single process update.
By setting highest value to DB_FILE_MULTIBLOCK_READ_COUNT parameter, "PARALLEL DML"
and "FORCE PARALLEL DML" with correct degree of parallelism at the session level, you can
update a big table in a shorter time.

Do not attempt to process in parallel on a small table. In fact, it will run longer than what a single
process will take.

No comments: