Thursday, April 30, 2009

Data Pump Process Architecture - Master Table and Worker Processes

Data Pump jobs use a master table, a master process, and worker processes to perform the work and keep track of the progress. For every Data Pump Export and Data Pump Import job, a master process is created. The master process controls the entire job, including communicating with the clients, creating and controlling a pool of worker processes, and performing logging operations.

Let's see an example.
1) Start a Data Pump export job in one session.
expdp schemas=arju userid=arju/a dumpfile=arju_30_04_2009.dmp logfile=arju_20_04_09.log

2) From another session, query dba_datapump_sessions to see the status of the Data Pump job.

set lines 150 pages 100
col program for a20
col username for a5
col spid for a7
col job_name for a25
select to_char(sysdate,'YYYY-MM-DD HH24:MI:SS') "DATE", s.program, s.sid,
s.status, s.username, d.job_name, p.spid, s.serial#, p.pid
from v$session s, v$process p, dba_datapump_sessions d
where p.addr=s.paddr and s.saddr=d.saddr;

DATE                PROGRAM              SID     STATUS   USERN JOB_NAME                  SPID    SERIAL# PID
------------------- -------------------- ------- -------- ----- ------------------------- ------- ------- -------
2009-04-30 16:49:46 expdp.exe            148     ACTIVE   ARJU  SYS_EXPORT_SCHEMA_01      3164    14      16
2009-04-30 16:49:46 ORACLE.EXE (DM00)    144     ACTIVE   ARJU  SYS_EXPORT_SCHEMA_01      5376    20      17

SQL> /

DATE                PROGRAM              SID     STATUS   USERN JOB_NAME                  SPID    SERIAL# PID
------------------- -------------------- ------- -------- ----- ------------------------- ------- ------- -------
2009-04-30 16:49:50 expdp.exe            148     ACTIVE   ARJU  SYS_EXPORT_SCHEMA_01      3164    14      16
2009-04-30 16:49:50 ORACLE.EXE (DM00)    144     ACTIVE   ARJU  SYS_EXPORT_SCHEMA_01      5376    20      17
2009-04-30 16:49:50 ORACLE.EXE (DW01)    141     ACTIVE   ARJU  SYS_EXPORT_SCHEMA_01      7352    7       20

SQL> /

no rows selected

In the output above I ran the same query three times: the first time just after submitting the Data Pump job, the second a few seconds later, and the third after the job had completed.

In the first output, only the master table and the master process (DM00) have been created. The master process controls the entire job: it communicates with the client, creates and controls the pool of worker processes, and performs logging operations.

In the second output a worker process (DW01) has appeared. Since we did not set any parallelism, PARALLEL defaults to 1 and there is only one worker process. If we set PARALLEL to more than 1, we would see multiple worker processes such as DW01, DW02 and so on. The worker processes are responsible for updating the master table with the status (pending, completed, failed) of each object being processed. This information provides the detail required to restart a stopped Data Pump job.
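If we wanted more worker processes, we could request them explicitly with the PARALLEL parameter. A minimal sketch (the %U dump file template and the log file name here are only illustrative):

expdp schemas=arju userid=arju/a parallel=4 dumpfile=arju_%U.dmp logfile=arju_parallel.log

The %U substitution lets Data Pump create a separate dump file for each worker. While such a job runs, the same dba_datapump_sessions query would list DM00 together with DW01, DW02 and so on, although Data Pump starts only as many workers as the job can actually keep busy.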

In the third output no rows are returned, because the Data Pump processes disappear once the job completes or stops.

To summarize the architecture of a Data Pump job: the job begins by creating the master table. This table is created at the start of a Data Pump operation and is dropped when the operation completes successfully. The master table is also dropped if you terminate the job with the KILL_JOB command at the interactive prompt. If the job is stopped with the STOP_JOB interactive command, or if it terminates unexpectedly, the master table is retained.
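As an illustration of those interactive commands (a sketch only, using the job name from the earlier output): press Ctrl+C in the running expdp client to reach the interactive prompt, stop the job, and later re-attach and restart it.

Export> STOP_JOB=IMMEDIATE

C:\>expdp userid=arju/a attach=SYS_EXPORT_SCHEMA_01

Export> START_JOB
Export> CONTINUE_CLIENT

Issuing KILL_JOB at the same prompt would instead terminate the job and drop the master table. Jobs that have been stopped but can still be restarted show up in dba_datapump_jobs, for example:

select owner_name, job_name, state from dba_datapump_jobs;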

Note that we can also pass the keep_master=y parameter to a Data Pump job, which retains the master table even after the job completes successfully.

Suppose my Data Pump job command is:
C:\>expdp schemas=arju userid=arju/a dumpfile=arju_30_04_.dmp logfile=arju_20_04_09.log keep_master=y
Export: Release 10.2.0.1.0 - Production on Thursday, 30 April, 2009 17:11:09

Copyright (c) 2003, 2005, Oracle. All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Starting "ARJU"."SYS_EXPORT_SCHEMA_02": schemas=arju userid=arju/******** dumpfile=arju_30_04_.dmp logfile=arju_20_04_0
9.log keep_master=y

Because we set keep_master=y, the master table "ARJU"."SYS_EXPORT_SCHEMA_02" is still available at any time after the Data Pump job completes.

The structure of the master table is shown below.

SQL> desc "ARJU"."SYS_EXPORT_SCHEMA_02";
 Name                     Null?    Type
 ------------------------ -------- --------------------
 PROCESS_ORDER                     NUMBER
 DUPLICATE                         NUMBER
 DUMP_FILEID                       NUMBER
 DUMP_POSITION                     NUMBER
 DUMP_LENGTH                       NUMBER
 DUMP_ALLOCATION                   NUMBER
 COMPLETED_ROWS                    NUMBER
 ERROR_COUNT                       NUMBER
 ELAPSED_TIME                      NUMBER
 OBJECT_TYPE_PATH                  VARCHAR2(200)
 OBJECT_PATH_SEQNO                 NUMBER
 OBJECT_TYPE                       VARCHAR2(30)
 IN_PROGRESS                       CHAR(1)
 OBJECT_NAME                       VARCHAR2(500)
 OBJECT_LONG_NAME                  VARCHAR2(4000)
 OBJECT_SCHEMA                     VARCHAR2(30)
 ORIGINAL_OBJECT_SCHEMA            VARCHAR2(30)
 PARTITION_NAME                    VARCHAR2(30)
 SUBPARTITION_NAME                 VARCHAR2(30)
 FLAGS                             NUMBER
 PROPERTY                          NUMBER
 COMPLETION_TIME                   DATE
 OBJECT_TABLESPACE                 VARCHAR2(30)
 SIZE_ESTIMATE                     NUMBER
 OBJECT_ROW                        NUMBER
 PROCESSING_STATE                  CHAR(1)
 PROCESSING_STATUS                 CHAR(1)
 BASE_PROCESS_ORDER                NUMBER
 BASE_OBJECT_TYPE                  VARCHAR2(30)
 BASE_OBJECT_NAME                  VARCHAR2(30)
 BASE_OBJECT_SCHEMA                VARCHAR2(30)
 ANCESTOR_PROCESS_ORDER            NUMBER
 DOMAIN_PROCESS_ORDER              NUMBER
 PARALLELIZATION                   NUMBER
 UNLOAD_METHOD                     NUMBER
 GRANULES                          NUMBER
 SCN                               NUMBER
 GRANTOR                           VARCHAR2(30)
 XML_CLOB                          CLOB
 NAME                              VARCHAR2(30)
 VALUE_T                           VARCHAR2(4000)
 VALUE_N                           NUMBER
 IS_DEFAULT                        NUMBER
 FILE_TYPE                         NUMBER
 USER_DIRECTORY                    VARCHAR2(4000)
 USER_FILE_NAME                    VARCHAR2(4000)
 FILE_NAME                         VARCHAR2(4000)
 EXTEND_SIZE                       NUMBER
 FILE_MAX_SIZE                     NUMBER
 PROCESS_NAME                      VARCHAR2(30)
 LAST_UPDATE                       DATE
 WORK_ITEM                         VARCHAR2(30)
 OBJECT_NUMBER                     NUMBER
 COMPLETED_BYTES                   NUMBER
 TOTAL_BYTES                       NUMBER
 METADATA_IO                       NUMBER
 DATA_IO                           NUMBER
 CUMULATIVE_TIME                   NUMBER
 PACKET_NUMBER                     NUMBER
 OLD_VALUE                         VARCHAR2(4000)
 SEED                              NUMBER
 LAST_FILE                         NUMBER
 USER_NAME                         VARCHAR2(30)
 OPERATION                         VARCHAR2(30)
 JOB_MODE                          VARCHAR2(30)
 CONTROL_QUEUE                     VARCHAR2(30)
 STATUS_QUEUE                      VARCHAR2(30)
 REMOTE_LINK                       VARCHAR2(4000)
 VERSION                           NUMBER
 DB_VERSION                        VARCHAR2(30)
 TIMEZONE                          VARCHAR2(64)
 STATE                             VARCHAR2(30)
 PHASE                             NUMBER
 GUID                              RAW(16)
 START_TIME                        DATE
 BLOCK_SIZE                        NUMBER
 METADATA_BUFFER_SIZE              NUMBER
 DATA_BUFFER_SIZE                  NUMBER
 DEGREE                            NUMBER
 PLATFORM                          VARCHAR2(101)
 ABORT_STEP                        NUMBER
 INSTANCE                          VARCHAR2(60)

Let's see how many rows the master table contains.
SQL> select count(*) from "ARJU"."SYS_EXPORT_SCHEMA_02";

COUNT(*)
--------
1125

The master table holds 1125 rows. It contains all of the Data Pump log messages, and it is used to track the detailed progress of the Data Pump job, which goes well beyond what the log file records. It contains the following information.

- The number of completed rows for each table.

- The total number of errors during the Data Pump operation.

- The elapsed time to export or import each table.

- The current set of dump files.

- The current state of every object exported or imported and its location in the dump file set.

- The job's user-supplied parameters.

- The status of every worker process.

- The current job state and restart information.

- The dump file location and directory name information.

And much other useful information.
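Since keep_master=y leaves the master table behind as an ordinary table in the ARJU schema, we can query it directly. The following is only a minimal sketch built on the columns shown in the describe output above; the filter keeps just the rows that name an exported object and skips the internal bookkeeping rows.

set lines 150 pages 100
col object_schema for a15
col object_name for a30
col object_type for a20
select object_schema, object_name, object_type, completed_rows, elapsed_time
from "ARJU"."SYS_EXPORT_SCHEMA_02"
where object_name is not null
order by object_type, object_name;

When you have finished examining it, the retained master table can be dropped like any other table, for example with drop table "ARJU"."SYS_EXPORT_SCHEMA_02" purge;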

Related Documents
http://arjudba.blogspot.com/2009/02/important-guideline-about-data-pump.html
