No way to disable datapump estimate?

No way to disable datapump estimate?

昨天晚上去客户现场配合一个导入导出作业;这是一套Solaris 10上的10.2.0.1系统,导出采用expdp数据泵工具,需要导出的数据源是一张大小在120G左右的单表,该表存放了最近一年的数据,实际导出时只需要最近三个月的数据,所以使用了QUERY参数,并指定了并行度为2。 该导出作业之前未经测试过,语句也是临时写的,实际执行导出工作时发现在评估阶段(estimate phase)耗费了大约十多分钟的时间,estimate默认使用blocks模式,即通过计算对象段占用的数据库块来预估dumpfile的大小;此外还有statistics模式通过对象的统计信息来推算导出文件的大小:

The block method of estimate is the least accurate. We investigate how many blocks were allocated for the storage segments for the tables specified. If the table had been recently analyzed, the estimate=statistics would be the most accurate. So remember, it is important to generate statistics before calculating the estimate using estimate=statistics method. Additionally, ESTIMATE=BLOCKS method could be inaccurate when: a) The table was created with a much bigger initial extent size than was needed for the actual table data b) Many rows have been deleted from the table, or a very small percentage of each block is used. # When statistics have been explicitly set via the DBMS_STATS package in order to force certain optimizer plans, instead of being directly collected from the data, ESTIMATE=STATISTICS may not report correct results.

但实际上这2中方式在使用QUERY的情况下都无法正确预估导出文件的大小:

[maclean@rh2 udump]$ expdp maclean/maclean directory=dump  tables=estimate_me query=estimate_me:'"where t1<2000"'  estimate=statistics

Export: Release 10.2.0.4.0 - 64bit Production on Tuesday, 02 November, 2010 21:28:26

Copyright (c) 2003, 2007, Oracle.  All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
FLASHBACK automatically enabled to preserve database integrity.
Starting "MACLEAN"."SYS_EXPORT_TABLE_01":  maclean/******** directory=dump tables=estimate_me query=estimate_me:"where t1<2000" estimate=statistics
Estimate in progress using STATISTICS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
.  estimated "MACLEAN"."ESTIMATE_ME"                     104.7 MB
Total estimation using STATISTICS method: 104.7 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "MACLEAN"."ESTIMATE_ME"                     32.42 KB    1999 rows
Master table "MACLEAN"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for MACLEAN.SYS_EXPORT_TABLE_01 is:
  /s01/dump/expdat.dmp
Job "MACLEAN"."SYS_EXPORT_TABLE_01" successfully completed at 21:28:29

/* statistics模式使用统计来判断导出文件大小,比较容易控制 */

begin
 dbms_stats.set_table_stats(ownname => 'MACLEAN',tabname => 'ESTIMATE_ME',numrows => 999999999999999,numblks => 99999999999999999999);
  end;
/

PL/SQL procedure successfully completed.

Estimate in progress using STATISTICS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
.  estimated "MACLEAN"."ESTIMATE_ME"                     10244 GB
Total estimation using STATISTICS method: 10244 GB

/* 再次导出时数据泵采用了"作假的"统计信息*/

[maclean@rh2 udump]$ expdp maclean/maclean directory=dump  tables=estimate_me query=estimate_me:'"where t1<2000"'  estimate=blocks

Export: Release 10.2.0.4.0 - 64bit Production on Tuesday, 02 November, 2010 21:31:58

Copyright (c) 2003, 2007, Oracle.  All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
FLASHBACK automatically enabled to preserve database integrity.
Starting "MACLEAN"."SYS_EXPORT_TABLE_01":  maclean/******** directory=dump tables=estimate_me query=estimate_me:"where t1<2000" estimate=blocks
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
.  estimated "MACLEAN"."ESTIMATE_ME"                       192 MB
Total estimation using BLOCKS method: 192 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "MACLEAN"."ESTIMATE_ME"                     32.42 KB    1999 rows
Master table "MACLEAN"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for MACLEAN.SYS_EXPORT_TABLE_01 is:
  /s01/dump/expdat.dmp
Job "MACLEAN"."SYS_EXPORT_TABLE_01" successfully completed at 21:32:01

/* 可以看到以上BLOCKS method预估的dumpfile大小为192MB,而STATISTICS方式预估为104.7MB;
    然而实际的使用QUERY参数导出的1999行数据只占用32k的空间!
    estimate在这里摆了一个大乌龙
*/

查了一下metalink发现10.2.0.1上存在expdp在estimate评估阶段耗时很久的bug:

EXPDP Slow and Estimate Takes Long Time [ID 822283.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.3
This problem can occur on any platform.
Symptoms
Data pump Export is very slow and long time is spent in Estimating the data size.
EXPDP is stuck for long at :

Starting "SYSTEM"."SYS_EXPORT_SCHEMA_01": system/******** parfile=dpexp.par
Estimate in progress using BLOCKS method...

AWR report taken when EXPDP is running slow show the below query took long time to complete .

SELECT /*+rule*/ SYS_XMLGEN(VALUE(KU$),
XMLFORMAT.createFormat2('TABLE_DATA_T', '7')), 0 ,KU$.BASE_OBJ.NAME
,KU$.BASE_OBJ.OWNER_NAME ,'TABLE' ,to_char(KU$.BYTES_ALLOC) ,
to_char(KU$.ET_PARALLEL) ,KU$.FGAC ,KU$.NONSCOPED_REF ,KU$.XMLSCHEMACOLS
,KU$.NAME , KU$.NAME ,'TABLE_DATA' ,KU$.PART_NAME ,KU$.SCHEMA_OBJ.OWNER_NAME
,KU$.TS_NAME , KU$.TRIGFLAG
,decode(KU$.SCHEMA_OBJ.TYPE_NUM,2,decode(bitand(KU$.PROPERTY,8192),8192,
'NESTED TABLE','T'),19,'PARTITION',20,'PARTITION','SUBPARTITION')
,to_char(KU$.UNLOAD_METHOD)
FROM SYS.KU$_TABLE_DATA_VIEW KU$
WHERE NOT BITAND(KU$.BASE_OBJ.FLAGS,128)!=0 AND NOT (BITAND
(KU$.BASE_OBJ.FLAGS,16)=16) AND KU$.BASE_OBJ.NAME IN
('PA_DRAFT_INVOICES_ALL') AND KU$.BASE_OBJ.OWNER_NAME IN ('PA') AND NOT
EXISTS (SELECT 1 FROM SYS.KU$NOEXP_TAB A WHERE A.OBJ_TYPE='TABLE' AND
A.NAME=KU$.BASE_OBJ.NAME AND A.SCHEMA=KU$.BASE_OBJ.OWNER_NAME) AND NOT EXISTS
(SELECT 1 FROM SYS.KU$NOEXP_TAB A WHERE A.OBJ_TYPE='SCHEMA' AND
A.NAME=KU$.BASE_OBJ.OWNER_NAME);

Cause
The issue here is with using RBO.
Data pump is not choosing Cost Based Optimizer.
This is because of the 'unpublished' bug 5929373 -DATA PUMP TAKES LONG TIME TO EXPORT DATA

This bug is closed as duplicate of another bug << 5464834 >>

Solution

   1. Upgrade to 10.2.0.4 . The bug is fixed in 10.2.0.4
      Or
   2. Apply the << patch 5464834 >> if available for your platform .

        * Login to metalink
        * Go to 'Patches and Updates'
        * Click on 'Simple Search'
        * Enter the patch number 5464834
        * Select your platform from the dropdown list .
        * Patch,if available , will be displayed .
        * Download and apply the patch .

该Bug可以通过升级版本解决,但远水解不了近火,最初想disable掉estimate功能;但查阅若干资料后发现目前似乎没有能够关掉estimate功能的办法,这个就是悲哀所在了!耐心等estimate阶段完成后,实际数据导出阶段倒也只花了十来分钟,就这一点看来datapump很有中国式的办事风格。 另外当指定的导出文件数小于指定的parallelism,且导出数据量较大时可能出现ORA-39095错误,会导致导出作业意外终止:

39095, 00000, "Dump file space has been exhausted: Unable to allocate %s bytes"
// *Cause:  The Export job ran out of dump file space before the job was
//          completed.
// *Action: Reattach to the job and add additional dump files to the job
//          restarting the job.

EXPDP generating ORA-39095 : " dump file space has been exhausted"
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.2.0.2 - Release: 10.2 to 11.2
Information in this document applies to any platform.
Symptoms
Checked for relevance on 10-22-2010

Errors while performing Datapump Export:

- when exporting a large number of tables then it completely stops
- the jobs may remain in the datapump export tables
- No trace generated at the time of the failure.
ERROR:
ORA - 39095 : " dump file space has been exhausted. Unable to allocate 4096 bytes".jobsystem.sys_export_full_02 stops due to fatal error.

Cause
Customer chose only limited number of files in "dumpfile" parameter with limited "filesize" , i.e.:

   full=Y
   directory=dpump_dir
   dumpfile=full_1.dmp,full_2.dmp,full_3.dmp,full_4.dmp
   filesize=3000000000

This means 4 dump files with filesize 3000000000 byte (2.79 G).

the dumpfile parameter is the issue here.

.
Solution
1- You can use the dynamic format (i.e dumpfile=full_%U.dmp) :

The 'wildcard' specification for the dump file can expand up to 99 files. If 99 files have been generated before the export has completed, it will again return the ORA-39095 error.

2- If this is yet not enough and more files are needed, a workaround would be to speficy a bigger 'filesize' parameter.

3- If this is inconvenient, another option is to use this syntax:

      dumpfile=fullexp%U.dmp, fullexp2_%U.dmp, fullexp3_%U.dmp

which can expand up to 3*99 files.

If encountering problems containing the dump in a single directory using this solution, you may prefer this syntax:

     dumpfile=dmpdir1:fullexp1_%U.dmp, dmpdir2:fullexp2_%U.dmp, dmpdir3:fullexp3_U.dmp

(assuming the 3 directory objects listed above had been already created first).

解决方法很简单就是去掉并行度,或者指定动态的导出文件名。 记以录之,回去补觉了!

  • The block method of estimate is the least accurate. We investigate how many blocks were allocated for the storage segments for the tables specified. If the table had been recently analyzed, the estimate=statistics would be the most accurate. So remember, it is important to generate statistics before calculating the estimate using estimate=statistics method. Additionally, ESTIMATE=BLOCKS method could be inaccurate when:a) The table was created with a much bigger initial extent size than was needed for the actual table data b) Many rows have been deleted from the table, or a very small percentage of each block is used.
  • When statistics have been explicitly set via the DBMS_STATS package in order to force certain optimizer plans, instead of being directly collected from the data, ESTIMATE=STATISTICS may not report correct results.
版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请联系我们举报,一经查实,本站将立刻删除。

发布者:全栈程序员-站长,转载请注明出处:https://javaforall.net/110699.html原文链接:https://javaforall.net

(0)
全栈程序员-站长的头像全栈程序员-站长


相关推荐

  • 【不专一的开发】UML(二)—行为图(状态图、活动图、序列图、协同图)

    【不专一的开发】UML(二)—行为图(状态图、活动图、序列图、协同图)脑图,概念,例子,画图,翻来覆去,深入理解

    2022年5月18日
    39
  • go语言环境安装教程_go语言菜鸟教程

    go语言环境安装教程_go语言菜鸟教程windows下go语言环境搭建安装1、go安装包下载:https://studygolang.com/dl选择自己要下载的版本进行下载。下载好了之后安装一直next就行了(安装路径尽量不要出现中文。)2、安装好了之后打开cmd,输入goversion查一下是否安装成功。出现go版本号,说明安装成功。3、下载liteidehttps://sourceforge.net/pro…

    2022年10月12日
    3
  • nodejs和java多线程_nodeJS和Java哪个难?「建议收藏」

    nodejs和java多线程_nodeJS和Java哪个难?「建议收藏」刚好最近学了一点Java,来回答下这个问题。首先这个问题不好说谁难谁易(就像是问篮球足球谁难),深入学习之后会发现都很难。nodeJS底层是依赖v8跟libuv(c\c++),部分模块是用c++编写,所以深入了解之后会发现还得学c++。而Java将代码编译成字节码运行在虚拟机上,相应的Java字节码、JVM都要去了解。所以研究底层的话两者都很难,不太好区分谁更难。不过从题主的问题来看可能想问的是n…

    2022年7月7日
    31
  • SQL中的聚合函数介绍

    SQL中的聚合函数介绍  什么是聚合函数(aggregatefunction)?聚合函数对一组值执行计算并返回单一的值。 聚合函数有什么特点?除了COUNT以外,聚合函数忽略空值。 聚合函数经常与SELECT语句的GROUPBY子句一同使用。 所有聚合函数都具有确定性。任何时候用一组给定的输入值调用它们时,都返回相同的值。 标量函数:只能对单个的数字或值进行计算。主…

    2022年6月21日
    35
  • RuntimeError: CUDA out of memory(已解决)[通俗易懂]

    今天用pytorch训练神经网络时,出现如下错误:RuntimeError:CUDAoutofmemory.Triedtoallocate144.00MiB(GPU0;2.00GiBtotalcapacity;1.29GiBalreadyallocated;79.00MiBfree;1.30GiBreservedintotalbyPyTorch)明明GPU0有2G容量,为什么只有79M可用?并且1.30G已经被PyTorch占用了。

    2022年4月10日
    2.9K
  • ELK日志原理与介绍

    ELK日志原理与介绍ELK日志原理与介绍

    2022年4月22日
    48

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注

关注全栈程序员社区公众号