No way to disable datapump estimate?

No way to disable datapump estimate?

昨天晚上去客户现场配合一个导入导出作业;这是一套Solaris 10上的10.2.0.1系统,导出采用expdp数据泵工具,需要导出的数据源是一张大小在120G左右的单表,该表存放了最近一年的数据,实际导出时只需要最近三个月的数据,所以使用了QUERY参数,并指定了并行度为2。 该导出作业之前未经测试过,语句也是临时写的,实际执行导出工作时发现在评估阶段(estimate phase)耗费了大约十多分钟的时间,estimate默认使用blocks模式,即通过计算对象段占用的数据库块来预估dumpfile的大小;此外还有statistics模式通过对象的统计信息来推算导出文件的大小:

The block method of estimate is the least accurate. We investigate how many blocks were allocated for the storage segments for the tables specified. If the table had been recently analyzed, the estimate=statistics would be the most accurate. So remember, it is important to generate statistics before calculating the estimate using estimate=statistics method. Additionally, ESTIMATE=BLOCKS method could be inaccurate when: a) The table was created with a much bigger initial extent size than was needed for the actual table data b) Many rows have been deleted from the table, or a very small percentage of each block is used. # When statistics have been explicitly set via the DBMS_STATS package in order to force certain optimizer plans, instead of being directly collected from the data, ESTIMATE=STATISTICS may not report correct results.

但实际上这2中方式在使用QUERY的情况下都无法正确预估导出文件的大小:

[maclean@rh2 udump]$ expdp maclean/maclean directory=dump  tables=estimate_me query=estimate_me:'"where t1<2000"'  estimate=statistics

Export: Release 10.2.0.4.0 - 64bit Production on Tuesday, 02 November, 2010 21:28:26

Copyright (c) 2003, 2007, Oracle.  All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
FLASHBACK automatically enabled to preserve database integrity.
Starting "MACLEAN"."SYS_EXPORT_TABLE_01":  maclean/******** directory=dump tables=estimate_me query=estimate_me:"where t1<2000" estimate=statistics
Estimate in progress using STATISTICS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
.  estimated "MACLEAN"."ESTIMATE_ME"                     104.7 MB
Total estimation using STATISTICS method: 104.7 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "MACLEAN"."ESTIMATE_ME"                     32.42 KB    1999 rows
Master table "MACLEAN"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for MACLEAN.SYS_EXPORT_TABLE_01 is:
  /s01/dump/expdat.dmp
Job "MACLEAN"."SYS_EXPORT_TABLE_01" successfully completed at 21:28:29

/* statistics模式使用统计来判断导出文件大小,比较容易控制 */

begin
 dbms_stats.set_table_stats(ownname => 'MACLEAN',tabname => 'ESTIMATE_ME',numrows => 999999999999999,numblks => 99999999999999999999);
  end;
/

PL/SQL procedure successfully completed.

Estimate in progress using STATISTICS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
.  estimated "MACLEAN"."ESTIMATE_ME"                     10244 GB
Total estimation using STATISTICS method: 10244 GB

/* 再次导出时数据泵采用了"作假的"统计信息*/

[maclean@rh2 udump]$ expdp maclean/maclean directory=dump  tables=estimate_me query=estimate_me:'"where t1<2000"'  estimate=blocks

Export: Release 10.2.0.4.0 - 64bit Production on Tuesday, 02 November, 2010 21:31:58

Copyright (c) 2003, 2007, Oracle.  All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
FLASHBACK automatically enabled to preserve database integrity.
Starting "MACLEAN"."SYS_EXPORT_TABLE_01":  maclean/******** directory=dump tables=estimate_me query=estimate_me:"where t1<2000" estimate=blocks
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
.  estimated "MACLEAN"."ESTIMATE_ME"                       192 MB
Total estimation using BLOCKS method: 192 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "MACLEAN"."ESTIMATE_ME"                     32.42 KB    1999 rows
Master table "MACLEAN"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for MACLEAN.SYS_EXPORT_TABLE_01 is:
  /s01/dump/expdat.dmp
Job "MACLEAN"."SYS_EXPORT_TABLE_01" successfully completed at 21:32:01

/* 可以看到以上BLOCKS method预估的dumpfile大小为192MB,而STATISTICS方式预估为104.7MB;
    然而实际的使用QUERY参数导出的1999行数据只占用32k的空间!
    estimate在这里摆了一个大乌龙
*/

查了一下metalink发现10.2.0.1上存在expdp在estimate评估阶段耗时很久的bug:

EXPDP Slow and Estimate Takes Long Time [ID 822283.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.3
This problem can occur on any platform.
Symptoms
Data pump Export is very slow and long time is spent in Estimating the data size.
EXPDP is stuck for long at :

Starting "SYSTEM"."SYS_EXPORT_SCHEMA_01": system/******** parfile=dpexp.par
Estimate in progress using BLOCKS method...

AWR report taken when EXPDP is running slow show the below query took long time to complete .

SELECT /*+rule*/ SYS_XMLGEN(VALUE(KU$),
XMLFORMAT.createFormat2('TABLE_DATA_T', '7')), 0 ,KU$.BASE_OBJ.NAME
,KU$.BASE_OBJ.OWNER_NAME ,'TABLE' ,to_char(KU$.BYTES_ALLOC) ,
to_char(KU$.ET_PARALLEL) ,KU$.FGAC ,KU$.NONSCOPED_REF ,KU$.XMLSCHEMACOLS
,KU$.NAME , KU$.NAME ,'TABLE_DATA' ,KU$.PART_NAME ,KU$.SCHEMA_OBJ.OWNER_NAME
,KU$.TS_NAME , KU$.TRIGFLAG
,decode(KU$.SCHEMA_OBJ.TYPE_NUM,2,decode(bitand(KU$.PROPERTY,8192),8192,
'NESTED TABLE','T'),19,'PARTITION',20,'PARTITION','SUBPARTITION')
,to_char(KU$.UNLOAD_METHOD)
FROM SYS.KU$_TABLE_DATA_VIEW KU$
WHERE NOT BITAND(KU$.BASE_OBJ.FLAGS,128)!=0 AND NOT (BITAND
(KU$.BASE_OBJ.FLAGS,16)=16) AND KU$.BASE_OBJ.NAME IN
('PA_DRAFT_INVOICES_ALL') AND KU$.BASE_OBJ.OWNER_NAME IN ('PA') AND NOT
EXISTS (SELECT 1 FROM SYS.KU$NOEXP_TAB A WHERE A.OBJ_TYPE='TABLE' AND
A.NAME=KU$.BASE_OBJ.NAME AND A.SCHEMA=KU$.BASE_OBJ.OWNER_NAME) AND NOT EXISTS
(SELECT 1 FROM SYS.KU$NOEXP_TAB A WHERE A.OBJ_TYPE='SCHEMA' AND
A.NAME=KU$.BASE_OBJ.OWNER_NAME);

Cause
The issue here is with using RBO.
Data pump is not choosing Cost Based Optimizer.
This is because of the 'unpublished' bug 5929373 -DATA PUMP TAKES LONG TIME TO EXPORT DATA

This bug is closed as duplicate of another bug << 5464834 >>

Solution

   1. Upgrade to 10.2.0.4 . The bug is fixed in 10.2.0.4
      Or
   2. Apply the << patch 5464834 >> if available for your platform .

        * Login to metalink
        * Go to 'Patches and Updates'
        * Click on 'Simple Search'
        * Enter the patch number 5464834
        * Select your platform from the dropdown list .
        * Patch,if available , will be displayed .
        * Download and apply the patch .

该Bug可以通过升级版本解决,但远水解不了近火,最初想disable掉estimate功能;但查阅若干资料后发现目前似乎没有能够关掉estimate功能的办法,这个就是悲哀所在了!耐心等estimate阶段完成后,实际数据导出阶段倒也只花了十来分钟,就这一点看来datapump很有中国式的办事风格。 另外当指定的导出文件数小于指定的parallelism,且导出数据量较大时可能出现ORA-39095错误,会导致导出作业意外终止:

39095, 00000, "Dump file space has been exhausted: Unable to allocate %s bytes"
// *Cause:  The Export job ran out of dump file space before the job was
//          completed.
// *Action: Reattach to the job and add additional dump files to the job
//          restarting the job.

EXPDP generating ORA-39095 : " dump file space has been exhausted"
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.2.0.2 - Release: 10.2 to 11.2
Information in this document applies to any platform.
Symptoms
Checked for relevance on 10-22-2010

Errors while performing Datapump Export:

- when exporting a large number of tables then it completely stops
- the jobs may remain in the datapump export tables
- No trace generated at the time of the failure.
ERROR:
ORA - 39095 : " dump file space has been exhausted. Unable to allocate 4096 bytes".jobsystem.sys_export_full_02 stops due to fatal error.

Cause
Customer chose only limited number of files in "dumpfile" parameter with limited "filesize" , i.e.:

   full=Y
   directory=dpump_dir
   dumpfile=full_1.dmp,full_2.dmp,full_3.dmp,full_4.dmp
   filesize=3000000000

This means 4 dump files with filesize 3000000000 byte (2.79 G).

the dumpfile parameter is the issue here.

.
Solution
1- You can use the dynamic format (i.e dumpfile=full_%U.dmp) :

The 'wildcard' specification for the dump file can expand up to 99 files. If 99 files have been generated before the export has completed, it will again return the ORA-39095 error.

2- If this is yet not enough and more files are needed, a workaround would be to speficy a bigger 'filesize' parameter.

3- If this is inconvenient, another option is to use this syntax:

      dumpfile=fullexp%U.dmp, fullexp2_%U.dmp, fullexp3_%U.dmp

which can expand up to 3*99 files.

If encountering problems containing the dump in a single directory using this solution, you may prefer this syntax:

     dumpfile=dmpdir1:fullexp1_%U.dmp, dmpdir2:fullexp2_%U.dmp, dmpdir3:fullexp3_U.dmp

(assuming the 3 directory objects listed above had been already created first).

解决方法很简单就是去掉并行度,或者指定动态的导出文件名。 记以录之,回去补觉了!

  • The block method of estimate is the least accurate. We investigate how many blocks were allocated for the storage segments for the tables specified. If the table had been recently analyzed, the estimate=statistics would be the most accurate. So remember, it is important to generate statistics before calculating the estimate using estimate=statistics method. Additionally, ESTIMATE=BLOCKS method could be inaccurate when:a) The table was created with a much bigger initial extent size than was needed for the actual table data b) Many rows have been deleted from the table, or a very small percentage of each block is used.
  • When statistics have been explicitly set via the DBMS_STATS package in order to force certain optimizer plans, instead of being directly collected from the data, ESTIMATE=STATISTICS may not report correct results.
版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请联系我们举报,一经查实,本站将立刻删除。

发布者:全栈程序员-站长,转载请注明出处:https://javaforall.net/110699.html原文链接:https://javaforall.net

(0)
全栈程序员-站长的头像全栈程序员-站长


相关推荐

  • @RequestParam注解详解

    @RequestParam注解详解@RequestParam是传递参数的.@RequestParam用于将请求参数区数据映射到功能处理方法的参数上。publicStringqueryUserName(@RequestParamStringuserName)在url中输入:localhost:8080/**/?userName=zhangsan请求中包含username参数(如/requestparam1?userName=…

    2025年7月10日
    4
  • 怎么进行大数据测试?我们需要具备怎样的测试能力?「建议收藏」

    怎么进行大数据测试?我们需要具备怎样的测试能力?「建议收藏」前言:现在大数据这么火,那么作为测试人员,我们应该怎么进行大数据测试?需要具备怎样的测试能力?一、大数据测试实现被分成三个步骤(1):数据阶段验证大数据测试的第一步,也称作pre-hadoop阶段该过程包括如下验证:1、来自各方面的数据资源应该被验证,来确保正确的数据被加载进系统2、将源数据与推送到Hadoop系统中的数据进行比较,以确保它们匹配3、验证正确的数据被提取并被加载到HDFS正确的位置该阶段可以使用工具Talend或Datameer,进行数据阶段验证。(2):”MapReduc

    2022年6月4日
    115
  • 手游市场数据分析_梦幻西游手游考古3进4

    手游市场数据分析_梦幻西游手游考古3进4“机”不离手是大多数人的生活状态,甚至有一个用手机形容真爱的段子:“在一起时不看手机,不在一起时秒回信息,就算现代人的真爱了。”随着手机在日常生活中地位的提升,手机游戏也逐渐成为年轻人日常生活不可或缺的一部分。 在7月31日的网易云创大会游戏论坛上,来自TalkingData华南区业务负责人黎丽华给大家带来了“透过数据看真相:手游市场趋势报告”的主题分享。通过一些鲜活的数据向大家展示了…

    2025年6月21日
    3
  • 矩阵向量中两两间欧式距离计算[通俗易懂]

    矩阵向量中两两间欧式距离计算[通俗易懂]目标:希望通过的矩阵运算就能得出矩阵向量中两两之间的欧式距离欧氏距离公式:一般而言,我们常见的欧式距离计算公式如下:a,b对应的是两组不同的向量dist(a,b)=(a1−b1)2+(a2−b2)2+⋅⋅⋅(an−bn)2dist(a,b)=\sqrt{(a_1-b_1)^{2}+(a_2-b_2)^{2}+···(a_n-b_n)^{2}}dist(a,b)=(a1​−b1​)2…

    2022年6月19日
    34
  • PhpSpreadsheet_phpquery手册

    PhpSpreadsheet_phpquery手册phpexcel导出PhpSpreadsheet详解引入正确的文件并实例化usePhpOffice\PhpSpreadsheet\Spreadsheet;$spreadsheet=newSpreadsheet();$worksheet=$spreadsheet->getActiveSheet();赋值//给A2赋值hello$worksheet->…

    2022年9月17日
    3
  • XDOJ1145–组合数学四之Carnival Phantasm

    XDOJ1145–组合数学四之Carnival Phantasm描述:为解救可怜的武内崇老师,saber、远坂、爱尔奎特、希耶尔等人组成了第六科急救队!最终,由琥珀开发出了禁药,分身光线(这药是内服还是外用的==?),将爱尔奎特批量化生产,来对月世界进行全面的地毯式搜索。现已知,第六科共有m个复制人(每个复制人完全一样),月世界有n个城市,每个城市会被一个复制人搜索一遍。问:共有多少种分配方法。(根据时空管理局劳务法更定,每个复制人又要分得工作。)…

    2022年5月7日
    36

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注

关注全栈程序员社区公众号