PyPDF2 encoding problem: PyPDF2.utils.PdfReadError: Illegal character in Name Object

Reference: https://github.com/mstamy2/PyPDF2/issues/438

Merging PDF files with PyPDF2 failed with the following error:

Traceback (most recent call last):
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 484, in readFromStream
    return NameObject(name.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcb in position 8: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\projects\myproject\apps\backstage\views\busi_contract_manage_view.py", line 703, in post
    merge_pdf_result = merge_pdf(final_files, pdf_path)
  File "D:\projects\myproject\apps\utils\doc_convert_util.py", line 86, in merge_pdf
    pdf_writer.write(new_file)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
    retval = readObject(self.stream, self)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 579, in readFromStream
    value = readObject(stream, pdf)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 60, in readObject
    return NameObject.readFromStream(stream, pdf)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 492, in readFromStream
    raise utils.PdfReadError("Illegal character in Name Object")
PyPDF2.utils.PdfReadError: Illegal character in Name Object

Locate the file named in the error:

File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 484
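If the install location is not obvious from the traceback, the path can be looked up from Python. This is a minimal sketch using only the standard library; the helper name `module_file` is hypothetical and not part of PyPDF2.

```python
import importlib.util


def module_file(module_name: str):
    """Return the source path of an importable module, or None if it is not installed."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when the parent package of a dotted name cannot be imported
        return None
    return spec.origin if spec is not None else None


# Prints the path of generic.py in the active environment, or None
print(module_file("PyPDF2.generic"))
```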

Line 484, original code:

try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    # Name objects should represent irregular characters
    # with a '#' followed by the symbol's hex number
    if not pdf.strict:
        warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
        return NameObject(name)
    else:
        raise utils.PdfReadError("Illegal character in Name Object")

Add the following line inside the except block:

return NameObject(name.decode('gbk'))

After the change:

try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    try:
        return NameObject(name.decode('gbk'))
    except (UnicodeEncodeError, UnicodeDecodeError) as e:
        # Name objects should represent irregular characters
        # with a '#' followed by the symbol's hex number
        if not pdf.strict:
            warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
            return NameObject(name)
        else:
            raise utils.PdfReadError("Illegal character in Name Object")
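The fallback order in the patched block can be tried on its own, outside PyPDF2. The following is a minimal sketch: the function name `decode_pdf_name` and the sample bytes are hypothetical, and `ValueError` stands in for `utils.PdfReadError`.

```python
def decode_pdf_name(raw: bytes, encodings=("utf-8", "gbk")) -> str:
    """Return the first successful decode of raw, trying encodings in order."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Stands in for the strict-mode branch of the patched code
    raise ValueError("Illegal character in Name Object")


# Hypothetical name: "/" followed by the GBK bytes of "宋体" (first byte 0xcb)
sample = b"/" + "宋体".encode("gbk")

try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)

print(decode_pdf_name(sample))
```

Note that GBK accepts a wide range of byte pairs, so a name written in some other legacy encoding may decode without error but into the wrong characters. The fallback is best treated as a workaround for files known to contain GBK names.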

After this change an error is still raised, so a second place needs to be modified:

Lib/site-packages/PyPDF2/utils.py, line 238

Original code:

r = s.encode('latin-1')
if len(s) < 2:
    bc[s] = r
return r

Modified code:

try:
    r = s.encode('latin-1')
except UnicodeEncodeError:
    # Characters outside Latin-1 (e.g. Chinese) fall back to UTF-8
    r = s.encode('utf-8')
if len(s) < 2:
    bc[s] = r
return r
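The reason the second change is needed can be seen in isolation: once a name has been decoded to Chinese characters, it can no longer be encoded as Latin-1 on the way out. Below is a minimal sketch of the same fallback without the `bc` cache; the function name `to_bytes` is hypothetical.

```python
def to_bytes(s: str) -> bytes:
    """Encode as Latin-1 when possible, otherwise fall back to UTF-8."""
    try:
        return s.encode("latin-1")
    except UnicodeEncodeError:
        return s.encode("utf-8")


print(to_bytes("/Font"))   # plain ASCII name, encodable as Latin-1
print(to_bytes("/宋体"))    # not encodable as Latin-1, so UTF-8 is used
```

With this fallback the name is written as UTF-8 bytes rather than the GBK bytes it was read from, so the merged file is not byte-for-byte faithful to the input for such names.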

 


Published by: 全栈程序员-站长. When reposting, please credit the source: https://javaforall.net/152402.html. Original link: https://javaforall.net
