PyPDF2 编码问题 PyPDF2.utils.PdfReadError Illegal character in Name Object

PyPDF2 编码问题 PyPDF2.utils.PdfReadError Illegal character in Name ObjectPyPDF2编码问题PyPDF2.utils.PdfReadErrorIllegalcharacterinNameObject参考资料:https://github.com/mstamy2/PyPDF2/issues/438使用PyPDF2做合并PDF文件时报错如下:Traceback(mostrecentcalllast):File”D:\pr…

大家好,又见面了,我是你们的朋友全栈君。

PyPDF2 编码问题 PyPDF2.utils.PdfReadError Illegal character in Name Object

参考资料:https://github.com/mstamy2/PyPDF2/issues/438

使用 PyPDF2 做合并 PDF 文件时报错如下:

Traceback (most recent call last):
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 484, in readFromStream
    return NameObject(name.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcb in position 8: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\projects\myproject\apps\backstage\views\busi_contract_manage_view.py", line 703, in post
    merge_pdf_result = merge_pdf(final_files, pdf_path)
  File "D:\projects\myproject\apps\utils\doc_convert_util.py", line 86, in merge_pdf
    pdf_writer.write(new_file)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
    retval = readObject(self.stream, self)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 579, in readFromStream
    value = readObject(stream, pdf)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 60, in readObject
    return NameObject.readFromStream(stream, pdf)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 492, in readFromStream
    raise utils.PdfReadError("Illegal character in Name Object")
PyPDF2.utils.PdfReadError: Illegal character in Name Object

找到对应的报错文件 

File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 484

第484行 原代码:

try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    # Name objects should represent irregular characters
    # with a '#' followed by the symbol's hex number
    if not pdf.strict:
        warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
        return NameObject(name)
    else:
        raise utils.PdfReadError("Illegal character in Name Object")

在 except 中加入代码 

return NameObject(name.decode('gbk'))

修改后

try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    try:
        return NameObject(name.decode('gbk'))
    except (UnicodeEncodeError, UnicodeDecodeError) as e:
        # Name objects should represent irregular characters
        # with a '#' followed by the symbol's hex number
        if not pdf.strict:
            warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
            return NameObject(name)
        else:
            raise utils.PdfReadError("Illegal character in Name Object")

修改后仍会报错,需要修改修改另一处

Lib/site-packages/PyPDF2/utils.py 第238行

原代码

r = s.encode('latin-1')
if len(s) < 2:
    bc[s] = r
return r

 修改后代码:

try:
    r = s.encode('latin-1')
except Exception as e:
    r = s.encode('utf-8')
if len(s) < 2:
    bc[s] = r
return r

 

版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请联系我们举报,一经查实,本站将立刻删除。

发布者:全栈程序员-站长,转载请注明出处:https://javaforall.net/152402.html原文链接:https://javaforall.net

(0)
全栈程序员-站长的头像全栈程序员-站长


相关推荐

  • 访问控制列表(ACL)基本的配置以及详细讲解「建议收藏」

    访问控制列表(ACL)基本的配置以及详细讲解「建议收藏」【网络环境】网络时代的高速发展,对网络的安全性也越来越高。西安凌云高科技有限公司因为网络建设的扩展,因此便引入了访问控制列表(ACL)来进行控制,作为网络管理员我们应该怎么来具体的实施来满足公司的需求

    2022年8月6日
    14
  • count(*)、count(主键id)、count(字段)和count(1)等不同用法的性能,有哪些差别?那种效率更高

    count(*)、count(主键id)、count(字段)和count(1)等不同用法的性能,有哪些差别?那种效率更高

    2022年2月17日
    32
  • 谨防qq盗号「建议收藏」

    谨防qq盗号「建议收藏」各位朋友们注意了!最近qq盗号现象频繁,本人的同学与老师近两个月总被盗号,要么是发一个所谓的“好友账号申诉网站”,要么就是下图的二维码千万别扫!不知道有没有投诉成功,安全起见还是不要扫码虽然但是,太假了两个问题:1.copyright还是2005到20202.网址不对,QQ空间的网址是qzone.qq.com3.自己细品不管怎么说,请大家一定要谨防qq盗号,防止信息泄露,许多好友包括老师同学都中招了。记住!不明链接不要轻易打开a祝大家周末愉快…

    2022年6月15日
    35
  • Android Framework中的Application Framework层介绍「建议收藏」

    Android Framework中的Application Framework层介绍「建议收藏」  Android的四层架构相比大家都很清楚,老生常谈的说一下分别为:  Linux2.6内核层,核心库层,应用框架层,应用层。我今天重点介绍一下应用框架层Framework。        Framework层为我们开发应用程序提供了非常多的API,我们通过调用特殊的API构造我们的APP,满足我们业务上的需求。写APP的人都知道,学习Android开发的第一步就是去学习各种各样的API,什…

    2022年10月15日
    0
  • 【转】gcc命令中参数c和o混合使用的详解

    【转】gcc命令中参数c和o混合使用的详解

    2022年3月6日
    30
  • BT渗透「建议收藏」

    BT渗透「建议收藏」PHP交流群:294088839,Python交流群:652376983 whois域名/ip查看域名的详细信息。ping域名/ip测试本机到远端主机是否联通。dig域名/ip查看域名解析的详细信息。host-l域名dns服务器传输zone。扫描nmap:-sS半开扫描TCP和SYN扫描。-sT完全TCP连接扫描。-sUUDP扫描-PSs…

    2022年4月29日
    49

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注

关注全栈程序员社区公众号