PyPDF2 encoding problem: PyPDF2.utils.PdfReadError: Illegal character in Name Object


Reference: https://github.com/mstamy2/PyPDF2/issues/438

Merging PDF files with PyPDF2 failed with the following error:

Traceback (most recent call last):
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 484, in readFromStream
    return NameObject(name.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcb in position 8: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\projects\myproject\apps\backstage\views\busi_contract_manage_view.py", line 703, in post
    merge_pdf_result = merge_pdf(final_files, pdf_path)
  File "D:\projects\myproject\apps\utils\doc_convert_util.py", line 86, in merge_pdf
    pdf_writer.write(new_file)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
    retval = readObject(self.stream, self)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 579, in readFromStream
    value = readObject(stream, pdf)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 60, in readObject
    return NameObject.readFromStream(stream, pdf)
  File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 492, in readFromStream
    raise utils.PdfReadError("Illegal character in Name Object")
PyPDF2.utils.PdfReadError: Illegal character in Name Object
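
The first exception in the traceback says the name bytes are not valid UTF-8. A minimal sketch of how that can happen, assuming (hypothetically, since the actual name in the failing PDF is not shown) a subset-font name containing the GBK-encoded text "宋体":

```python
# Hypothetical name bytes: a subset-font prefix followed by "宋体" encoded as GBK.
# The real name inside the failing PDF is unknown; this only reproduces the symptom.
raw_name = b"/ABCDEF+" + "宋体".encode("gbk")

failure = None
try:
    raw_name.decode("utf-8")
except UnicodeDecodeError as exc:
    failure = exc
    # exc.start is the offset of the first byte that could not be decoded
    print(f"UTF-8 decoding failed at offset {exc.start}: {exc.reason}")

# The same bytes decode cleanly when treated as GBK.
print(raw_name.decode("gbk"))
```

Under this assumption the failing byte is 0xcb, the first byte of the GBK-encoded text, which UTF-8 reads as the start of a two-byte sequence that the following byte does not complete.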

Locate the file named in the traceback:

File "D:\projects\myproject\venv\lib\site-packages\PyPDF2\generic.py", line 484

Original code at line 484:

try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    # Name objects should represent irregular characters
    # with a '#' followed by the symbol's hex number
    if not pdf.strict:
        warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
        return NameObject(name)
    else:
        raise utils.PdfReadError("Illegal character in Name Object")

Inside the except block, add a second attempt that decodes the name as GBK:

return NameObject(name.decode('gbk'))

After the change:

try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    try:
        return NameObject(name.decode('gbk'))
    except (UnicodeEncodeError, UnicodeDecodeError) as e:
        # Name objects should represent irregular characters
        # with a '#' followed by the symbol's hex number
        if not pdf.strict:
            warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
            return NameObject(name)
        else:
            raise utils.PdfReadError("Illegal character in Name Object")
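
The same fallback chain can be tried outside the library before editing site-packages. This is a standalone sketch, not PyPDF2 code: `decode_pdf_name` and its `encodings` parameter are hypothetical names, and `ValueError` stands in for `utils.PdfReadError`.

```python
import warnings


def decode_pdf_name(raw, strict=True, encodings=("utf-8", "gbk")):
    """Try each encoding in order, mirroring the strict / non-strict split of the patch."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # fall through to the next candidate encoding
    if strict:
        # Stand-in for utils.PdfReadError in the patched library code
        raise ValueError("Illegal character in Name Object")
    warnings.warn("Illegal character in Name Object")
    return raw  # non-strict mode hands back the undecoded bytes


# Hypothetical GBK-encoded name, used only for illustration
gbk_name = b"/ABCDEF+" + "宋体".encode("gbk")
print(decode_pdf_name(gbk_name))
```

One caveat of this approach: GBK accepts many byte sequences, so a name written in some other legacy encoding may decode without error but to the wrong characters.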

The error still occurs after this change; a second place also has to be modified:

Lib/site-packages/PyPDF2/utils.py, line 238

Original code:

r = s.encode('latin-1')
if len(s) < 2:
    bc[s] = r
return r

Modified code:

try:
    r = s.encode('latin-1')
except UnicodeEncodeError:
    # Characters outside Latin-1 (e.g. Chinese) fall back to UTF-8
    r = s.encode('utf-8')
if len(s) < 2:
    bc[s] = r
return r

 
