Python command for decompressing bz2 files: decompressing .bz2 files in Python

So, this is a seemingly simple question, but I’m apparently very very dull. I have a little script that downloads all the .bz2 files from a webpage, but for some reason the decompressing of that file is giving me a MAJOR headache.

I’m quite a Python newbie, so the answer is probably quite obvious, please help me.

In this bit of the script, I already have the file, and I just want to read it into a variable, then decompress that. Is that right? I’ve tried all sorts of ways to do this, and I usually get a “ValueError: couldn’t find end of stream” error on the last line of this snippet. I’ve tried to open up the zipfile and write it out to a string in a zillion different ways. This is the latest.

openZip = open(zipFile, "r")
s = ''
while True:
    newLine = openZip.readline()
    if(len(newLine)==0):
        break
    s+=newLine
print s
uncompressedData = bz2.decompress(s)

Hi Alex, I should’ve listed all the other methods I’ve tried, as I’ve tried the read() way.

METHOD A:

print 'decompressing ' + filename
fileHandle = open(zipFile)
uncompressedData = ''
while True:
    s = fileHandle.read(1024)
    if not s:
        break
    print('RAW "%s"', s)
    uncompressedData += bz2.decompress(s)
uncompressedData += bz2.flush()
newFile = open(steamTF2mapdir + filename.split(".bz2")[0],"w")
newFile.write(uncompressedData)
newFile.close()

I get the error:

uncompressedData += bz2.decompress(s)
ValueError: couldn't find end of stream

METHOD B:

zipFile = steamTF2mapdir + filename
print 'decompressing ' + filename
fileHandle = open(zipFile)
s = fileHandle.read()
uncompressedData = bz2.decompress(s)

Same error:

uncompressedData = bz2.decompress(s)
ValueError: couldn't find end of stream

Thanks so much for your prompt reply. I’m really banging my head against the wall, feeling inordinately thick for not being able to decompress a simple .bz2 file.

By the by, I used 7zip to decompress it manually, to make sure the file isn’t wonky or anything, and it decompresses fine.

Solution

You’re opening and reading the compressed file as if it was a textfile made up of lines. DON’T! It’s NOT.

uncompressedData = bz2.BZ2File(zipFile).read()

seems to be closer to what you’re angling for.
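As a minimal sketch of that one-liner in context (the helper name read_bz2, the sample payload and the temporary file are made up for illustration, not taken from the OP's script), something like this should work on a current Python:

```python
import bz2
import os
import tempfile


def read_bz2(path):
    """Return the decompressed contents of a .bz2 file as bytes."""
    # BZ2File reads the underlying file as binary and decompresses on read(),
    # so there is no open() mode to get wrong.
    with bz2.BZ2File(path) as f:
        return f.read()


# Build a small sample archive so the sketch runs on its own.
sample = b"hello from a bz2 file\n" * 3
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt.bz2")
with open(sample_path, "wb") as f:
    f.write(bz2.compress(sample))

uncompressedData = read_bz2(sample_path)
print(len(uncompressedData), "bytes decompressed")
```

The with block only adds a guaranteed close of the file; the decompression itself is still the single BZ2File(...).read() call.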

Edit: the OP has shown a few more things he’s tried (though I don’t see any notes about having tried the best method — the one-liner I recommend above!) but they seem to all have one error in common, and I repeat the key bits from above:

opening … the compressed file as if it was a textfile … It’s NOT.

open(filename) and even the more explicit open(filename, 'r') open a text file for reading. A compressed file is a binary file, so in order to read it correctly you must open it with open(filename, 'rb'). (My recommended bz2.BZ2File KNOWS it’s dealing with a compressed file, of course, so there’s no need to tell it anything more.)
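If you do want to keep the open-then-decompress shape of Method B, a sketch of the corrected version could look like the following. The function name decompress_file and the sample data are assumptions for illustration; the only substantive change from Method B is the 'rb' mode.

```python
import bz2
import os
import tempfile


def decompress_file(path):
    """Read a whole .bz2 file in binary mode and decompress it in one call."""
    # 'rb' hands back the raw bytes exactly as stored on disk.
    with open(path, "rb") as fileHandle:
        s = fileHandle.read()
    # bz2.decompress needs the complete compressed stream, not a fragment.
    return bz2.decompress(s)


# Self-contained demo: write a sample archive, then read it back.
payload = b"binary \x00\x1a\r\n payload"
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bz2")
with open(demo_path, "wb") as out:
    out.write(bz2.compress(payload))

result = decompress_file(demo_path)
print(result == payload)
```

Note that bz2.decompress is handed the entire file contents here; passing it one 1024-byte chunk at a time, as Method A does, is a separate problem from the file mode.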

In Python 2.*, on Unix-y systems (i.e. every system except Windows), you could get away with a sloppy use of open (but in Python 3.* you can’t, as text is Unicode, while binary is bytes — different types).
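To illustrate the Python 3 side of that remark, here is a small sketch (the helper try_decompress and the sample data are invented for the example). It only shows that the bz2 module works on bytes and refuses a str outright, rather than silently accepting text:

```python
import bz2

compressed = bz2.compress(b"some map data")


def try_decompress(data):
    """Return the decompressed bytes, or a short note if the type is rejected."""
    try:
        return bz2.decompress(data)
    except TypeError as exc:
        # Python 3 raises TypeError when given str instead of bytes.
        return "TypeError: %s" % exc


ok = try_decompress(compressed)   # bytes in, bytes out
bad = try_decompress("not bytes")  # str in, rejected
print(type(compressed).__name__, type(ok).__name__)
print(bad)
```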

On Windows (and before that on DOS) it has always been indispensable to distinguish the two, because Windows text files, for historical reasons, are peculiar: they use two bytes rather than one to end a line and, at least in some cases, treat a byte with the value 0x1A ('\x1a', Ctrl-Z) as a logical end of file. The low-level reading and writing code must compensate for that.
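The effect of an early logical end of file can be imitated on any platform by cutting the compressed bytes short. This is only a simulation under that assumption (it does not reproduce the Windows text-mode reader itself, and the helper outcome is made up), and the error text differs between Python versions, but the decompressor complains in the same spirit as the OP's traceback:

```python
import bz2

original = b"map data " * 1000
compressed = bz2.compress(original)

# Pretend the reader stopped early, as a text-mode read might at a 0x1A byte.
truncated = compressed[: len(compressed) // 2]


def outcome(data):
    """Report whether the data decompresses or the stream ends too early."""
    try:
        bz2.decompress(data)
        return "ok"
    except ValueError:
        # Raised when the end-of-stream marker is never reached.
        return "ValueError"


full_result = outcome(compressed)
short_result = outcome(truncated)
print(full_result, short_result)
```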

So I suspect the OP is using Windows and is paying the price for not carefully using the 'rb' option (“read binary”) to the open built-in (though bz2.BZ2File is still simpler, whatever platform you’re using!).
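For the OP's larger goal of writing the decompressed file to disk, a possible sketch follows. It assumes Python 3.3 or later for bz2.open; the helper decompress_to_file, the file name example_map.bsp.bz2 and the directories are hypothetical stand-ins for the OP's steamTF2mapdir setup. The output file is opened with 'wb' for the same reason the input needs binary mode.

```python
import bz2
import os
import shutil
import tempfile


def decompress_to_file(src_path, dest_dir):
    """Decompress src_path into dest_dir, dropping a trailing '.bz2'."""
    name = os.path.basename(src_path)
    if name.endswith(".bz2"):
        name = name[: -len(".bz2")]
    dest_path = os.path.join(dest_dir, name)
    # Both ends are binary; copyfileobj streams the data in chunks,
    # so the whole file never has to sit in memory at once.
    with bz2.open(src_path, "rb") as src, open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dest_path


# Self-contained demo with a made-up file name.
work_dir = tempfile.mkdtemp()
payload = b"\x00\x01 fake map contents \r\n\x1a" * 100
archive = os.path.join(work_dir, "example_map.bsp.bz2")
with open(archive, "wb") as f:
    f.write(bz2.compress(payload))

out_path = decompress_to_file(archive, work_dir)
with open(out_path, "rb") as f:
    written = f.read()
print(os.path.basename(out_path), len(written))
```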

Published by: 全栈程序员-站长. When reposting, please credit the source: https://javaforall.net/138646.html Original link: https://javaforall.net
