一个简单的多的socket http 下载原型 perl

一个简单的多的socket http 下载原型 perl

基于perl,使用IO::Select实现,并非多线程。可指定分几部分下载。
基本上没有作异常处理,没有处理redirect,甚至也没有判断对range头的响应是否为206.

还好的是它还可以工作,比wget快几倍地下载,挺好玩的.

perl module:

package HttpClient;

use strict;
use warnings;

use IO::Socket::INET;
use Data::Dumper;

my $crlf = “/r/n”;

my $buf_size = 8 * 1024;

sub new {

my $class = shift;
my %cnf = (@_);
my $self = {

state => ‘init’,
url => $cnf{url},
‘total_parts’ => $cnf{’total_parts’},
part => $cnf{part},
‘content_length’ => $cnf{’content_length’},
};
my $url = $self->{url};
my $host = $1 if $url =~ m{://([^/]*)};
my $file = $1 if $url =~ m{/([^/]*)$};
if ( defined $self->{part} ) {
$file .= “.part” . $self->{part};
}
$self->{host} = $host;
$self->{file} = $file;
my $port = 80;
$port = $1 if $host =~ /:(/d+)/;
my $sock = IO::Socket::INET->new(
PeerAddr => $host,
PeerPort => $port,
Proto => ‘tcp’,
Blocking => 0,
)
or die “can’t connect to server:$!/n”;
select($sock);
$| = 1;
select(STDOUT);
$self->{sock} = $sock;
bless $self, $class;
return $self;
}

sub sock {

return shift->{sock};
}

sub get_request_header {

my $self = shift;
return $self->{request} if defined $self->{request};
my $request =
“GET $self->{url} HTTP/1.1$crlf”
. “Host: $self->{host}$crlf”
. “Connection: close$crlf”;
if ( defined $self->{’total_parts’}
and defined $self->{part}
and defined $self->{’content_length’} )
{

my $length = $self->{’content_length’};
my $total_parts = $self->{’total_parts’};
my $part = $self->{part};
my $part_size = int( $length / $total_parts );
my $start_pos = $part_size * $part;
my $recved = 0;
if (-e $self->{file}) {

$recved = -s $self->{file};
$start_pos+=$recved;
}

my $recv_size =
( $part == $total_parts – 1 ) ? $length-$part*$part_size : $part_size;
$self->{start_pos} = $start_pos;
$self->{recv_size} = $recv_size-$recved;
print “part $self->{part} recv_size=$self->{recv_size},start_pos=$start_pos,recved=$recved,parts=$total_parts,length=$length/n”;
$request .=
“Range: bytes=$start_pos-” . ( $start_pos + $recv_size-1 ) . $crlf;
}
$request .= $crlf;
$self->{request} = $request;
return $request;
}

sub parse_header {

my ($self) = @_;
my $data = $self->{data};
return 1 if $self->{state} =~ /body/;
return 0 unless defined $data;
return 0 unless $data =~ m{^(.*?)(/r/n/r/n|/n/n)}s;
my $header_content = $1;
my $header_end = $2;
print $header_content, “/n”;
my @headers = split //r?/n/, $header_content;
die “invalid header/n” unless scalar(@headers) > 0;
my $status_line = shift @headers;
$self->{status_line} = $status_line;
$self->{code} = $2 if $status_line =~ m{HTTP/1(.1)? (/d+)};
my $last_header;
my $header = {};

foreach my $line (@headers) {

if ( $line =~ /^/s+(.*)$/ ) {

$header->{$last_header} .= ” $1″;
}
elsif ( $line =~ /^([^:]+): (.*)$/ ) {

$last_header = $1;
my $value = $2;
$header->{$last_header} = $value;
}
else {

print “invalid header:$line/n”;
}
}
$self->{header} = $header;
$self->{’content_length’} = $header->{’Content-Length’}
unless defined $self->{’content_length’};
$self->{recv_size} = $self->{’content_length’};
$self->{data} = substr($data,length($header_content)+length($header_end));
$self->{state} = ‘body’;
return 1;
}

sub recv_data {

my ( $self, $data ) = @_;
if ( defined $self->{data} ) {

$self->{data} .= $data;
}
else {

$self->{data} = $data;
}
}

sub save_data {

my ( $self, $read_select ) = @_;
my $fh = $self->{fh};
if ( !defined $fh ) {

open $fh, “>$self->{file}” or die “can’t open file $self->{file} :$!/n”;
binmode $fh,”:bytes”;
$self->{fh} = $fh;
}
my $write_len = $self->{write_len} || 0;
my $recv_size = $self->{recv_size};
my $data = $self->{data};
my $max_len = length($data);
return unless $max_len > 0;
if ($max_len+$write_len > $recv_size) {

$max_len = $recv_size – $write_len ;
my $part = $self->{part} || 0;
print “part=$part,max_len=$max_len,write_len=$write_len/n”;
}

if ( $max_len == 0 ) {

$self->{done} = 1;
close $self->{fh};
$read_select->remove( $self->sock );
close $self->{sock};
print “$self->{file} recved $write_len bytes/n”;
$self->{parent}->child_done($self) if $self->{parent};
return;
}
my $len = syswrite( $fh, $data, $max_len )
or die “write data failed :$!/n”;
$self->{data} = substr( $data, $len );
$write_len += $len;
$self->{write_len} = $write_len;
}

sub child_done {

my ( $self, $child ) = @_;
$child->{done} = 1;
return unless $self->{done};
foreach my $c ( @{ $self->{children} } ) {

return unless $c->{done};
}

print “merge file/n”;
open FH, “>>$self->{file}”;
print “first part size:”,-s $self->{file},”/n”;
seek( FH, 0, 2 );
foreach my $c ( @{ $self->{children} } ) {

print “$c->{file} size:”,-s $c->{file},”/n”;
open CFH, “<$c->{file}”;

print “merge $c->{file}/n”;
my $buf;
for ( ; ; ) {

my $len = sysread( CFH, $buf, $buf_size );
last if !defined $len || $len == 0;
syswrite( FH, $buf, $len );
}
close CFH;
unlink $c->{file};
}
close FH;
}

sub handle_read {

my ( $self, $sock, $read_select, $write_select, $sock_client ) = @_;

my $data;
my $len = sysread( $sock, $data, $buf_size );
if ( $len == 0 ) {

print “sock $sock finished/n”;
$read_select->remove($sock);
close $sock;
print “$self->{file} size=”,-s $self->{file},”/n”;
$self->{parent}->child_done($self) if $self->{parent};
return;
}
$self->recv_data($data);
if ( $self->{state} !~ /body/ and $self->parse_header ) {

if ( !defined $self->{parent} and defined $self->{content_length} ) {

my $parts = $self->{total_parts} || 5;
$self->{children} = [];
my $length = $self->{’content_length’};
my $part_size = int( $length / $parts );
$self->{recv_size} = $part_size;
print “parent recv_size=$self->{recv_size}/n”;
foreach my $part ( 1 .. $parts – 1 ) {

my $child = HttpClient->new(
url => $self->{url},
‘total_parts’ => $parts,
part => $part,
‘content_length’ => $self->{content_length},
);
$sock_client->{ $child->sock } = $child;
$child->{parent} = $self;
push @{ $self->{children} }, $child;
$read_select->add( $child->sock );
$write_select->add( $child->sock );
}
}
}
else {

$self->save_data($read_select);
}
}

sub handle_write {

my ( $self, $sock, $read_select, $write_select, $sock_client ) = @_;

my $offset = 0;
$offset = $self->{request_offset} if defined $self->{request_offset};
my $request = $self->get_request_header;
if ( $offset == 0 ) {

print “try to send request/n”;
print $request;
}
print “offset=$offset/n”;
my $len = syswrite( $sock, $request, length($request) – $offset, $offset );
if ( !defined $len ) {

print STDERR “write failed:$!/n”;
$read_select->remove($sock);
$write_select->remove($sock);
}
else {

$offset += $len;
$self->{request_offset} = $offset;
if ( $offset == length($request) ) {

$write_select->remove($sock);
}
}
}

sub start {

my ($self) = @_;
use IO::Select;
my $r = IO::Select->new;
$r->add( $self->sock );

my $w = IO::Select->new;
$w->add( $self->sock );

my $sock_client = { $self->sock => $self };

use Time::HiRes qw(time);
my $start_time = time;
for ( ; ; ) {

last if ( $r->count == 0 );
my ( $rout, $wout, $eout ) = IO::Select->select( $r, $w, $r );
last unless defined $rout;

foreach my $sock ( @{$wout} ) {

my $c = $sock_client->{$sock};
if ( !defined $c ) {

die “oops,can’t find httpclient for $sock/n”;
}
$c->handle_write( $sock, $r, $w, $sock_client );
}
foreach my $sock ( @{$rout} ) {

my $c = $sock_client->{$sock};
if ( !defined $c ) {

die “oops,can’t find httpclient for $sock/n”;
}
$c->handle_read( $sock, $r, $w, $sock_client );
}
}
my $end_time = time;
my $used_time = $end_time – $start_time;
my $speed = $self->{content_length} / $used_time;
print “Done,spend $used_time seconds,speed:$speed bytes/seconds/n”;
}

1;

test perl script:

#!/usr/bin/perl
use strict;
use warnings;

use lib ‘.’;

use HttpClient;
use Getopt::Long;
$| = 1;
#my $url = ‘http://eclipse.cdpa.nsysu.edu.tw/downloads/drops/R-3.2.1-200609210945/eclipse-SDK-3.2.1-linux-gtk.tar.gz’;

my $url = ”;
my $total_parts = 1;
my $result = GetOptions (”url|u=s” => /$url,
                        “parts|p=i”   => /$total_parts,
                        );
unless ($result and $url=~m{://}) {

    print <<HELP
usage: perl http.pl –url=url [–parts=parts]
HELP
;
    exit;
}

my $client = HttpClient->new(url=>$url,’total_parts’=>$total_parts);
$client->start();

参考:

RFC2616 – HTTP/1.1 Specification

technorati tags:perl, http, downloader, protocol

Blogged with Flock

版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请联系我们举报,一经查实,本站将立刻删除。

发布者:全栈程序员-站长,转载请注明出处:https://javaforall.net/100597.html原文链接:https://javaforall.net

(0)
全栈程序员-站长的头像全栈程序员-站长


相关推荐

  • SVN服务器搭建和使用[通俗易懂]

    SVN服务器搭建和使用[通俗易懂]SVN服务器搭建和使用

    2022年4月24日
    60
  • SQL SERVER 2000数据库置疑 紧急模式

    SQL SERVER 2000数据库置疑 紧急模式SQLSERVER2000数据库,服务重启之后系统库以外的其它库都会出现置疑(置疑之前没有备份的库。除非是置疑前一秒刚备份完,或者是备份完没有再使用,可以直接恢复):1、停掉SQLSERVER服务(开始–控制面板–管理工具–服务–MSSQLSERVER–右键停止)2、找到置疑库的数据文件mdf和ldf存放位置,备份此文件到其它文件夹3、重启SQLSERVER服务

    2022年8月22日
    9
  • UML图绘制—–时序图的画法

    UML图绘制—–时序图的画法UML图绘制—–时序图的画法1.什么是时序图时序图用于描述对象之间的传递消息的时间顺序,即用例中的行为顺序主要用来更直观的表现各个对象交互的时间顺序,将体现的重点放在以时间为参照,各个对象发送、接收消息,处理消息,返回消息的时间流程顺序,也称为时序图。又名序列图、循序图、顺序图,是一种UML交互图2.时序图的作用:确认和丰富一个使用情境的逻辑。3.时序图…

    2022年6月15日
    44
  • 比较好用的mysql可视化工具—–pycharm连接mysql图文教程

    比较好用的mysql可视化工具—–pycharm连接mysql图文教程1.mysql可视化工具常用的mysql可视化工具有很多,如:sqlyog、navicat等等,使用这些工具需要另外安装,有的还可能收费。这里推荐一个比较容易被大家忽略的mysql可视化工具,大多数学python的人,都会使用pycharm,不得不说pycharm的功能及其强大,pycharm本身也是一个功能很强的数据库可视化工具,换句话说,如果安装了pyharm,那根本没必要在安装其他可视化工具了!这里以pycharm链接mysql为例,记录一下详细过程,供大家参考!2.使用工具我使用的工具如下

    2022年8月27日
    6
  • 数据库系统原理——概述「建议收藏」

    数据库系统原理——概述「建议收藏」穷则独善其身,达则兼济天下一.什么是数据库狭义:存储数据的仓库广义:可以对数据进行存储和管理的软件以及数据本身统称为数据库数据库是由表、关系、操作组成二.为什么需要数据库几乎所有的应用软件的后台都需要数据库数据存储数据占用空间小,容易持久保存数据库的内容是存储在硬盘上,掉电之后任然存在存储比较安全软件是加密的,只能通过DBMS打开容易维护和升级使用SQL语句方便操作数据数据库移植比较容易简化对数据的操作为将来学习Oracle做准备.

    2025年6月7日
    2
  • 《人工神经网络原理》读书笔记(六)-Boltzmann机[通俗易懂]

    《人工神经网络原理》读书笔记(六)-Boltzmann机[通俗易懂]全部笔记的汇总贴:《人工神经网络原理》-读书笔记汇总一、随机型神经网络的提出BP和Hopfield网络陷入局部最小点的原因网络误差或能量函数构成了含有多个极小点的非线性超曲面;网络误差或能量函数只能按照梯度下降方向单调变化,而不能有任何上升趋势。随机型神经网络的基本思想不但能够让网络误差或能量函数按照梯度下降方向变化,也能够让它们按照某种方式向梯度上升方向变化,这样才有可能使网络跳出局部极小点而向全局极小点收敛。随机型神经网络的特点神经元的输出状态有概率决定;网络连接权值的调整

    2022年7月15日
    17

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注

关注全栈程序员社区公众号