Web Scraping in Practice: Downloading Images from the 7160 Gallery Site with requests and BeautifulSoup

全栈程序员-站长 • March 18, 2026, 1:58 PM • Uncategorized • 2 views


import os
import re

import requests
from bs4 import BeautifulSoup

# Start URL
all_url = 'http://www.7160.com/xiaohua/'
# Directory where the images are saved
path = 'H:/school_girl/'
# Request headers
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 UBrowser/6.1.2107.204 Safari/537.36'
}

# Request the index page
html = requests.get(all_url, headers=header)
# Undo the ISO-8859-1 decoding applied by requests to recover the raw bytes, then decode them as GBK
start_html = html.text.encode('iso-8859-1').decode('gbk')
# Parse it
soup = BeautifulSoup(start_html, 'lxml')
# Maximum page number (hard-coded)
page = 255

# Common URL prefix of the listing pages
same_url = 'http://www.7160.com/xiaohua/'

for n in range(1, int(page) + 1):
    ul = same_url + 'list_6_' + str(n) + '.html'
    # Request one listing page (it links to many galleries)
    html = requests.get(ul, headers=header)
    start_html = html.text.encode('iso-8859-1').decode('gbk')
    # Parse it
    soup = BeautifulSoup(start_html, 'lxml')
    all_a = soup.find('div', class_='news_bom-left').find_all('a', target='_blank')
    for a in all_a:
        title = a.get_text()
        if title != '':
            # Create the gallery directory; Windows does not allow '?' in directory names
            save_dir = path + title.strip().replace('?', '')
            if os.path.exists(save_dir):
                flag = 1  # the directory already exists
            else:
                os.makedirs(save_dir)
                flag = 0
            os.chdir(save_dir)

            # Request the first page of the gallery
            print('Preparing to scrape: ' + title)
            hrefs = a['href']
            in_url = 'http://www.7160.com'
            href = in_url + hrefs

            htmls = requests.get(href, headers=header)
            html = htmls.text.encode('iso-8859-1').decode('gbk')
            # Parse it
            mess = BeautifulSoup(html, 'lxml')
            titles = mess.find('h1').text
            pic_max = mess.find('div', class_='itempage').find_all('a')[-2].text  # number of pages in the gallery
            if flag == 1 and len(os.listdir(save_dir)) >= int(pic_max):
                print('Already saved, skipping')
                continue
            for num in range(1, int(pic_max) + 1):
                # Strip the '.html' extension to get the base path of the gallery
                href = re.sub(r'\.html$', '', a['href'])
                if num == 1:
                    html = in_url + href + '.html'
                else:
                    html = in_url + href + '_' + str(num) + '.html'
                # Request one image page of the gallery
                htmls = requests.get(html, headers=header)
                html = htmls.text.encode('iso-8859-1').decode('gbk')
                # Parse it
                mess = BeautifulSoup(html, 'lxml')
                pic_url = mess.find('img', alt=titles)
                print(pic_url['src'])
                # Download the image
                html = requests.get(pic_url['src'], headers=header)
                filename = pic_url['src'].split('/')[-1]
                with open(filename, 'wb') as f:
                    f.write(html.content)
            print('Done')
    print('Page', n, 'done')
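Two details of the script are easy to get wrong: decoding the GBK-encoded pages and building the URL of each image page. The sketch below isolates both as small pure functions so they can be checked without touching the network. The function names (`decode_gbk`, `list_page_url`, `image_page_url`) are hypothetical and not part of the original script; the URL patterns are the ones the script itself uses.

```python
import re

BASE_URL = 'http://www.7160.com'  # site root, as used in the script


def decode_gbk(raw_bytes):
    """Decode a GBK-encoded response body (e.g. requests' response.content) to str."""
    # Gives the same result as response.text.encode('iso-8859-1').decode('gbk')
    # when requests has decoded the body as ISO-8859-1, but works on the raw bytes.
    return raw_bytes.decode('gbk')


def list_page_url(n):
    """URL of the n-th listing page, following the 'list_6_<n>.html' pattern."""
    return BASE_URL + '/xiaohua/list_6_' + str(n) + '.html'


def image_page_url(href, num):
    """URL of the num-th image page of a gallery whose first page is `href`."""
    base = re.sub(r'\.html$', '', href)  # strip the extension, whatever the id length
    if num == 1:
        return BASE_URL + base + '.html'
    return BASE_URL + base + '_' + str(num) + '.html'
```

For instance, with a hypothetical gallery link `/xiaohua/12345.html`, `image_page_url('/xiaohua/12345.html', 3)` yields the address of the third image page. Stripping the extension with a regular expression anchored at the end of the string avoids depending on the numeric id having a fixed length.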

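While running, the script prints the title of each gallery before fetching it, the URL of every image it downloads, a notice when a gallery is skipped because it was already saved, and a completion message after each gallery and each listing page.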
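The script skips a gallery when its folder already exists and already holds at least as many files as the gallery has pages. A minimal sketch of that check as standalone functions follows. `safe_dir_name` and `is_gallery_complete` are hypothetical names introduced here for illustration. The check only counts directory entries, so it cannot detect a truncated or corrupt download.

```python
import os


def safe_dir_name(title):
    """Strip surrounding whitespace and remove '?', which Windows rejects in directory names."""
    return title.strip().replace('?', '')


def is_gallery_complete(directory, expected_count):
    """Return True if `directory` exists and holds at least `expected_count` entries."""
    if not os.path.isdir(directory):
        return False
    # expected_count may arrive as text scraped from the page, so convert it first
    return len(os.listdir(directory)) >= int(expected_count)
```

In the main loop this could stand in for the `flag` bookkeeping: calling `is_gallery_complete(path + safe_dir_name(title), pic_max)` before starting the downloads gives the same decision in one place.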

Reposted from: https://www.cnblogs.com/zhuifeng-mayi/p/9712209.html


Published by 全栈程序员-站长. Please credit the source when reposting: https://javaforall.net/215413.html. Original link: https://javaforall.net
