python2 + selenium: scraping a novel from Biqudu

全栈程序员-站长 • August 19, 2025, 2:22 PM • Uncategorized

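#! /usr/bin/env python
# coding=utf-8
from __future__ import print_function

import io
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()

# Get the chapter title and body text and append them to the output file
def get_article():
    title = browser.find_element(By.XPATH, '//div[@class="bookname"]/h1').text
    print(title)

    content = browser.find_element(By.ID, 'content').text

    # io.open with an explicit encoding writes the Unicode text as UTF-8,
    # which replaces the reload(sys) / sys.setdefaultencoding() hack
    with io.open('storytudou.txt', 'a', encoding='utf-8') as f:
        f.write(title + u'\n')
        f.write(content + u'\n\n')

# Get the number of chapters (assumes the index page has one <dd> per chapter)
def page_num():
    browser.get("https://www.biqudu.net/31_31729/")
    soup = BeautifulSoup(browser.page_source, 'lxml')
    dd = soup.find_all('dd')
    return len(dd)

# Scrape the current chapter, then click "next chapter"
def index_page(i):
    if i == 1:
        # Open the first chapter before the first iteration
        browser.get("https://www.biqudu.net/31_31729/2212637.html")
        time.sleep(10)
    get_article()
    # Scroll to the bottom of the page, where the navigation links are
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(5)
    # The third <a> inside div.bottem2 is the "next chapter" link
    next_p = browser.find_element(By.XPATH, '//div[@class="bottem2"]/a[3]')
    # Absolute-XPath alternative:
    # next_p = browser.find_element(By.XPATH, '/html/body/div/div[5]/div[2]/div[5]/a[3]')
    time.sleep(5)
    next_p.click()
    time.sleep(10)

# Walk through every chapter of the novel
def main():
    page = page_num()
    print(page)
    try:
        for i in range(1, page + 1):
            index_page(i)
    finally:
        # Close the browser even if a chapter fails to load
        browser.quit()

if __name__ == '__main__':
    main()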
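index_page() locates the next-chapter link with the XPath //div[@class="bottem2"]/a[3]. To make explicit what that expression selects, here is a small sketch that evaluates the equivalent path with the standard library's xml.etree.ElementTree (which supports only a subset of XPath and needs well-formed markup) on a hypothetical, simplified navigation bar. Only the class name bottem2 comes from the script; the link texts, hrefs, and the helper name third_nav_link are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified navigation bar from the bottom of a chapter page;
# only the class name "bottem2" comes from the script.
SAMPLE_NAV = """
<html><body>
  <div class="bottem2">
    <a href="prev.html">Previous chapter</a>
    <a href="index.html">Table of contents</a>
    <a href="next.html">Next chapter</a>
  </div>
</body></html>
"""

def third_nav_link(markup):
    """Return (text, href) of the element matched by div[@class='bottem2']/a[3]."""
    root = ET.fromstring(markup.strip())
    # [@class='bottem2'] filters by attribute value; a[3] picks the third
    # <a> child of that div (positions start at 1, as in XPath)
    link = root.find(".//div[@class='bottem2']/a[3]")
    if link is None:
        return None
    return link.text, link.get("href")

print(third_nav_link(SAMPLE_NAV))  # → ('Next chapter', 'next.html')
```

Because a[3] counts <a> children by position, the locator points at a different link if the site adds or removes a link before it in that div.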

System: Ubuntu

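BeautifulSoup needs to be installed (page_num() also asks it for the lxml parser, which on Ubuntu comes from the python-lxml package):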

yanner@yanner-VirtualBox:~$ sudo apt-get install python-bs4
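page_num() calls BeautifulSoup(html, 'lxml'), and the lxml parser is installed separately from bs4; if it is missing, Beautiful Soup raises bs4.FeatureNotFound. A quick sketch for checking which parser backends are available before running the scraper (the helper name parser_available is hypothetical):

```python
from bs4 import BeautifulSoup, FeatureNotFound

def parser_available(name):
    """Return True if Beautiful Soup can build a tree with the named parser."""
    try:
        BeautifulSoup("<p>test</p>", name)
    except FeatureNotFound:
        # No installed tree builder matches the requested parser name
        return False
    return True

for name in ("lxml", "html.parser"):
    print(name, parser_available(name))
```

html.parser ships with Python, so it is always available; lxml prints False until its package is installed.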

Notes:

Beautiful Soup converts a complex HTML document into a tree structure in which every node is a Python object.
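Because the parsed page is a tree of Python objects, the title and body that get_article() reads through Selenium can also be read from the HTML itself. A minimal sketch, assuming a simplified, hypothetical chapter page that keeps only the two locators the script uses (div.bookname > h1 and div#content); the text and the function name parse_chapter are made up, and it uses the built-in html.parser backend so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified chapter page: only the class "bookname" and the
# id "content" come from the script's locators; the text is made up.
SAMPLE_CHAPTER = """
<html><body>
  <div class="bookname"><h1>Chapter 1</h1></div>
  <div id="content">First line.<br/>Second line.</div>
</body></html>
"""

def parse_chapter(html):
    """Return (title, content) read from the parsed tree."""
    soup = BeautifulSoup(html, "html.parser")
    # Walk the tree: <div class="bookname"> -> its <h1>
    title = soup.find("div", class_="bookname").h1.get_text()
    # Join the text nodes inside #content with newlines (the <br/> has no text)
    content = soup.find(id="content").get_text("\n")
    return title, content

soup = BeautifulSoup(SAMPLE_CHAPTER, "html.parser")
h1 = soup.find("h1")
print(type(h1).__name__, type(h1.string).__name__)  # → Tag NavigableString
print(parse_chapter(SAMPLE_CHAPTER))  # → ('Chapter 1', 'First line.\nSecond line.')
```

Elements become Tag objects and the text inside them becomes NavigableString objects, which is what "every node is a Python object" means in practice.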

soup.find_all('dd') finds every <dd> tag and returns a list (a bs4 ResultSet) whose elements are of type bs4.element.Tag.
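page_num() only takes len() of that list, but each Tag in it can be inspected further. A small sketch on a hypothetical, simplified chapter index (only the fact that chapters sit in <dd> elements comes from the script; the hrefs, titles, and the helper name chapter_links are made up):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified chapter index; real pages contain much more markup
SAMPLE_INDEX = """
<div id="list"><dl>
  <dt>Chapter list</dt>
  <dd><a href="/31_31729/1.html">Chapter 1</a></dd>
  <dd><a href="/31_31729/2.html">Chapter 2</a></dd>
  <dd><a href="/31_31729/3.html">Chapter 3</a></dd>
</dl></div>
"""

def chapter_links(html):
    """Return (title, href) for every <dd> that wraps a link."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for dd in soup.find_all("dd"):  # ResultSet (a list subclass) of Tag objects
        a = dd.find("a")
        if a is not None and a.has_attr("href"):
            links.append((a.get_text(), a["href"]))
    return links

dd_tags = BeautifulSoup(SAMPLE_INDEX, "html.parser").find_all("dd")
print(len(dd_tags))               # → 3, the value page_num() would return here
print(type(dd_tags[0]).__name__)  # → Tag
print(chapter_links(SAMPLE_INDEX)[0])  # → ('Chapter 1', '/31_31729/1.html')
```

If the real index page contains <dd> elements that are not chapter links, len() over-counts by that many, so it can be worth printing the list once before relying on the count.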

Posted on 2019-08-20 15:01 by yanner

Reposted from: https://www.cnblogs.com/yanner/p/11382946.html

Published by 全栈程序员-站长. Please credit the source when reposting: https://javaforall.net/200894.html Original link: https://javaforall.net
