[Python Crawler] Pulling specified novel chapters from a web page with a Python script
What the example code does:
Pick a novel on a novel site and save each chapter's content as a .txt file, with each file named after its chapter title.
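Since the file name is taken directly from the chapter title, a title containing a character Windows forbids in file names (such as `:` or `?`) would make `open()` fail. A small helper along these lines could sanitize titles first; `safe_filename` is a hypothetical addition, not part of the original script:

```python
import re

def safe_filename(title):
    # Replace characters Windows forbids in file names with underscores
    return re.sub(r'[\\/:*?"<>|]', '_', title).strip()

print(safe_filename('Chapter 3: Who am I?'))  # Chapter 3_ Who am I_
```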
import requests
from lxml import etree

# Link to a single novel
Anovellink = 'https://www.hongxiu.com/book/18899519001291804#Catalog'
# Catalog page source
ContentsPageCode = requests.get(Anovellink).text
# Parsed catalog page
ContentsPage = etree.HTML(ContentsPageCode)
href = ContentsPage.xpath('//*[@id="j-catalogWrap"]/div[2]/div/ul/li/a/@href')
for link in href:
    # Full chapter URL
    linkaddress = 'https://www.hongxiu.com' + link
    # Chapter page source
    Chapterpagecode = requests.get(linkaddress).text
    # Parsed chapter page
    Chapterpage = etree.HTML(Chapterpagecode)
    # List of paragraph texts in the chapter body
    Literallist = Chapterpage.xpath('//div[@class="ywskythunderfont"]/p/text()')
    # Chapter title
    title = Chapterpage.xpath('//h1[@class="j_chapterName"]/text()')[0]
    # "with" guarantees the file is closed even if a write fails; note that a title
    # containing a character Windows forbids in file names (\ / : * ? " < > |)
    # would need sanitizing before being used as a file name
    with open('E:/novelpython/' + title + '.txt', 'w', encoding='utf-8') as file:
        for paragraph in Literallist:
            file.write(paragraph + '\n')
    print(title + ' chapter crawling is complete')
print('The novel pulling is complete')
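The XPath extraction used above can be exercised offline, without hitting the site. A minimal sketch, assuming a simplified HTML fragment that mimics the chapter-page structure (the sample markup below is invented for illustration, not copied from the real site):

```python
from lxml import etree

# Hypothetical chapter-page fragment with the same class names the script targets
sample_html = '''
<html><body>
  <h1 class="j_chapterName">Chapter 1: The Beginning</h1>
  <div class="ywskythunderfont">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
</body></html>
'''

page = etree.HTML(sample_html)
# Same expressions as in the script: title text and paragraph texts
title = page.xpath('//h1[@class="j_chapterName"]/text()')[0]
paragraphs = page.xpath('//div[@class="ywskythunderfont"]/p/text()')
print(title)       # Chapter 1: The Beginning
print(paragraphs)  # ['First paragraph.', 'Second paragraph.']
```

Testing the expressions this way makes it easy to tell whether a scraping failure comes from the XPath or from the page download.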
Result example: