
Scraping Xianyu Price Trends with Python and Analyzing Them Visually

news 2025/11/13 9:26:55

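I. Project Background and Goals

Xianyu (闲鱼) is one of China's leading second-hand trading platforms, with a huge volume of listings and price data. That data holds a lot of market information, but the platform itself offers no built-in price-trend analysis. A Python crawler can collect the data automatically, and data-analysis and visualization tools can then show how item prices change over time.

This article aims to:

Scrape price data for a specific item on Xianyu with a Python crawler.
Clean and preprocess the scraped data.
Plot price trends with a visualization library (such as Matplotlib or Seaborn).
Analyze the trends to support buying and selling decisions.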

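II. Technology Choices and Tool Preparation

(1) Setting Up the Python Environment

Make sure Python is installed (3.8 or later is recommended), then install the following libraries:

Requests: sends HTTP requests and retrieves page content.
BeautifulSoup (installed as the beautifulsoup4 package): parses HTML pages and extracts the required data.
Pandas: data processing and analysis.
Matplotlib: data visualization, used here to plot the price-trend chart.
Seaborn: builds on Matplotlib to produce better-looking statistical charts.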
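Before writing any crawler code, it helps to confirm that the libraries are actually installed. The following is a minimal sketch that uses the standard library's importlib.metadata (available in Python 3.8+). `check_dependencies` is a hypothetical helper name. The package list assumes the PyPI distribution names, which is why BeautifulSoup appears as `beautifulsoup4`.

```python
from importlib import metadata


def check_dependencies(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # PyPI distribution names (BeautifulSoup is published as "beautifulsoup4")
    required = ["requests", "beautifulsoup4", "pandas", "matplotlib", "seaborn"]
    for pkg, ver in check_dependencies(required).items():
        print(f"{pkg:15s} {ver or 'NOT INSTALLED'}")
```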

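(2) Analyzing the Target Site

Before writing the crawler, analyze the Xianyu pages. An item page usually shows the item name, price, and publish time. Open the browser's developer tools (F12) and inspect the HTML structure to find the tags and attributes that hold the price. Also check whether the price actually appears in the raw HTML the server returns (for example, via View Page Source). Requests does not run JavaScript, so any price that the page fills in with JavaScript after loading will not reach BeautifulSoup. In that case, look in the developer tools' Network tab for the request that returns the data instead.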
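Besides inspecting the HTML, it is worth checking the site's robots.txt before crawling. The following is a minimal sketch that uses the standard library's urllib.robotparser. The robots.txt content and URLs are made-up examples, not Xianyu's actual rules. In practice you would download the real file with `set_url()` and `read()` rather than passing lines in.

```python
from urllib import robotparser


def allowed_to_fetch(robots_lines, user_agent, url):
    """Check a URL against robots.txt content that has already been downloaded."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only
sample_robots = [
    "User-agent: *",
    "Disallow: /private/",
]

print(allowed_to_fetch(sample_robots, "MyCrawler", "https://example.com/item/1"))
print(allowed_to_fetch(sample_robots, "MyCrawler", "https://example.com/private/x"))
```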

III. Implementing the Crawler

(1) Sending HTTP Requests

Use the Requests library to send an HTTP request and fetch the HTML of the target item page. Example:

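```python
import requests


def get_html(url):
    # Send a browser-like User-Agent; some sites reject the default python-requests one
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    # A timeout keeps the script from hanging on an unresponsive server
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code == 200:
        return response.text
    print("Request failed, status code:", response.status_code)
    return None
```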
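`get_html()` gives up after a single attempt, while temporary failures are common when crawling. The following is a minimal sketch of a retry wrapper with exponential backoff. `fetch_with_retry` and its parameters are hypothetical names, and the default delays are arbitrary, not values tuned for Xianyu. The wrapper accepts any function with the same contract as `get_html()`: it returns text on success and None on failure.

```python
import random
import time


def fetch_with_retry(fetch, url, max_attempts=3, base_delay=1.0):
    """Call fetch(url) until it returns a non-None result or attempts run out.

    The wait doubles after each failed attempt (exponential backoff), plus a
    small random jitter so that requests are not sent at perfectly regular intervals.
    """
    for attempt in range(max_attempts):
        result = fetch(url)
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
    return None
```

Usage: `html = fetch_with_retry(get_html, url)`. Keeping the fetch function as a parameter means the retry logic can be tested without network access.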

(2) Parsing the HTML

Parse the HTML with BeautifulSoup and extract the prices. The code below shows how:

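```python
from bs4 import BeautifulSoup


def parse_html(html):
    soup = BeautifulSoup(html, 'html.parser')
    # Assumption: each listing sits in a <div class="item">
    items = soup.find_all('div', class_='item')
    prices = []
    for item in items:
        # Assumption: the price sits in a <span class="price"> inside the listing
        price_tag = item.find('span', class_='price')
        if price_tag is None:  # skip listings without a price element
            continue
        prices.append(price_tag.get_text(strip=True))
    return prices
```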
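The same class-based lookup also works without third-party dependencies, using the standard library's html.parser. The following is a minimal sketch that makes the same assumption as above: prices live in `<span class="price">` elements. `PriceParser` and `parse_prices` are hypothetical names. The sketch does not handle `<span>` tags nested inside a price element.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect the text of every <span> whose class list contains 'price'."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "price" in classes:
            self._in_price = True
            self.prices.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self._in_price:
            self._in_price = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign
        if self._in_price:
            self.prices[-1] += data


def parse_prices(html):
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.prices]
```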

(3) Storing the Data

Store the scraped prices in a Pandas DataFrame for later analysis:

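```python
import pandas as pd


def save_to_dataframe(prices):
    df = pd.DataFrame(prices, columns=['Price'])
    # Scraped values are text (e.g. "¥1299"): keep only digits and the decimal
    # point, then convert to numbers so they can be plotted. Values that cannot
    # be converted become NaN. Unit suffixes such as 万 (10,000) are not handled.
    df['Price'] = pd.to_numeric(
        df['Price'].astype(str).str.replace(r'[^\d.]', '', regex=True),
        errors='coerce',
    )
    # utf-8-sig writes a BOM so that Excel detects the encoding correctly
    df.to_csv('xianyu_prices.csv', index=False, encoding='utf-8-sig')
    return df
```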
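Price text on listing pages may come in forms such as "¥1,299", "1299.00元" or "1.2万" (万 means 10,000). The following is a minimal sketch of a parser for those assumed formats. `parse_price` is a hypothetical helper, and the formats it covers are guesses, not a documented Xianyu convention. Any text it cannot interpret becomes None, so that row can be dropped before plotting.

```python
import re

# A number, optionally followed by the 万 (ten thousand) unit
_NUMBER = re.compile(r"(\d+(?:\.\d+)?)\s*(万)?")


def parse_price(text):
    """Convert price text such as '¥1,299', '1299.00元' or '1.2万' to a float.

    Returns None when no number can be found (e.g. '面议', "price negotiable").
    """
    if text is None:
        return None
    # Remove ASCII and full-width thousands separators
    cleaned = text.replace(",", "").replace("，", "")
    match = _NUMBER.search(cleaned)
    if match is None:
        return None
    value = float(match.group(1))
    if match.group(2):
        value *= 10_000
    return value
```

With pandas, the helper can be applied to a column of raw price text, for example `df['Price'] = df['Price'].map(parse_price)`.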

IV. Data Visualization

(1) Plotting the Price Trend

Use Matplotlib, styled with Seaborn, to plot the price trend and show how prices change:

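```python
import matplotlib.pyplot as plt
import seaborn as sns


def plot_price_trend(df):
    sns.set_theme(style='whitegrid')
    plt.figure(figsize=(10, 6))
    # x is the row index, i.e. the order in which listings were scraped;
    # a real time axis needs a date column in df
    sns.lineplot(data=df, x=df.index, y='Price')
    # English labels: Matplotlib's default font has no Chinese glyphs
    plt.title('Xianyu item price trend')
    plt.xlabel('Listing (scrape order)')
    plt.ylabel('Price (CNY)')
    plt.show()
```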
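This chart uses the row index as its x-axis, which reflects scrape order, not time. An actual trend needs a date for each price, such as a listing's publish time or the day the crawl ran. The following is a minimal sketch, assuming such (date, price) pairs are available, that reduces them to one median price per day. `daily_median` is a hypothetical helper and the sample data is invented. The median is used instead of the mean so that a few mispriced listings do not dominate a day's value.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def daily_median(records):
    """Group (date, price) pairs by day and return [(day, median price)] sorted by day.

    Records whose price is None (unparseable) are skipped.
    """
    by_day = defaultdict(list)
    for day, price in records:
        if price is not None:
            by_day[day].append(price)
    return [(day, median(prices)) for day, prices in sorted(by_day.items())]


# Made-up sample data for illustration only
records = [
    (date(2025, 11, 1), 1200.0),
    (date(2025, 11, 1), 1300.0),
    (date(2025, 11, 2), 1150.0),
    (date(2025, 11, 2), None),
]
print(daily_median(records))
# [(datetime.date(2025, 11, 1), 1250.0), (datetime.date(2025, 11, 2), 1150.0)]
```

The result can be loaded into a DataFrame with `date` and `Price` columns and plotted with `x='date'` instead of the row index.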

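(2) Analyzing Price Fluctuations

Use the trend chart to identify how prices fluctuate. For example, some items may fall or rise in price over a particular period, which can reflect changes in demand, seasonal factors, or sellers' pricing strategies. Note that a single crawl only captures a snapshot: a trend emerges only when data is collected repeatedly over time or each listing carries a date.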
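Reading fluctuations off a raw line chart is subjective. The following is a minimal sketch of two simple measures. The first is a moving average, which smooths out noise from individual listings so that the underlying direction is easier to see. The second is the percentage change between consecutive points. The function names and the default window of 3 are illustrative assumptions. A sensible window size depends on how many data points you collect per day.

```python
def moving_average(values, window=3):
    """Simple moving average; the first window-1 positions have no value (None)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    result = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        result.append(running / window if i >= window - 1 else None)
    return result


def pct_change(values):
    """Percentage change between consecutive values (None where undefined)."""
    return [None] + [
        (b - a) / a * 100 if a != 0 else None
        for a, b in zip(values, values[1:])
    ]
```

For data already in a DataFrame, pandas provides the same calculations as `df['Price'].rolling(window=3).mean()` and `df['Price'].pct_change() * 100`.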

V. Complete Code

The complete program below covers the whole process, from scraping the data to visualizing it:

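```python
import os

import matplotlib.pyplot as plt
import pandas as pd
import requests
import seaborn as sns
from bs4 import BeautifulSoup

# Proxy settings: replace with your own proxy provider and account.
# Reading credentials from environment variables keeps them out of the source code.
proxyHost = "www.16yun.cn"
proxyPort = "5445"
proxyUser = os.environ.get("PROXY_USER", "your_proxy_user")
proxyPass = os.environ.get("PROXY_PASS", "your_proxy_password")

# Route both HTTP and HTTPS traffic through the proxy
proxies = {
    "http": f"http://{proxyUser}:{proxyPass}@{proxyHost}:{proxyPort}",
    "https": f"http://{proxyUser}:{proxyPass}@{proxyHost}:{proxyPort}",
}


# Fetch the HTML page
def get_html(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    try:
        response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
    except requests.exceptions.RequestException as e:
        print(f"Request error: {e}")
        return None
    if response.status_code == 200:
        return response.text
    print(f"Request failed, status code: {response.status_code}")
    return None


# Parse the HTML page and extract the prices
def parse_html(html):
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all('div', class_='item')  # assumption: listings are <div class="item">
    prices = []
    for item in items:
        price_tag = item.find('span', class_='price')  # assumption: price is <span class="price">
        if price_tag is not None:
            prices.append(price_tag.get_text(strip=True))
    return prices


# Store the prices in a DataFrame (as numbers) and a CSV file
def save_to_dataframe(prices):
    df = pd.DataFrame(prices, columns=['Price'])
    # "¥1299" -> 1299.0; values that cannot be converted become NaN
    df['Price'] = pd.to_numeric(
        df['Price'].astype(str).str.replace(r'[^\d.]', '', regex=True),
        errors='coerce',
    )
    df.to_csv('xianyu_prices.csv', index=False, encoding='utf-8-sig')
    return df


# Plot the price trend
def plot_price_trend(df):
    sns.set_theme(style='whitegrid')
    plt.figure(figsize=(10, 6))
    sns.lineplot(data=df, x=df.index, y='Price')  # x = scrape order, not time
    plt.title('Xianyu item price trend')
    plt.xlabel('Listing (scrape order)')
    plt.ylabel('Price (CNY)')
    plt.show()


# Main program
if __name__ == '__main__':
    url = 'https://xianyu.com/item/123456'  # replace with a real item page URL
    html = get_html(url)
    if html is None:
        print("Failed to fetch the page. Possible causes:")
        print("1. The URL is invalid; check that it is correct.")
        print("2. Network problems, e.g. an unstable proxy server or connection.")
        print("Check the URL and retry later. If the problem persists, contact technical support.")
    else:
        prices = parse_html(html)
        if not prices:
            print("No prices found: the page structure may have changed, or the prices "
                  "are loaded by JavaScript and are not in the raw HTML.")
        else:
            df = save_to_dataframe(prices)
            plot_price_trend(df)
```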
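This program overwrites xianyu_prices.csv on every run and records no dates, so each run produces a snapshot, not a trend. One way to build a history is to run the crawler periodically (for example, once a day) and append each run's prices together with the run date. The following is a minimal sketch that uses the standard csv module. `append_snapshot`, `load_snapshots` and the two-column file layout are hypothetical. The sketch writes plain UTF-8; switch to `utf-8-sig` if the file must open cleanly in Excel.

```python
import csv
import os
from datetime import date


def append_snapshot(path, prices, day=None):
    """Append one crawl's prices to a CSV file, tagging each row with the crawl date.

    A header row is written only when the file does not exist yet.
    """
    if day is None:
        day = date.today()
    is_new = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["date", "price"])
        for price in prices:
            writer.writerow([day.isoformat(), price])


def load_snapshots(path):
    """Read the history back as a list of (ISO date string, float price) tuples."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["date"], float(row["price"])) for row in csv.DictReader(f)]
```

Once several runs have accumulated, the loaded (date, price) pairs can be grouped per day and plotted against the date rather than the row index.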