
SEU-project1 Debugging Notes

news 2025/9/23 10:29:51

Building from Source

Build On-vHLL-Tailcut from source and install it on Ubuntu Linux or macOS. These instructions may work on other systems, but only Ubuntu and macOS are tested and supported.

1. Install ipsumdump

On Ubuntu, see the ipsumdump web page for installation instructions.

ipsumdump:

https://gitee.com/link?target=https%3A%2F%2Fread.seas.harvard.edu%2F~kohler%2Fipsumdump%2F

git clone https://github.com/kohler/click/
cd click

Enter the click directory and install the build dependencies:

# Make sure all dependencies are installed
sudo apt install -y build-essential libtool automake autoconf pkg-config \
    libpcap-dev libpcre3-dev libglib2.0-dev libboost-dev

Configure and build:

# Configure with the default (most basic) options
./configure
# Build
make -j$(nproc)
# Install system-wide
sudo make install

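Verify the build

Run the Click test command from the Click source directory (click-master):
./click --version
If it prints a Click version number (for example, Click 2.3.0), the build and installation succeeded.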

Global installation

sudo make install
cd ..
Once sudo make install has succeeded, Click can be run from any directory, not only from the project directory:
click --version

Install ipsumdump

#wget https://read.seas.harvard.edu/~kohler/ipsumdump/ipsumdump-1.86.tar.gz
#tar xzvf ipsumdump-1.86.tar.gz
#cd ipsumdump-1.86
git clone https://github.com/mz0/ipsumdump
cd ipsumdump
./configure
make 
make test
sudo make install
ipsumdump -v
cd ..

Verifying the installation

First confirm that the executable was built:
ls -lh ./src/ipsumdump
If the file exists, run it directly:
./src/ipsumdump --help
./src/ipsumdump --version
To install it system-wide, run the following in the ipsumdump-1.86/ipsumdump directory:
sudo make install
ipsumdump --help
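The same check can be scripted once both tools are installed system-wide. The following is a minimal sketch that uses only the Python standard library; missing_tools is a hypothetical helper name and is not part of the project.

```python
import shutil


def missing_tools(names):
    """Return the tool names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # click and ipsumdump are the two binaries installed in this step.
    missing = missing_tools(["click", "ipsumdump"])
    print("missing:", missing if missing else "none")
```

An empty result only shows that the binaries are on PATH; running ipsumdump --help as described above is still the more direct test.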

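2. Install Conan

The project's build setup is compatible only with Conan 1.x, not 2.x. Install a pinned version with the commands below; check the Conan GitHub repository for the latest 1.x release.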

pip uninstall conan  # remove Conan 2.x if it is installed
pip install conan==1.65.0
# Check that Conan is available
conan --version
# Expected output:
# Conan version 1.65.0
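Because the project builds only with Conan 1.x, it can help to check the major version programmatically before building. This is a sketch under the assumption that conan --version prints a line of the form shown above; conan_major_version and is_conan_1x are hypothetical helper names.

```python
import re
import subprocess


def conan_major_version(version_output):
    """Extract the major version from text such as 'Conan version 1.65.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % version_output)
    return int(match.group(1))


def is_conan_1x(version_output):
    """True when the reported version belongs to the 1.x series."""
    return conan_major_version(version_output) == 1


if __name__ == "__main__":
    try:
        result = subprocess.run(["conan", "--version"],
                                capture_output=True, text=True)
        print("Conan 1.x installed:", is_conan_1x(result.stdout))
    except (FileNotFoundError, ValueError):
        # conan is not on PATH, or its output had no version number
        print("could not determine the Conan version")
```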

3. Install CMake

CMake is an open-source, cross-platform family of tools for building, testing, and packaging software. We use it to build the project.

sudo apt install cmake

4. Download source code

git clone https://gitee.com/seu-csqjxiao/code-on-vhll-tailcut.git
cd code-on-vhll-tailcut

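5. Preprocess Datasets

CAIDA-2016 is a network traffic dataset published by the Center for Applied Internet Data Analysis (CAIDA) at the University of California San Diego. It is used mainly to study Internet traffic, network security threats such as DDoS attacks, and Internet topology. The "equinix-chicago" portion of the dataset contains traffic captured at Equinix Chicago, a key Internet exchange point.

Time range: several capture periods during 2016.
Capture location: Equinix Chicago, an important Internet exchange point in North America, which makes the data representative.
Data type: anonymized packet traces in pcap format, from which each packet's five-tuple (source IP, destination IP, source port, destination port, protocol) can be extracted.
Applications: network traffic analysis, DDoS attack detection and mitigation, Internet topology research, and related areas.

The dataset is an important resource for network traffic and security research, and it is useful to network science and network security researchers and practitioners.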

The CAIDA trace captured at the equinix-chicago monitor on January 21, 2016 has been uploaded to Baidu Netdisk.

Shared file: 20160121-130000.UTC
Link: https://pan.baidu.com/s/1tdqlALiHWJMRXvTi0kby7Q?pwd=xnwc
Extraction code: xnwc

Run the following commands in order:

# Enter the data directory
cd code-on-vhll-tailcut
cd data
# Move the pcap.gz, pcap.stats, and times.gz files from ~/Downloads/20160121-130000.UTC into the caida directory
mv ~/Downloads/20160121-130000.UTC/*.pcap.gz caida/
mv ~/Downloads/20160121-130000.UTC/*.pcap.stats caida/
mv ~/Downloads/20160121-130000.UTC/*.times.gz caida/
# Enter the caida directory
cd caida
# Decompress the archives
gunzip *.gz
# Move the pcap files into the pcap directory
mv *.pcap ../pcap/
# Return to the data directory
cd ..

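First enter the data directory. The 130 CAIDA-2016 pcap files to process are listed in 130input_list.txt (5input_list.txt lists only 5 pcap files and therefore runs much faster). ipsumdump then extracts the (source IP, destination IP) two-tuples from each pcap file and writes them to a .dat file.

# Extract two-tuples from the pcap files and generate .dat files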
python get_flow.py

The get_flow.py script

# Use ipsumdump to extract five-tuples / two-tuples from pcap files
import os
import sys
import getopt

# -s srcip -d dstip -S srcport -D dstport -p protocol
COMMAND_5TUPLE_FLAGS = "-s -d -S -D -p"
COMMAND_2TUPLE_FLAGS = "-s -d"

# grep -v drops every line that contains "!" or "-"
OPERATE_COMMAND = "ipsumdump {command_flag} {target_file} | grep -vE '(!|-)' > {save_file}"


def operate_2tuple(file_path, save_dir, append_text):
    '''
    Extract two-tuples with ipsumdump.
    :param file_path: path to the pcap file
    :param save_dir: directory in which the .dat file is saved
    :param append_text: suffix appended to the output file name
    '''
    file_name = os.path.basename(file_path)
    command = OPERATE_COMMAND.format(command_flag=COMMAND_2TUPLE_FLAGS, target_file=file_path,
                                     save_file=os.path.join(save_dir, file_name + append_text))
    print(command)
    os.system(command)


def scan_pcap_file(scan_dir, save_dir, append_text=".dat"):
    # The shell redirection fails if the output directory is missing
    os.makedirs(save_dir, exist_ok=True)
    for home, dirs, files in os.walk(scan_dir):
        for file in files:
            # if file.endswith('.pcap'):  # only process .pcap files
            operate_2tuple(os.path.join(home, file), save_dir, append_text)


if __name__ == "__main__":
    scan_dir = "./pcap"
    save_dir = "./dat"
    append_text = ".dat"
    try:
        opts, args = getopt.getopt(sys.argv[1:], "hd:o:i:e:")
    except getopt.GetoptError:
        print("usage : {} -d scan_dir -o save_dir".format(sys.argv[0]))
        print("This program needs ipsumdump to extract the tuples")
        sys.exit(2)  # opts is undefined here, so stop instead of continuing
    for opt in opts:
        if opt[0] == "-h":
            print("usage : {} -d scan_dir -o save_dir".format(sys.argv[0]))
            sys.exit()
        if opt[0] == "-d":
            scan_dir = opt[1]
        if opt[0] == "-o":
            save_dir = opt[1]
    scan_pcap_file(scan_dir, save_dir, append_text)
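The grep -vE '(!|-)' stage in OPERATE_COMMAND is easy to misread, so here is a small Python sketch of the same filter: it drops every line that contains "!" or "-". The sample lines are illustrative and were not taken from a real ipsumdump run; keep_line and filter_dump are hypothetical names.

```python
def keep_line(line):
    """Mirror grep -vE '(!|-)': reject any line containing '!' or '-'."""
    return "!" not in line and "-" not in line


def filter_dump(lines):
    """Keep only the lines that survive the filter."""
    return [line for line in lines if keep_line(line)]


# Illustrative input: two header-style lines, one complete pair,
# and one line where a field is missing and shown as "-".
sample = [
    "!IPSummaryDump 1.3",
    "!data ip_src ip_dst",
    "10.0.0.1 10.0.0.2",
    "- 10.0.0.3",
]
print(filter_dump(sample))  # ['10.0.0.1 10.0.0.2']
```

Note that the pattern matches "-" anywhere in the line, so it relies on IPv4 address pairs never containing that character.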

Be sure to search the whole project for input_list.txt and confirm which input_list.txt files the current round of experiments uses.
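Once the list is chosen, it is worth confirming that every entry has a matching file in ./pcap. The sketch below assumes that each line of the list is a pcap file name, which is how get_dat_uniq.py appears to use it; read_input_list and missing_entries are hypothetical helpers.

```python
import os


def read_input_list(text):
    """Return the non-empty, stripped entries of an input list."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def missing_entries(entries, available):
    """Return the list entries that have no matching file name."""
    present = set(available)
    return [name for name in entries if name not in present]


if __name__ == "__main__":
    list_path, pcap_dir = "./5input_list.txt", "./pcap"
    if os.path.exists(list_path) and os.path.isdir(pcap_dir):
        with open(list_path) as fh:
            entries = read_input_list(fh.read())
        print("missing:", missing_entries(entries, os.listdir(pcap_dir)))
```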

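Next, sort each .dat file and remove duplicate lines, writing the result to a new file named after the input-list entry with a .uniq suffix.

# Remove duplicate lines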
python get_dat_uniq.py

The get_dat_uniq.py script

import os

# Make sure the dat and uniq directories exist
os.makedirs('dat', exist_ok=True)
os.makedirs('uniq', exist_ok=True)

# input_list = "./130input_list.txt"
input_list = "./5input_list.txt"

# Sort each .dat file, drop duplicate lines, and write the result to a .uniq file
exec_cmd = "sort {dat_dir}/{file}.dat | uniq > {uniq_dir}/{file}.uniq"

cnt = 1
with open(input_list, 'r') as fh:
    for l in fh.readlines():
        file = l.strip()
        if not file:
            continue  # skip blank lines in the list
        print(cnt, " ", file)
        cnt += 1
        os.system(exec_cmd.format(dat_dir='./dat', file=file, uniq_dir='./uniq'))
print("ok")
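For readers unfamiliar with the sort | uniq idiom, a pure-Python equivalent is sketched below. It is for illustration only and is not how the script works; sort_uniq is a hypothetical name, and the addresses are made up.

```python
def sort_uniq(lines):
    """Return the distinct lines in sorted order, like `sort | uniq`."""
    return sorted(set(lines))


pairs = [
    "10.0.0.2 10.0.0.9",
    "10.0.0.1 10.0.0.9",
    "10.0.0.2 10.0.0.9",  # duplicate of the first line
]
print(sort_uniq(pairs))  # ['10.0.0.1 10.0.0.9', '10.0.0.2 10.0.0.9']
```

Python compares strings by code point while sort follows the current locale, so the ordering may differ in edge cases, but the set of distinct lines is the same.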

The script reads the pcap file names listed in ./xxinput_list.txt, extracts the unique two-tuples for each file, and writes them to the ./uniq directory.

#input_list = "./130input_list.txt"
input_list = "./5input_list.txt"

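get_dat_groundtruth.py extracts the source IP (src_ip) and destination IP (dst_ip) pairs from the files in the specified directory and generates a .groundtruth file that records, for each source IP, the number of distinct destination IPs associated with it.

# Extract IP pairs and compute the per-source cardinality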
python get_dat_groundtruth.py
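The source of get_dat_groundtruth.py is not reproduced here, so the following is only a sketch of the computation described above, not the project's actual code. It assumes each input line holds a source IP and a destination IP separated by whitespace; distinct_dst_counts is a hypothetical name.

```python
from collections import defaultdict


def distinct_dst_counts(lines):
    """Map each source IP to the number of distinct destination IPs."""
    dsts = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        src, dst = fields
        dsts[src].add(dst)
    return {src: len(seen) for src, seen in dsts.items()}


pairs = [
    "10.0.0.1 10.0.0.7",
    "10.0.0.1 10.0.0.8",
    "10.0.0.1 10.0.0.7",  # repeated pair, counted once
    "10.0.0.2 10.0.0.7",
]
print(distinct_dst_counts(pairs))  # {'10.0.0.1': 2, '10.0.0.2': 1}
```

Using a set per source makes the count correct even when the input still contains duplicate pairs.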

The sort command can order the generated .groundtruth file by true cardinality.

cd groundtruth
sort -nk 2 equinix-chicago.dirB.20160121-134700.UTC.anon.pcap.groundtruth > tmp.groundtruth
mv tmp.groundtruth equinix-chicago.dirB.20160121-134700.UTC.anon.pcap.groundtruth

Use tail to view the last 100 lines of the sorted .groundtruth file.

tail -100 equinix-chicago.dirB.20160121-134700.UTC.anon.pcap.groundtruth

Sample output:

43.252.224.186 1362
198.85.16.215 1367
43.252.224.150 1379
146.14.11.157 1381
43.252.226.38 1410
43.252.226.24 1415
43.252.224.152 1473
43.252.226.43 1541
43.252.224.70 1566
146.2.215.245 1589
43.252.226.17 1596
146.2.215.202 1715
43.252.226.36 1735
146.2.214.222 1738
43.252.226.27 1894
43.252.226.39 2166
146.2.215.128 2344
43.252.226.25 2401
43.43.40.240 2590
43.252.226.19 2628
47.63.71.131 2682
146.2.215.78 3430
163.189.104.246 3808
43.252.226.26 3894
43.252.226.37 4765
43.252.226.6 6293
146.2.215.244 6551
146.2.214.223 6669
178.56.191.111 7761
146.2.215.129 11485
...
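The same top-of-the-list view can be produced without rewriting the file on disk. This is a minimal sketch, assuming each .groundtruth line has the form "source_ip count" as in the sample output above; parse_groundtruth and top_k are hypothetical helpers.

```python
import heapq


def parse_groundtruth(lines):
    """Parse 'ip count' lines into (ip, count) tuples, skipping other lines."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2 or not fields[1].isdigit():
            continue
        rows.append((fields[0], int(fields[1])))
    return rows


def top_k(rows, k):
    """Return the k rows with the largest counts, largest first."""
    return heapq.nlargest(k, rows, key=lambda row: row[1])


# Three lines copied from the sample output above.
sample = [
    "146.2.214.223 6669",
    "178.56.191.111 7761",
    "146.2.215.129 11485",
]
print(top_k(parse_groundtruth(sample), 2))
# [('146.2.215.129', 11485), ('178.56.191.111', 7761)]
```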