
PartitionFinder2 Installation and Usage - bioinformatics tools 051

news Source: original 2025/7/1 5:49:55

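1. Introduction

PartitionFinder2 is currently among the best-suited tools for partition-scheme detection and evolutionary-model selection on medium to large datasets (nucleotide, amino acid, and morphological data). The best-fit models it infers are generally close to those reported by jModelTest2 (nucleotides) and ProtTest3 (amino acids).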

2. Official website

https://www.robertlanfear.com/partitionfinder/

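Rob Lanfear, the author of PartitionFinder2, recommends:

AICc is the preferred criterion for model selection; AIC is not recommended and is kept only for historical reasons.
The rcluster algorithm outperforms hcluster, so avoid hcluster where possible.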
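
To make the difference between the criteria concrete, here is a minimal Python sketch based on the standard textbook definitions of AIC, AICc and BIC. It is not taken from the PartitionFinder2 source; the function names and the example numbers (log-likelihood, parameter count) are made up for illustration, and using the number of alignment sites as the sample size n is an assumption.

```python
import math

def aic(lnl, k):
    """Akaike information criterion: 2k - 2 lnL."""
    return 2.0 * k - 2.0 * lnl

def aicc(lnl, k, n):
    """AIC plus the small-sample correction 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(lnl, k) + (2.0 * k * (k + 1)) / (n - k - 1)

def bic(lnl, k, n):
    """Bayesian information criterion: k ln(n) - 2 lnL."""
    return k * math.log(n) - 2.0 * lnl

# Hypothetical model fit: log-likelihood and number of free parameters.
lnl, k = -1234.5, 12
for n in (50, 500, 5000):  # assumed sample sizes (alignment sites)
    print("n={} AIC={:.2f} AICc={:.2f} BIC={:.2f}".format(
        n, aic(lnl, k), aicc(lnl, k, n), bic(lnl, k, n)))
```

The correction term shrinks as n grows, so AICc approaches AIC on long alignments and penalizes parameter-rich models more strongly on short ones.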

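3. Installing PartitionFinder2

3.1 Installing Python

PartitionFinder2 requires Python 2.7.10 or a later 2.7.x release; Python 3.x is not supported.
Installing through Anaconda is recommended (download: http://continuum.io/downloads).
If you do not use Anaconda, make sure your Python 2 version is at least 2.7.10 and install the following dependencies: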
```
pip install numpy pandas tables pyparsing scipy scikit-learn
```
Note: the package to install with pip is tables, not pytables.
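
Before the first run it can help to confirm that the interpreter and the dependencies are actually in place. The following is a small sketch, not part of PartitionFinder2; the helper names `missing_modules` and `python_ok` are hypothetical, and the list holds import names (scikit-learn is imported as `sklearn`).

```python
import importlib
import sys

# Import names of the dependencies listed above.
REQUIRED = ["numpy", "pandas", "tables", "pyparsing", "scipy", "sklearn"]

def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

def python_ok(version_info=sys.version_info):
    """True for Python 2.7.10 or later within the 2.x series."""
    return version_info[0] == 2 and tuple(version_info[:3]) >= (2, 7, 10)

if __name__ == "__main__":
    print("Python version OK: {}".format(python_ok()))
    print("Missing modules: {}".format(missing_modules(REQUIRED) or "none"))
```

Run it with the same interpreter that will launch PartitionFinder2, since Anaconda environments and the system Python often have different packages installed.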

3.2 Installing PartitionFinder2

Download the latest PartitionFinder2 release and unpack it:

https://github.com/brettc/partitionfinder/releases/latest

Move the unpacked folder wherever you like; no further installation is needed.

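4. Getting started

4.1 Sequence data format

Alignments must be in PHYLIP format, with sequence names between 1 and 100 characters long.
The alignment file must sit in the same folder as the partition_finder.cfg configuration file.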
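
A quick sanity check of the alignment can catch format problems before a long run. The sketch below is a hypothetical helper, not PartitionFinder2's own parser. It assumes the simplest sequential PHYLIP layout (a header line with the number of taxa and sites, then one `name sequence` line per taxon) and enforces the 1 to 100 character limit on names.

```python
def check_phylip(text, max_name_len=100):
    """Validate a simple sequential PHYLIP alignment; return (ntax, nchar)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty alignment")
    header = lines[0].split()
    if len(header) != 2:
        raise ValueError("header must be '<ntax> <nchar>'")
    ntax, nchar = int(header[0]), int(header[1])
    records = lines[1:]
    if len(records) != ntax:
        raise ValueError("expected {} sequences, found {}".format(ntax, len(records)))
    seen = set()
    for ln in records:
        parts = ln.split(None, 1)
        if len(parts) != 2:
            raise ValueError("line without a sequence: {!r}".format(ln))
        name = parts[0]
        seq = "".join(parts[1].split())  # drop spaces inside the sequence
        if not 1 <= len(name) <= max_name_len:
            raise ValueError("bad name length for {!r}".format(name))
        if name in seen:
            raise ValueError("duplicate name {!r}".format(name))
        seen.add(name)
        if len(seq) != nchar:
            raise ValueError("{} has {} sites, expected {}".format(name, len(seq), nchar))
    return ntax, nchar

# Made-up three-taxon alignment for illustration.
example = "3 6\ntaxon_A ACGTAC\ntaxon_B ACGTAA\ntaxon_C ACG-AC\n"
print(check_phylip(example))
```

Interleaved PHYLIP files would need a different reader; this check only covers the one-line-per-taxon case.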

4.2 Configuration file (`partition_finder.cfg`)

The partition_finder.cfg file drives every PartitionFinder2 run and must follow the expected format exactly. An example:

# Alignment file
alignment = test.phy;

# Branch length estimation (linked / unlinked)
branchlengths = linked;

# Models of evolution (all / allx / mrbayes / beast / gamma / gammai / <list>)
models = GTR, GTR+G, GTR+I+G;

# Model selection criterion (AIC / AICc / BIC)
model_selection = AICc;

# Data blocks
[data_blocks]
Gene1_pos1 = 1-789\3;
Gene1_pos2 = 2-789\3;
Gene1_pos3 = 3-789\3;

# Scheme search algorithm (all / user / greedy / rcluster / rclusterf / kmeans)
[schemes]
search = greedy;
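
Writing the codon-position blocks by hand gets tedious with many genes. As a rough sketch, the config above could be generated from a list of gene coordinates. The helper names `codon_blocks` and `build_cfg` and the second gene are hypothetical; only the option names and the `start-end\3` block syntax come from the example.

```python
def codon_blocks(gene, start, end):
    """Three data-block lines, one per codon position, for sites start..end."""
    return ["{}_pos{} = {}-{}\\3;".format(gene, pos, start + pos - 1, end)
            for pos in (1, 2, 3)]

def build_cfg(alignment, genes, models="all", search="greedy",
              branchlengths="linked", model_selection="AICc"):
    """genes: list of (name, start, end) with 1-based inclusive coordinates."""
    lines = [
        "alignment = {};".format(alignment),
        "branchlengths = {};".format(branchlengths),
        "models = {};".format(models),
        "model_selection = {};".format(model_selection),
        "[data_blocks]",
    ]
    for name, start, end in genes:
        lines.extend(codon_blocks(name, start, end))
    lines += ["[schemes]", "search = {};".format(search)]
    return "\n".join(lines) + "\n"

# Gene2 and its coordinates are invented for illustration.
print(build_cfg("test.phy", [("Gene1", 1, 789), ("Gene2", 790, 1500)]))
```

Redirect the output to partition_finder.cfg in the same folder as the alignment, then review it before running.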

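4.3 Key parameters

alignment = test.phy: the alignment file to analyze.
branchlengths = linked: how branch lengths are estimated across subsets (linked or unlinked).
models = all: the set of candidate evolutionary models; allx additionally estimates base or amino acid frequencies by maximum likelihood.
model_selection = AICc: the criterion used to compare models.
[data_blocks]: the starting data blocks, usually defined by gene and codon position.
[schemes] search = greedy: the scheme-search algorithm; greedy (greedy search) or rcluster (relaxed clustering) is recommended.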
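
Because the file must follow the format exactly, a typo in an option value is an easy mistake to make. The sketch below checks three options against the values listed in the comments of the example config. It is a hypothetical pre-flight check, not PartitionFinder2's parser; the allowed sets may be incomplete, and comparing values case-insensitively is an assumption.

```python
# Allowed values copied from the comments in the example config (lowercased).
ALLOWED = {
    "branchlengths": {"linked", "unlinked"},
    "model_selection": {"aic", "aicc", "bic"},
    "search": {"all", "user", "greedy", "rcluster", "rclusterf", "kmeans"},
}

def parse_options(cfg_text):
    """Collect every 'key = value;' pair (data blocks included),
    skipping comments, section headers and blank lines."""
    options = {}
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip().rstrip(";").strip()
    return options

def check_options(options):
    """Return a list of human-readable problems; empty means nothing was flagged."""
    problems = []
    for key in sorted(ALLOWED):
        if key not in options:
            problems.append("missing option: " + key)
        elif options[key].lower() not in ALLOWED[key]:
            problems.append("unexpected value for {}: {}".format(key, options[key]))
    return problems

cfg = "branchlengths = linked;\nmodel_selection = AICc;\n[schemes]\nsearch = greedy;\n"
print(check_options(parse_options(cfg)))
```

An empty list only means these three options look plausible; the models list and the data blocks are not validated here.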

5. PartitionFinder2 in practice

5.1 Running PartitionFinder2 (small datasets, about 10 loci)

Nucleotide dataset (using PhyML):

python PartitionFinder.py /path/to/nucleotide_data

Amino acid dataset (without --raxml this command also runs PhyML, the default; add --raxml to use RAxML):

python PartitionFinderProtein.py /path/to/aminoacid_data

5.2 Running PartitionFinder2 (large datasets, about 100 loci)

Configuration file settings:

branchlengths = linked;
models = all;
model_selection = AICc;
search = greedy;

Run PartitionFinder2 (using RAxML for speed):

python PartitionFinder.py /path/to/nucleotide_data --raxml
python PartitionFinderProtein.py /path/to/aminoacid_data --raxml

To speed up amino acid analyses, reduce the number of candidate models:
```
models = LG, LG+G, LG+I+G, LG+I+G+F, LG4X;
```

5.3 Running PartitionFinder2 (very large datasets, about 1000 loci)

Configuration file settings:

branchlengths = linked;
models = all;
model_selection = AICc;
search = rcluster;

Run PartitionFinder2 (relaxed clustering):

python PartitionFinder.py /path/to/nucleotide_data --raxml
python PartitionFinderProtein.py /path/to/aminoacid_data --raxml

To speed up the search further, lower the rcluster-max parameter:

python PartitionFinder.py /path/to/nucleotide_data --raxml --rcluster-max 100
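
When many datasets have to be processed, the command lines above can be assembled and launched from a script. This is a minimal sketch using only the flags shown in this section (`--raxml` and `--rcluster-max`). The helper names `pf2_command` and `run_pf2` are hypothetical, and the paths are the same placeholders used above.

```python
import subprocess

def pf2_command(script, folder, raxml=False, rcluster_max=None, python="python"):
    """Build the argument list for one PartitionFinder2 run."""
    cmd = [python, script, folder]
    if raxml:
        cmd.append("--raxml")
    if rcluster_max is not None:
        cmd += ["--rcluster-max", str(int(rcluster_max))]
    return cmd

def run_pf2(*args, **kwargs):
    """Launch PartitionFinder2 and return its exit code."""
    return subprocess.call(pf2_command(*args, **kwargs))

# Only prints the command; call run_pf2(...) with real paths to execute it.
cmd = pf2_command("PartitionFinder.py", "/path/to/nucleotide_data",
                  raxml=True, rcluster_max=100)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when folder names contain spaces. The `python` argument should point at the Python 2.7 interpreter that has the dependencies installed.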

5.4 Running PartitionFinder2 (morphological datasets)

Configuration file settings:

branchlengths = linked;
models = multistate+G;
model_selection = AICc;
search = kmeans;

Run PartitionFinder2 on the morphological dataset (note the dedicated PartitionFinderMorphology.py script):

python PartitionFinderMorphology.py /path/to/morphology_data --raxml

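5.5 Notes

user_tree_topology option: lets you supply a fixed phylogenetic tree instead of the neighbor-joining tree the program builds by default.
branchlengths = unlinked is suitable when the downstream software supports unlinked branch-length estimation, e.g. MrBayes, BEAST, or RAxML.
Morphological data models:
- BINARY+G (binary data)
- MULTISTATE+G (multistate data, Mk model)
- the +A option applies ascertainment bias correction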
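
The choice between the two morphological models follows from the character states present in the matrix. The sketch below is a hypothetical helper that inspects the coded states and returns the matching model string. It assumes single-character state codes with `?` and `-` for missing data, does not handle polymorphic codings, and assumes that `+A` is written as a suffix on the model name.

```python
def morphology_model(rows, ascertainment=False, missing="?-"):
    """Pick BINARY+G or MULTISTATE+G from the states found in `rows`."""
    states = set("".join(rows)) - set(missing)
    if not states:
        raise ValueError("no scored character states found")
    # Only 0/1 states present: binary; anything else: multistate.
    model = "BINARY+G" if states <= {"0", "1"} else "MULTISTATE+G"
    # Assumption: the ascertainment correction is appended as "+A".
    return (model + "+A") if ascertainment else model

# Made-up character rows, one string per taxon.
print(morphology_model(["0101", "1?0-"]))
print(morphology_model(["0121", "2?10"], ascertainment=True))
```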

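PartitionFinder2 is a powerful partitioning and model-selection tool for nucleotide, amino acid, and morphological data. Choosing the partitioning scheme and search algorithm sensibly makes analyses more efficient. For large datasets, RAxML combined with the rcluster algorithm is recommended as a balance between speed and accuracy.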
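
The size-based advice from sections 5.1 to 5.3 can be condensed into a small lookup. This is only a sketch: the function name is hypothetical, and the cut-offs of 30 and 300 loci are arbitrary midpoints chosen for illustration between the "about 10", "about 100" and "about 1000" cases, not values from the PartitionFinder2 documentation.

```python
def suggest_settings(n_loci, small_max=30, large_max=300):
    """Map a locus count to the search algorithm and flags used in section 5."""
    if n_loci < 1:
        raise ValueError("n_loci must be positive")
    if n_loci <= small_max:      # small: greedy search, default PhyML
        return {"search": "greedy", "flags": []}
    if n_loci <= large_max:      # large: greedy search, RAxML for speed
        return {"search": "greedy", "flags": ["--raxml"]}
    # very large: relaxed clustering with RAxML
    return {"search": "rcluster", "flags": ["--raxml"]}

for n in (10, 100, 1000):
    print(n, suggest_settings(n))
```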

6. Citation

Lanfear, R., Frandsen, P. B., Wright, A. M., Senfeld, T., & Calcott, B. (2016). PartitionFinder 2: new methods for selecting partitioned models of evolution for molecular and morphological phylogenetic analyses. Molecular Biology and Evolution. DOI: dx.doi.org/10.1093/molbev/msw260.
Lanfear, R., Calcott, B., Kainer, D., Mayer, C., & Stamatakis, A. (2014). Selecting optimal partitioning schemes for phylogenomic datasets. BMC Evolutionary Biology, 14(1), 82.
Frandsen, P. B., Calcott, B., Mayer, C., & Lanfear, R. (2015). Automatic selection of partitioning schemes for phylogenetic analyses using iterative k-means clustering of site rates. BMC Evolutionary Biology, 15(1), 13.