
SSCLMD Model Code Implementation Explained

1. Project Source Tree

The SSCLMD project is organized as follows:

SSCLMD-main/
├── README.md
├── ST4.xlsx
├── Supplementary File.docx
├── code/
│   ├── calculating_similarity.py
│   ├── data_preparation.py
│   ├── data_preprocess.py
│   ├── layer.py
│   ├── main.py
│   ├── parms_setting.py
│   ├── train.py
│   └── utils.py
└── data/
    ├── dataset1.rar
    └── dataset2.rar

2. Core Model Components

2.1 Model Definition (layer.py)

The model is defined in layer.py, which contains the following key classes:

  1. The Attention class

class Attention(nn.Module):
    def __init__(self, in_size, hidden_size=128):   # LDA: 128; MDA, LMI: 16
        super(Attention, self).__init__()
        self.project = nn.Sequential(
            nn.Linear(in_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, 1, bias=False)
        )

    def forward(self, z):
        w = self.project(z)
        beta = torch.softmax(w, dim=1)
        return (beta * z).sum(1), beta

This implements the attention mechanism: it learns a weight for each view and aggregates the per-view features into a single embedding by weighted summation.
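To make the tensor shapes concrete, here is a minimal usage sketch (assuming the Attention class above is in scope; the sizes are illustrative, with three views matching the h1/h2/h_com stack used later in SSCLMD.forward):

import torch

att = Attention(in_size=128)      # 128 matches the encoders' out_dim for LDA
z = torch.randn(900, 3, 128)      # 900 nodes, 3 views, 128-dim embedding per view
fused, beta = att(z)              # softmax over dim=1, i.e. over the views
print(fused.shape, beta.shape)    # torch.Size([900, 128]) torch.Size([900, 3, 1])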

  2. The GCN class

class GCN(nn.Module):
    def __init__(self, nfeat, nhid, out, dropout=0.5):
        super(GCN, self).__init__()
        self.gc1 = GCNConv(nfeat, nhid)
        self.prelu1 = nn.PReLU(nhid)
        self.gc2 = GCNConv(nhid, out)
        self.prelu2 = nn.PReLU(out)
        self.dropout = dropout

    def forward(self, x, adj):
        x = self.prelu1(self.gc1(x, adj))
        x = F.dropout(x, self.dropout, training=self.training)
        x = self.prelu2(self.gc2(x, adj))
        return x

This is a two-layer graph convolutional network used to extract node embeddings from a graph.
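A minimal usage sketch (assuming the GCN class above is in scope). One detail worth noting: although the forward argument is named adj, GCNConv from torch_geometric expects an edge index of shape [2, num_edges], not a dense adjacency matrix:

import torch

gcn = GCN(nfeat=512, nhid=256, out=128)          # d, d/2, d/4 as in the default settings
x = torch.randn(5, 512)                          # 5 nodes with 512-dim features
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 2, 3, 4]])        # 4 directed edges as a [2, E] index
h = gcn(x, edge_index)                           # -> shape [5, 128]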

  3. The Discriminator class

class Discriminator(nn.Module):
    def __init__(self, dim):
        super(Discriminator, self).__init__()
        self.fn = nn.Bilinear(dim, dim, 1)

    def forward(self, h1, h2, h3, h4, c1, c2):
        c_x1 = c1.expand_as(h1).contiguous()
        c_x2 = c2.expand_as(h2).contiguous()
        # positive
        sc_1 = self.fn(h1, c_x1).squeeze(1)
        sc_2 = self.fn(h2, c_x2).squeeze(1)
        # negative
        sc_3 = self.fn(h3, c_x1).squeeze(1)
        sc_4 = self.fn(h4, c_x2).squeeze(1)
        logits = th.cat((sc_1, sc_2, sc_3, sc_4))
        return logits

This is the discriminator for self-supervised contrastive learning: a bilinear scorer that separates positive (local, global) pairs from corrupted negatives. The logits are concatenated in the order (sc_1, sc_2, sc_3, sc_4), which is why train.py builds its target vector as 2N ones followed by 2N zeros.
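A shape sketch (assuming the classes above are in scope; 997 is the per-view node count that train.py's label construction suggests for dataset1):

import torch

N, d = 997, 128                                  # node count (per train.py) and embedding dim
disc = Discriminator(dim=d)
h1, h2, h3, h4 = (torch.randn(N, d) for _ in range(4))
c1 = torch.randn(1, d)                           # global summaries, broadcast over all nodes
c2 = torch.randn(1, d)
logits = disc(h1, h2, h3, h4, c1, c2)            # shape [4 * N]: (pos, pos, neg, neg)
lbl = torch.cat((torch.ones(2 * N), torch.zeros(2 * N)))  # the matching targets in train.py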

  4. The SSCLMD class

class SSCLMD(nn.Module):
    def __init__(self, in_dim, hid_dim, out_dim, decoder1):
        super(SSCLMD, self).__init__()
        self.encoder1 = GCN(in_dim, hid_dim, out_dim)
        self.encoder2 = GCN(in_dim, hid_dim, out_dim)
        self.encoder3 = GCN(in_dim, hid_dim, out_dim)
        self.encoder4 = GCN(in_dim, hid_dim, out_dim)
        self.pooling = AvgReadout()
        self.attention = Attention(out_dim)
        self.disc = Discriminator(out_dim)
        self.act_fn = nn.Sigmoid()
        self.local_mlp = nn.Linear(out_dim, out_dim)
        self.global_mlp = nn.Linear(out_dim, out_dim)
        self.decoder1 = nn.Linear(out_dim * 4, decoder1)
        self.decoder2 = nn.Linear(decoder1, 1)

This is the top-level SSCLMD module, tying together the encoders, the attention-based fusion, the discriminator, and the decoder. The decoder's input width is out_dim * 4 because it receives the element-wise sum (out_dim), the element-wise product (out_dim), and the concatenation (2 * out_dim) of the two entity embeddings.

2.2 Forward Pass

The forward pass of the SSCLMD model is as follows:

def forward(self, data_s, data_f, idx):
    # Features and graph structures
    feat, s_graph = data_s.x, data_s.edge_index
    shuff_feat, f_graph = data_f.x, data_f.edge_index

    # Encode the structure graph and the feature graph
    h1 = self.encoder1(feat, s_graph)
    h2 = self.encoder2(feat, f_graph)
    h1 = self.local_mlp(h1)
    h2 = self.local_mlp(h2)

    # Encode the negative (shuffled) samples
    h3 = self.encoder1(shuff_feat, s_graph)
    h4 = self.encoder2(shuff_feat, f_graph)
    h3 = self.local_mlp(h3)
    h4 = self.local_mlp(h4)

    # Additional encodings used for relation prediction
    h5 = self.encoder3(feat, s_graph)
    h6 = self.encoder3(feat, f_graph)

    # Global representations
    c1 = self.act_fn(self.global_mlp(self.pooling(h1)))
    c2 = self.act_fn(self.global_mlp(self.pooling(h2)))

    # Self-supervised contrastive learning
    out = self.disc(h1, h2, h3, h4, c1, c2)

    # Multi-view fusion
    h_com = (h5 + h6) / 2
    emb = torch.stack([h1, h2, h_com], dim=1)
    emb, att = self.attention(emb)

    # Select the entities according to the task type
    if args.task_type == 'LDA':
        entity1 = emb[idx[0]]
        entity2 = emb[idx[1] + 386]
    if args.task_type == 'MDA':
        entity1 = emb[idx[0] + 702]
        entity2 = emb[idx[1] + 386]
    if args.task_type == 'LMI':
        entity1 = emb[idx[0]]
        entity2 = emb[idx[1] + 702]

    # Multi-relation decoder
    add = entity1 + entity2
    product = entity1 * entity2
    concatenate = torch.cat((entity1, entity2), dim=1)
    feature = torch.cat((add, product, concatenate), dim=1)
    log1 = F.relu(self.decoder1(feature))
    log = self.decoder2(log1)
    return out, log
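Two details of this forward pass deserve a note. First, it reads the global args object directly inside layer.py, so the model only works when the parsed settings are importable there. Second, the hard-coded offsets 386 and 702 suggest that the lncRNA, disease, and miRNA nodes are stacked into one node set, with each node type occupying a contiguous index range and the offsets marking where the later types begin (these values are presumably specific to dataset1).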

3. Data Preprocessing in Detail

Data preprocessing is implemented in data_preprocess.py; the key steps are:

  1. Loading the data and building positive/negative samples

positive = np.loadtxt(args.in_file, dtype=np.int64)
link_size = int(positive.shape[0])
np.random.seed(args.seed)
np.random.shuffle(positive)
positive = positive[:link_size]

negative_all = np.loadtxt(args.neg_sample, dtype=np.int64)
np.random.shuffle(negative_all)
negative = np.asarray(negative_all[:positive.shape[0]])

# Append the label column: 1 for positive pairs, 0 for negative pairs
positive = np.concatenate([positive, np.ones(positive.shape[0], dtype=np.int64).reshape(-1, 1)], axis=1)
negative = np.concatenate([negative, np.zeros(negative.shape[0], dtype=np.int64).reshape(-1, 1)], axis=1)
all_data = np.vstack((positive, negative))
  2. Building the K-fold cross-validation datasets (a sketch of the Data_class wrapper appears after this list)

kf = KFold(n_splits=n_splits, shuffle=True, random_state=args.seed)
cv_train_loaders = []
cv_test_loaders = []
for train_index, test_index in kf.split(all_data):
    train_data = all_data[train_index]
    test_data = all_data[test_index]
    train_positive = train_data[train_data[:, 2] == 1][:, :2]
    # Build the adjacency matrices ...
    # Build the data loaders
    training_set = Data_class(train_data)
    train_loader = DataLoader(training_set, **params)
    test_set = Data_class(test_data)
    test_loader = DataLoader(test_set, **params)
    cv_train_loaders.append(train_loader)
    cv_test_loaders.append(test_loader)
  3. Building the graph data structures

# Build the edge indices
edges_s = s_adj.nonzero()
edge_index_s = torch.tensor(np.vstack((edges_s[0], edges_s[1])), dtype=torch.long)
edges_f = f_adj.nonzero()
edge_index_f = torch.tensor(np.vstack((edges_f[0], edges_f[1])), dtype=torch.long)

# Convert the features to tensors
x = torch.tensor(node_feature, dtype=torch.float)
shuf_feature = torch.tensor(shuf_feature, dtype=torch.float)

# Create the PyG Data objects
data_s = Data(x=x, edge_index=edge_index_s)
data_f = Data(x=shuf_feature, edge_index=edge_index_f)
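The Data_class referenced in the K-fold code is not shown in this walkthrough. A minimal sketch consistent with how the training loop unpacks (label, inp) and indexes inp[0] / inp[1] could look like this (an assumption, not necessarily the repository's exact code):

from torch.utils.data import Dataset

class Data_class(Dataset):
    def __init__(self, triples):
        self.entity1 = triples[:, 0]   # first entity index
        self.entity2 = triples[:, 1]   # second entity index
        self.label = triples[:, 2]     # 1 = positive pair, 0 = negative pair

    def __len__(self):
        return len(self.label)

    def __getitem__(self, index):
        # Default collation turns the (entity1, entity2) tuple into a list of two
        # batch tensors, so the training loop can use inp[0] and inp[1] directly.
        return self.label[index], (self.entity1[index], self.entity2[index])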

4. Training Procedure

Training is implemented in train.py and consists of the following steps:

  1. Model initialization

model = SSCLMD(in_dim=args.dimensions, hid_dim=args.hidden1, out_dim=args.hidden2, decoder1=args.decoder1)
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)

m = torch.nn.Sigmoid()
loss_fct = torch.nn.BCEWithLogitsLoss()   # contrastive loss on raw discriminator logits
loss_node = torch.nn.BCELoss()            # classification loss on sigmoid outputs
  2. Training loop

for epoch in range(args.epochs):
    t = time.time()
    print('-------- Epoch ' + str(epoch + 1) + ' --------')
    y_pred_train = []
    y_label_train = []

    # Contrastive targets: ones for positives, zeros for negatives
    lbl_1 = torch.ones(997 * 2)   # dataset1: 997, dataset2: 1071
    lbl_2 = torch.zeros(997 * 2)
    lbl = torch.cat((lbl_1, lbl_2)).cuda()

    for i, (label, inp) in enumerate(train_loader):
        if args.cuda:
            label = label.cuda()

        model.train()
        optimizer.zero_grad()

        # Forward pass
        output, log = model(data_s, data_f, inp)
        log = torch.squeeze(m(log))

        # Loss: supervised classification plus weighted contrastive term
        loss_class = loss_node(log, label.float())
        loss_constra = loss_fct(output, lbl)
        loss_train = loss_class + args.loss_ratio1 * loss_constra

        # Backward pass
        loss_train.backward()
        optimizer.step()

        # Collect the predictions
        label_ids = label.to('cpu').numpy()
        y_label_train = y_label_train + label_ids.flatten().tolist()
        y_pred_train = y_pred_train + log.flatten().tolist()

        if i % 100 == 0:
            print('epoch: ' + str(epoch + 1) + '/ iteration: ' + str(i + 1) +
                  '/ loss_train: ' + str(loss_train.cpu().detach().numpy()))

    # ROC AUC on the training set
    roc_train = roc_auc_score(y_label_train, y_pred_train)
    print('epoch: {:04d}'.format(epoch + 1),
          'loss_train: {:.4f}'.format(loss_train.item()),
          'auroc_train: {:.4f}'.format(roc_train),
          'time: {:.4f}s'.format(time.time() - t))
  3. Testing

def test(model, loader, data_s, data_f, args):
    m = torch.nn.Sigmoid()
    loss_fct = torch.nn.BCEWithLogitsLoss()
    loss_node = torch.nn.BCELoss()

    # Set up the contrastive targets
    lbl_1 = torch.ones(997 * 2)
    lbl_2 = torch.zeros(997 * 2)
    lbl = torch.cat((lbl_1, lbl_2)).cuda()

    inp_id0 = []
    inp_id1 = []
    model.eval()
    y_pred = []
    y_label = []

    with torch.no_grad():
        for i, (label, inp) in enumerate(loader):
            inp_id0.append(inp[0])
            inp_id1.append(inp[1])
            if args.cuda:
                label = label.cuda()

            # Forward pass
            output, log = model(data_s, data_f, inp)
            log = torch.squeeze(m(log))

            # Loss
            loss_class = loss_node(log, label.float())
            loss_constra = loss_fct(output, lbl)
            loss = loss_class + args.loss_ratio1 * loss_constra

            # Collect the predictions
            label_ids = label.to('cpu').numpy()
            y_label = y_label + label_ids.flatten().tolist()
            y_pred = y_pred + log.flatten().tolist()

    # Binarize the predictions at 0.5 and compute the evaluation metrics
    outputs = np.asarray([1 if i else 0 for i in (np.asarray(y_pred) >= 0.5)])
    return roc_auc_score(y_label, y_pred), average_precision_score(y_label, y_pred), f1_score(y_label, outputs), loss
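main.py (next section) calls a train_model function that is not reproduced here; its likely shape, combining the initialization, training loop, and test call above, is roughly this (a sketch of the flow, not the repository's exact code):

def train_model(data_s, data_f, train_loader, test_loader, args):
    model = SSCLMD(in_dim=args.dimensions, hid_dim=args.hidden1,
                   out_dim=args.hidden2, decoder1=args.decoder1)
    if args.cuda:
        model = model.cuda()
    optimizer = torch.optim.Adam(model.parameters(), lr=args.lr,
                                 weight_decay=args.weight_decay)
    for epoch in range(args.epochs):
        ...  # the training loop shown in step 2
    auroc, auprc, f1, _ = test(model, test_loader, data_s, data_f, args)
    print('auroc_test: {:.4f} auprc_test: {:.4f} f1_test: {:.4f}'.format(auroc, auprc, f1))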

5. Main Program Flow (main.py)

The main program flow is straightforward:

# Parse the settings
args = settings()

# CUDA setup
args.cuda = not args.no_cuda and torch.cuda.is_available()
np.random.seed(args.seed)
torch.manual_seed(args.seed)
if args.cuda:
    torch.cuda.manual_seed(args.seed)

# Load the data
data_s, data_f, train_loader, test_loader = load_data(args, n_splits=5)

# Train and test on each fold
for fold, (train_loader, test_loader) in enumerate(zip(train_loader, test_loader)):
    print(f"Training on fold {fold+1}")
    train_model(data_s, data_f, train_loader, test_loader, args)

6. Parameter Settings (parms_setting.py)

The model parameters are defined in parms_setting.py and include:

def settings():
    parser = argparse.ArgumentParser()

    # Common parameters
    parser.add_argument('--seed', type=int, default=0,
                        help='Random seed. Default is 0.')
    parser.add_argument('--no-cuda', action='store_true', default=False,
                        help='Disables CUDA training.')
    parser.add_argument('--workers', type=int, default=0,
                        help='Number of parallel workers. Default is 0.')

    # Data path parameters
    parser.add_argument('--in_file', default="dataset1/LDA.edgelist",
                        help='Path to data fold. e.g., data/LDA.edgelist')
    parser.add_argument('--neg_sample', default="dataset1/no_LDA.edgelist",
                        help='Path to data fold. e.g., data/LDA.edgelist')
    parser.add_argument('--task_type', default="LDA", choices=['LDA', 'MDA', 'LMI'],
                        help='Initial prediction task type. Default is LDA.')

    # Training parameters
    parser.add_argument('--lr', type=float, default=5e-4,
                        help='Initial learning rate. Default is 5e-4.')
    parser.add_argument('--dropout', type=float, default=0.5,
                        help='Dropout rate. Default is 0.5.')
    parser.add_argument('--weight_decay', default=5e-4,
                        help='Weight decay (L2 loss on parameters). Default is 5e-4.')
    parser.add_argument('--batch', type=int, default=25,
                        help='Batch size. Default is 25.')
    parser.add_argument('--epochs', type=int, default=80,
                        help='Number of epochs to train. Default is 80.')
    parser.add_argument('--loss_ratio1', type=float, default=0.1,
                        help='Ratio of self_supervision. Default is 1 (LDA), 0.1 (MDA,LMI)')

    # Model parameters
    parser.add_argument('--dimensions', type=int, default=512,
                        help='dimensions of feature d. Default is 512 (LDA), 1024 (LDA and LMI)')
    parser.add_argument('--hidden1', default=256,
                        help='Embedding dimension of encoder layer 1 for SSCLMD. Default is d/2.')
    parser.add_argument('--hidden2', default=128,
                        help='Embedding dimension of encoder layer 2 for SSCLMD. Default is d/4.')
    parser.add_argument('--decoder1', default=512,
                        help='Embedding dimension of decoder layer 1 for SSCLMD. Default is 512.')

    args = parser.parse_args()
    return args

7. Similarity Computation (calculating_similarity.py)

This script computes similarities among the different node types and uses them to build the intra-edges of the topology graph; a sketch of the general idea follows.
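The exact similarity measures are defined in the script (and the accompanying paper). Purely as an illustration of the kind of computation involved, a cosine-similarity-based intra-edge construction could look like this (both the measure and the threshold are assumptions, not the repository's code):

import numpy as np

def intra_edges_from_similarity(features, threshold=0.8):
    """Connect same-type nodes whose feature similarity exceeds a threshold.
    Illustrative only: the repo/paper may use different measures (e.g. Gaussian kernels)."""
    norm = np.linalg.norm(features, axis=1, keepdims=True)
    norm[norm == 0] = 1.0                 # guard against all-zero feature rows
    unit = features / norm
    sim = unit @ unit.T                   # cosine similarity matrix
    adj = (sim >= threshold).astype(np.int64)
    np.fill_diagonal(adj, 0)              # no self-loops
    return adj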

8. Data Preparation (data_preparation.py)

This script computes k-mer features for the lncRNA/miRNA sequences and builds the attribute-based KNN graph, along the lines of the sketch below.
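As an illustration of both steps, here is a minimal sketch of k-mer frequency features followed by a KNN attribute graph (the alphabet, k, and neighbor count are illustrative assumptions):

import numpy as np
from itertools import product
from sklearn.neighbors import kneighbors_graph

def kmer_features(sequences, k=3):
    """Normalized k-mer frequency vectors for RNA sequences (illustrative sketch)."""
    kmers = [''.join(p) for p in product('ACGU', repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    feats = np.zeros((len(sequences), len(kmers)))
    for row, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            j = index.get(seq[i:i + k])
            if j is not None:
                feats[row, j] += 1
        feats[row] /= max(len(seq) - k + 1, 1)   # frequency, not raw count
    return feats

# Attribute graph: connect each node to its nearest neighbors in feature space
feats = kmer_features(['ACGUACGU', 'UGCAUGCA', 'ACGUUGCA'], k=3)
f_adj = kneighbors_graph(feats, n_neighbors=2, mode='connectivity').toarray()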

9. Utility Functions (utils.py)

utils.py contains helper functions such as Laplacian normalization and row normalization; their standard forms are sketched below.
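For reference, these are the usual definitions of the two helpers (a sketch of the standard formulas, not necessarily the repository's exact code):

import numpy as np

def laplacian_norm(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, as commonly used with GCNs."""
    adj = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.power(adj.sum(1), -0.5)
    d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.
    d_mat = np.diag(d_inv_sqrt)
    return d_mat @ adj @ d_mat

def row_norm(mat):
    """Scale each row to sum to 1 (rows that sum to 0 are left as zeros)."""
    rowsum = mat.sum(1, keepdims=True)
    rowsum[rowsum == 0] = 1.
    return mat / rowsum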

10. Reproduction Steps

  1. Environment setup

    • Install Python 3.7+
    • Install the required dependencies: numpy, torch, scikit-learn, torch-geometric
  2. Data preparation

    • Extract data/dataset1.rar and data/dataset2.rar
  3. Feature preprocessing

    • Run data_preparation.py to generate the k-mer features and the attribute graph
    • Run calculating_similarity.py to compute the similarities and the topology-graph intra-edges
  4. Model training and testing

    • Run main.py to start training and testing (an example invocation follows this list)
    • Adjust the parameters in parms_setting.py as needed
  5. Result evaluation

    • Inspect the reported AUROC, AUPRC, and F1 scores
    • Optionally save the model for later use
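For step 4, an example invocation (run from the code/ directory; the paths match the defaults in parms_setting.py and may need adjusting to your layout) is:

python main.py --task_type LDA --in_file dataset1/LDA.edgelist --neg_sample dataset1/no_LDA.edgelist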

11. Suggested Code Improvements

  1. Modularity: separate data loading, model definition, training, and testing more cleanly
  2. Parameter management: use a configuration file instead of hard-coded values (e.g., the 997 in train.py)
  3. Logging: add more detailed logging for easier debugging and analysis
  4. Visualization: plot the training process, such as loss curves and metric trajectories
  5. Data parallelism: add parallel data processing for large-scale datasets
  6. Model saving: periodically save model checkpoints (see the sketch after this list)
  7. Early stopping: stop training early to avoid overfitting (also covered by the sketch below)
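As a starting point for suggestions 6 and 7, a minimal checkpoint-plus-early-stopping helper might look like this (a sketch; the patience value and checkpoint path are placeholders, not repository code):

import torch

class EarlyStopper:
    """Stops training when the validation metric hasn't improved for `patience`
    epochs, keeping a checkpoint of the best model seen so far."""
    def __init__(self, patience=10, path='best_model.pt'):
        self.patience, self.path = patience, path
        self.best, self.bad_epochs = -float('inf'), 0

    def step(self, metric, model):
        if metric > self.best:
            self.best, self.bad_epochs = metric, 0
            torch.save(model.state_dict(), self.path)   # keep the best checkpoint
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience          # True -> stop training

Called once per epoch with, say, the validation AUROC, it would let the training loop break early and reload the saved best weights afterwards.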
