
AF3 DataPipeline Class: A Walkthrough of the process_multiseq_fasta Method

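The process_multiseq_fasta method of the DataPipeline class in AlphaFold3's data_pipeline module processes a multi-sequence FASTA file and assembles the features AlphaFold3 needs for structure prediction, which makes it suitable for predicting multi-chain complexes. It uses the "AlphaFold-Gap" trick that Minkyung Baek posted on Twitter. All chains are concatenated into one sequence, and a fixed offset (ri_gap, 200 by default) is added to residue_index at each chain boundary so that the chains no longer look sequence-adjacent to the model. Each chain's MSA is loaded separately and padded with gap characters ('-') over the positions of the other chains.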
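The input handling (header cleanup, stitching, and the residue_index gap) can be sketched in isolation. This is a minimal sketch, assuming residue_index starts as 0, 1, ..., num_res - 1 before the gap is applied. parse_fasta below is a simplified stand-in written for illustration (not the project's parsers.parse_fasta), gap_residue_index is a hypothetical helper that reproduces the residue_index loop from the method, and "alignments" is a made-up directory name standing in for super_alignment_dir.

```python
import os

import numpy as np


def parse_fasta(fasta_str: str) -> tuple[list[str], list[str]]:
    """Simplified stand-in for parsers.parse_fasta (illustrative only)."""
    seqs, descs = [], []
    for line in fasta_str.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            descs.append(line[1:])  # header text without the leading '>'
            seqs.append("")
        else:
            seqs[-1] += line  # sequence lines belong to the latest header
    return seqs, descs


def gap_residue_index(seq_lens: list[int], ri_gap: int = 200) -> np.ndarray:
    """Hypothetical helper: residue indices with an ri_gap jump after each chain.

    Assumes the un-gapped residue_index is simply 0..num_res-1.
    """
    residue_index = np.arange(sum(seq_lens), dtype=np.int32)
    total_offset = 0
    for sl in seq_lens:
        total_offset += sl
        residue_index[total_offset:] += ri_gap  # empty slice for the last chain
    return residue_index


fasta = ">chainA first chain\nMKV\n>chainB\nGS\n"
seqs, descs = parse_fasta(fasta)
descs = [d.split()[0] for d in descs]  # keep the first token, as the method does

print("".join(seqs))  # MKVGS
print("-".join(descs))  # chainA-chainB
print(gap_residue_index([len(s) for s in seqs]).tolist())  # [0, 1, 2, 203, 204]
# Per-chain alignment directories ("alignments" is a made-up root directory)
print([os.path.join("alignments", d) for d in descs])
```

With two chains of lengths 3 and 2 and the default ri_gap of 200, the jump from 2 to 203 marks the chain break. The boundary after the last chain adds nothing because the slice residue_index[num_res:] is empty.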

Source code:

    def process_multiseq_fasta(self,
                               fasta_path: str,
                               super_alignment_dir: str,
                               ri_gap: int = 200,
                               ) -> FeatureDict:
        """
            Assembles features for a multi-sequence FASTA. Uses Minkyung Baek's
            hack from Twitter (a.k.a. AlphaFold-Gap).
        """
        with open(fasta_path, 'r') as f:
            fasta_str = f.read()

        input_seqs, input_descs = parsers.parse_fasta(fasta_str)

        # No whitespace allowed: keep only the first token of each header,
        # which also names the chain's alignment subdirectory
        input_descs = [i.split()[0] for i in input_descs]

        # Stitch all of the sequences together
        input_sequence = ''.join(input_seqs)
        input_description = '-'.join(input_descs)
        num_res = len(input_sequence)

        sequence_features = make_sequence_features(
            sequence=input_sequence,
            description=input_description,
            num_res=num_res,
        )

        # AlphaFold-Gap: add ri_gap to every residue after each chain boundary
        seq_lens = [len(s) for s in input_seqs]
        total_offset = 0
        for sl in seq_lens:
            total_offset += sl
            sequence_features["residue_index"][total_offset:] += ri_gap

        # Load each chain's MSAs from super_alignment_dir/<header token>
        msa_list = []
        deletion_mat_list = []
        for seq, desc in zip(input_seqs, input_descs):
            alignment_dir = os.path.join(
                super_alignment_dir, desc
            )
            msas = self._get_msas(
                alignment_dir, seq, None
            )
            msa_list.append([m.sequences for m in msas])
            deletion_mat_list.append([m.deletion_matrix for m in msas])

        final_msa = []
        final_deletion_mat = []
        final_msa_obj = []
        msa_it = enumerate(zip(msa_list, deletion_mat_list))
        for i, (msas, deletion_mats) in msa_it:
            # Number of residues before (prec) and after (post) chain i
            prec, post = sum(seq_lens[:i]), sum(seq_lens[i + 1:])
            msas = [
                [prec * '-' + seq + post * '-' for seq in msa] for msa in msas
            ]
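            deletion_mats = [
                [prec * [0] + dml + post * [0] for dml in deletion_mat]
                for deletion_mat in deletion_mats
            ]
            # ... (the rest of the method is not included in this excerpt)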

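The prec/post padding loop can be illustrated with toy data. In this sketch, pad_chain_msas is a hypothetical helper, and the two chains, their MSA rows, and their deletion counts are made up. For simplicity each chain has a single MSA, whereas the method iterates over the list of Msa objects that self._get_msas returns for each chain. Rows of chain i are prefixed with prec gap characters and suffixed with post gap characters, and deletion-matrix rows get zeros over the same positions, so the stacked result is block-diagonal: no row pairs sequences from different chains.

```python
def pad_chain_msas(seq_lens, chain_msas, chain_deletions):
    """Hypothetical helper: pad each chain's MSA rows to the full stitched length.

    chain_msas[i] is a list of aligned rows for chain i (each of length
    seq_lens[i]); chain_deletions[i] holds the matching deletion-count rows.
    """
    final_msa, final_del = [], []
    for i, (msa, dels) in enumerate(zip(chain_msas, chain_deletions)):
        prec, post = sum(seq_lens[:i]), sum(seq_lens[i + 1:])
        # Gap characters cover the positions of every other chain.
        final_msa.extend(prec * "-" + row + post * "-" for row in msa)
        # Deletion counts are zero over those padded positions.
        final_del.extend(prec * [0] + row + post * [0] for row in dels)
    return final_msa, final_del


# Toy data: chain A has length 3, chain B has length 2.
seq_lens = [3, 2]
chain_msas = [["MKV", "MRV"], ["GS", "GA"]]
chain_dels = [[[0, 0, 0], [0, 1, 0]], [[0, 0], [2, 0]]]

msa, dels = pad_chain_msas(seq_lens, chain_msas, chain_dels)
for row, d in zip(msa, dels):
    print(row, d)
# MKV-- [0, 0, 0, 0, 0]
# MRV-- [0, 1, 0, 0, 0]
# ---GS [0, 0, 0, 0, 0]
# ---GA [0, 0, 0, 2, 0]
```

Each padded row has the full concatenated length (5 here), so the rows from all chains can be stacked into one MSA for the stitched sequence.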