[Epidemiology] Melodi-Presto: a causal-association tool


title: "[Epidemiology] Melodi-Presto: a causal-association tool"
date: 2022-12-08
lastmod: 2022-12-08
draft: false
tags: ["Epidemiology", "Causal-association tools"]
toc: true
autoCollapseToc: true

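Background reading

Melodi-Presto: A fast and agile tool to explore semantic triples derived from biomedical literature[^1]

Triples: each one is a subject-predicate-object statement, for example a subject linked to an object by a predicate such as CAUSES.

SemMedDB: a large, open knowledge base of such triples.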

Access points

  • 🚩 Online tool (Web Application)

  • API

  • Jupyter Notebooks

Via the API, download the result as JSON and then extract what you need from it:

curl -X POST 'https://melodi-presto.mrcieu.ac.uk/api/overlap/' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{ "x": [ "diabetes" ], "y": [ "coronary heart disease" ]}' > 1.json
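The same request can be sketched in Python using only the standard library. The endpoint URL, headers, and payload shape are taken from the curl command above; the function names (`build_overlap_payload`, `post_overlap`, `describe`, `run_example`) are hypothetical helpers introduced for illustration. The response schema is not documented in this post, so `describe` only reports the top-level keys instead of assuming field names.

```python
import json
import urllib.request

# Endpoint copied from the curl command above
API_URL = "https://melodi-presto.mrcieu.ac.uk/api/overlap/"


def build_overlap_payload(x_terms, y_terms):
    """Build the JSON body for the overlap endpoint, trimming stray whitespace."""
    return {
        "x": [term.strip() for term in x_terms],
        "y": [term.strip() for term in y_terms],
    }


def post_overlap(payload, url=API_URL, timeout=60):
    """POST the payload as JSON and return the decoded response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def describe(result):
    """Summarise an unknown JSON structure without assuming its schema."""
    if isinstance(result, dict):
        return {key: type(value).__name__ for key, value in result.items()}
    return type(result).__name__


def run_example():
    """Query the API (needs network access) and save the raw JSON to 1.json."""
    payload = build_overlap_payload(["diabetes"], ["coronary heart disease"])
    result = post_overlap(payload)
    with open("1.json", "w", encoding="utf-8") as handle:
        json.dump(result, handle, indent=2)
    print(describe(result))
```

Calling `run_example()` performs the query and writes `1.json`, as the curl command does. Inspecting the keys first makes it easier to decide which fields to extract.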

Usage example

X: KRAS
Y: lung cancer

Open question: should input terms first be verified against MeSH?

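Reproducing the paper

doi: 10.1093/ije/dyab203[^2]

{{< note >}}
1. Some details differ from the paper.
2. Object selection is narrowed down to "chronic".
3. Predicate selection is left unrestricted at first.
4. Subject filtering removed CRP, although the paper included it.
5. Has the OR calculation been dropped?
6. Remove genes (GTF annotation) and proteins ([UniProt protein-name list](https://www.uniprot.org/uniprotkb?facets=model_organism%3A9606&query=reviewed%3Atrue)).
7. Add a drug library?
{{< /note >}}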
library(openxlsx)

# Read the results table
df <- read.xlsx("chronic kidney disease.xlsx",
                sheet = 1,
                colNames = TRUE,
                check.names = FALSE)

# Pval may have been read as text, so convert it before filtering
str(df$Pval)
df$Pval <- as.numeric(df$Pval)
# Keep P value < 0.005
df <- subset(df, Pval < 0.005)

# Drop rows whose Subject matches `pattern`, printing them first.
# The length check matters: df[-integer(0), ] would drop every row.
drop_subjects <- function(data, pattern) {
  hits <- stringr::str_which(data$Subject, pattern)
  print(data$Subject[hits])
  if (length(hits) > 0) data <- data[-hits, ]
  data
}

# Remove triples where the subject is a gene or protein
# [Warning: this step deletes CRP, which the paper retained]
df$Subject <- tolower(df$Subject)
df <- drop_subjects(df, "gene|protein|receptor")

# Keep three predicates: "CAUSES" implies causality,
#   "ASSOCIATED_WITH" implies association,
#   and "COEXISTS_WITH" implies co-existence.
table(df$Predicate)
df <- subset(df, Predicate %in% c("CAUSES", "ASSOCIATED_WITH", "COEXISTS_WITH"))

# Restrict to triples whose object contains "kidney" or "renal"
table(df$Object)
dplyr::count(df, forcats::fct_lump_n(Object, n = 10))
df$Object <- tolower(df$Object)
in_kidney <- stringr::str_which(df$Object, "kidney|renal")
df$Object[in_kidney]
df <- df[in_kidney, ]

# Second removal pass: compound subjects ("|"), factors, and peptides
df <- drop_subjects(df, "\\||factor|peptide")

# Keep one row per risk factor (subject): highest Count, then lowest Pval
df <- dplyr::arrange(df, dplyr::desc(Count), Pval)
df <- df[!duplicated(df$Subject), ]

table(df$Count)
# df <- subset(df, Count > 2)

write.xlsx(df, file = "筛选4.xlsx", colNames = TRUE)

# Enrichment odds ratio, built from four counts:
#  (a) the number of these triples among those matched to the query
#  (b) the total number of triples matched to the query
#  (c) the total number of these triples in the database
#  (d) the total number of triples in the database

# Python (SciPy): stats.fisher_exact([[a, b-a], [c, d-c]])
# R equivalent:   fisher.test(matrix(c(a, b - a, c, d - c), nrow = 2, byrow = TRUE))
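
To make the enrichment calculation concrete, here is a minimal standard-library Python sketch of the test on the table `[[a, b-a], [c, d-c]]` quoted in the comment above. It is an illustration and not the tool's own implementation. The function names are hypothetical, and the two-sided p-value follows the usual definition (sum the probabilities of all tables no more likely than the observed one). It has not been optimised for database-scale counts, where SciPy's `stats.fisher_exact` is the practical choice.

```python
from math import comb


def enrichment_table(a, b, c, d):
    """2x2 table in the layout quoted above: [[a, b - a], [c, d - c]]."""
    return [[a, b - a], [c, d - c]]


def odds_ratio(table):
    """Sample odds ratio (n11 * n22) / (n12 * n21); inf if the denominator is 0."""
    (n11, n12), (n21, n22) = table
    denominator = n12 * n21
    if denominator == 0:
        return float("inf")
    return (n11 * n22) / denominator


def fisher_two_sided(table):
    """Two-sided Fisher exact p-value from the hypergeometric distribution."""
    (n11, n12), (n21, n22) = table
    row1 = n11 + n12            # first-row total (triples matched to the query)
    col1 = n11 + n21            # first-column total (triples of the tested kind)
    total = row1 + n21 + n22
    low = max(0, row1 - (total - col1))
    high = min(row1, col1)

    def weight(k):
        # Unnormalised probability of a table whose top-left cell equals k
        return comb(col1, k) * comb(total - col1, row1 - k)

    observed = weight(n11)
    # Sum every table that is no more likely than the observed one
    extreme = sum(w for w in map(weight, range(low, high + 1)) if w <= observed)
    return extreme / comb(total, row1)
```

With made-up counts a = 3, b = 4, c = 1, d = 4, the table is `[[3, 1], [1, 3]]`, the odds ratio is 9.0, and the two-sided p-value is 34/70 ≈ 0.486.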
  
NHANES

Notes: see the paper reproduction above.


[^1]: doi: 10.1093/bioinformatics/btaa726

[^2]: Trans-ethnic Mendelian-randomization study reveals causal relationships between cardiometabolic factors and chronic kidney disease
