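1. Register for an API token on LDlink (An Interactive Web Tool for Exploring Linkage Disequilibrium in Population Groups)
A 12-character token is sent to the email address you register with. The token must be supplied whenever you query the API.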
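Rather than pasting the token into every script, one option is to keep it in an environment variable (for example in ~/.Renviron) and read it at run time. The helper below is a minimal sketch. The function name get_ldlink_token and the variable name LDLINK_TOKEN are hypothetical choices; LDlinkR does not look this variable up on its own.

```r
# Hypothetical helper (not part of LDlinkR): read the LDlink token from an
# environment variable. LDLINK_TOKEN is an assumed variable name.
get_ldlink_token <- function(var = "LDLINK_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set")
  }
  # The registration email delivers a 12-character token; warn if the length differs
  if (nchar(token) != 12) {
    warning("Token in ", var, " is not 12 characters long")
  }
  token
}
```

The returned string can then be passed as the token argument of LDtrait() and the other LDlinkR query functions.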
2. Functions in the LDlinkR package
LDexpress
Determine if genomic variants are associated with gene expression.
LDhap
Calculates population-specific haplotype frequencies of all haplotypes observed for a list of query variants.
LDmatrix
Generates a data frame of pairwise linkage disequilibrium statistics.
LDpair
Investigates potentially correlated alleles for a pair of variants.
LDpop
Investigates allele frequencies and linkage disequilibrium patterns across 1000 Genomes Project populations.
LDproxy
Explore proxy and putative functional variants for a single query variant.
LDproxy_batch
Query LDproxy using a list of query variants, one per line.
LDtrait
Determine if genomic variants are associated with a trait or disease.
list_chips
Provides a data frame listing the names and abbreviation codes for available commercial SNP Chip Arrays from Illumina and Affymetrix.
list_gtex_tissues
Provides a data frame listing the GTEx full names, 'LDexpress' full names (without spaces) and acceptable abbreviation codes of the 54 non-diseased tissue sites collected for the GTEx Portal and used as input for the 'LDexpress' function.
list_pop
Provides a data frame listing the available reference populations from the 1000 Genomes Project.
SNPchip
Find commercial genotyping chip arrays for variants of interest.
SNPclip
Prune a list of variants by linkage disequilibrium.
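LDpair, LDmatrix and the r2d option of LDtrait express LD as R2 or D'. For two bi-allelic loci with haplotype frequencies pAB, pAb, paB and pab, the textbook definitions are D = pAB - pA*pB, D' = |D| / Dmax and r2 = D^2 / (pA(1-pA) pB(1-pB)), where Dmax depends on the sign of D. The sketch below computes these from known haplotype frequencies. It does not reproduce how LDlink estimates frequencies from 1000 Genomes data, and the function name ld_stats is hypothetical.

```r
# Hypothetical helper: D, D' and r^2 for two bi-allelic loci from the four
# haplotype frequencies (must sum to 1; both loci assumed polymorphic).
ld_stats <- function(pAB, pAb, paB, pab) {
  stopifnot(abs(pAB + pAb + paB + pab - 1) < 1e-8)
  pA <- pAB + pAb  # frequency of allele A at locus 1
  pB <- pAB + paB  # frequency of allele B at locus 2
  stopifnot(pA > 0, pA < 1, pB > 0, pB < 1)
  D <- pAB - pA * pB
  # Dmax is the largest |D| attainable given the allele frequencies and the sign of D
  Dmax <- if (D >= 0) {
    min(pA * (1 - pB), (1 - pA) * pB)
  } else {
    min(pA * pB, (1 - pA) * (1 - pB))
  }
  c(D = D,
    Dprime = abs(D) / Dmax,
    r2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)))
}

ld_stats(0.5, 0, 0, 0.5)    # perfect LD: Dprime = 1, r2 = 1
ld_stats(0.1, 0, 0.4, 0.5)  # Dprime = 1, r2 = 1/9 (about 0.111)
```

The second example shows why the r2d choice matters: D' can reach 1 while R2 is only about 0.11, so a D'-based threshold keeps many more proxies than an R2-based one.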
3. LDtrait is the main function used to remove SNPs associated with confounders
LDtrait(
  snps,                  # 1-50 variants, given as rsIDs or chromosome coordinates (e.g. "chr7:24966446"); every input variant must match a bi-allelic variant
  pop = "CEU",           # reference population; run list_pop() to see the supported population codes
  r2d = "r2",            # "r2" filters output on the estimated LD R2 (R squared), "d" on LD D' (D-prime); default "r2"
  r2d_threshold = 0.1,   # LD threshold; variants below it are filtered out
  win_size = 5e+05,      # genomic window (bp) searched around each query variant; default 500000
  token = NULL,          # LDlink API token received by email after registration
  file = FALSE,          # optional path for saving the results; FALSE writes no file
  genome_build = "grch37",  # one of "grch37" (GRCh37/hg19), "grch38" (GRCh38/hg38) or "grch38_high_coverage" (GRCh38 High Coverage 1000 Genomes Project data, hg38); default GRCh37 (hg19)
  api_root = "https://ldlink.nih.gov/LDlinkRest"
)
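Because LDtrait accepts only 1 to 50 variants given as rsIDs or chromosome coordinates, checking the input locally can save a failed request. This is a minimal sketch: check_ldtrait_snps is a hypothetical helper, the regular expressions are assumed formats (LDlink's own server-side validation may accept more or less), and the bi-allelic requirement cannot be checked offline.

```r
# Hypothetical pre-flight check (not part of LDlinkR) mirroring the documented
# limits: 1-50 variants, each an rsID or a "chrN:position" coordinate.
check_ldtrait_snps <- function(snps) {
  snps <- unique(trimws(snps))
  if (length(snps) < 1 || length(snps) > 50) {
    stop("LDtrait accepts between 1 and 50 variants; got ", length(snps))
  }
  # Assumed formats: "rs123" or "chr7:24966446" (chromosomes 1-99, X, Y)
  ok <- grepl("^rs[0-9]+$", snps) | grepl("^chr([0-9]{1,2}|X|Y):[0-9]+$", snps)
  if (!all(ok)) {
    stop("Unrecognised variant IDs: ", paste(snps[!ok], collapse = ", "))
  }
  invisible(snps)  # de-duplicated, trimmed input, ready for LDtrait(snps = ...)
}
```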
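4. Example output:
5. Interpretation
In the example output, rs123 (itself or through GWAS Catalog variants in LD with it) is linked to Highest math class taken (MTAG) and Educational attainment (MTAG).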
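6. How the exclusion works
First, retrieve the traits associated with each variant using LDtrait. Then remove every variant that is linked to a trait regarded as a confounder.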
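The exclusion step can be tried offline on a small table shaped like LDtrait output. This sketch assumes the result has the columns Query and GWAS_Trait (the columns used in the R function in section 7). The mock rows and the helper names confounded_queries and drop_confounded are made up for illustration and are not real LDtrait results.

```r
# Hypothetical helpers: `res` is assumed to be a data frame with at least the
# columns "Query" and "GWAS_Trait"; `traits` are case-insensitive regex patterns.
confounded_queries <- function(res, traits) {
  # TRUE for rows whose GWAS trait matches at least one pattern
  hit <- Reduce(`|`,
                lapply(traits, function(p) grepl(p, res$GWAS_Trait, ignore.case = TRUE)),
                rep(FALSE, nrow(res)))
  unique(res$Query[hit])
}

# Keep only the input variants that were not flagged
drop_confounded <- function(snps, res, traits) {
  snps[!(snps %in% confounded_queries(res, traits))]
}

# Mock table for illustration only (not real LDtrait output)
mock <- data.frame(
  Query = c("rs111", "rs111", "rs222", "rs333"),
  GWAS_Trait = c("Body mass index", "Height", "Educational attainment (MTAG)", "LDL cholesterol"),
  stringsAsFactors = FALSE
)
drop_confounded(c("rs111", "rs222", "rs333", "rs444"), mock, c("body mass", "education"))
# returns c("rs333", "rs444")
```

As with grep, the patterns are regular expressions, so a trait name containing characters such as "(" would need escaping before being used as a pattern.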
7. R implementation
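library(LDlinkR)

# Remove SNPs whose LDtrait results (GWAS Catalog hits for the SNP or its LD
# proxies) match any of the confounder keywords in `trait`.
# snps:  1-50 rsIDs or "chrN:position" coordinates
# trait: trait keywords, matched as case-insensitive regular expressions
# token: LDlink API token received by email
tcL <- function(snps, trait, token) {
  res <- LDtrait(
    snps = snps,
    pop = "ALL",
    r2d = "r2",
    r2d_threshold = 0.1,
    win_size = 5e+05,
    token = token,
    file = FALSE,
    genome_build = "grch37",
    api_root = "https://ldlink.nih.gov/LDlinkRest")
  # If no trait associations came back there is nothing to remove
  # (assumed: such a result has no GWAS_Trait column)
  if (!is.data.frame(res) || !("GWAS_Trait" %in% colnames(res))) {
    return(snps)
  }
  # Find the rows whose GWAS trait matches any keyword
  gwas_traits <- res[, "GWAS_Trait"]
  # lapply (unlike sapply) always returns a list, whatever the match counts
  matches <- lapply(trait, function(x) grep(x, gwas_traits, ignore.case = TRUE))
  ma <- unique(unlist(matches))  # matching row indices; empty if nothing matched
  hit_rows <- res[ma, , drop = FALSE]
  # Drop every query SNP that has at least one matching trait
  sn1 <- unique(hit_rows[, "Query"])
  snp1 <- snps[!(snps %in% sn1)]
  return(snp1)
}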
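The function above sends all SNPs in a single LDtrait call, but LDtrait accepts at most 50 variants per request. One way around this is to split longer lists into batches. In this sketch, filter_in_batches is a hypothetical wrapper: filter_fun defaults to tcL, and any function(snps, trait, ...) that returns the retained SNPs can be passed instead, which also allows testing with a stub and no network access. Extra arguments such as token are forwarded through `...`.

```r
# Hypothetical wrapper: split `snps` into batches of at most `batch_size`,
# apply `filter_fun` to each batch and combine the retained SNPs.
filter_in_batches <- function(snps, trait, batch_size = 50, filter_fun = tcL, ...) {
  snps <- unique(snps)
  if (length(snps) == 0) return(character(0))
  # Consecutive batches: 1..50, 51..100, ...
  batches <- split(snps, ceiling(seq_along(snps) / batch_size))
  kept <- lapply(batches, function(b) filter_fun(b, trait, ...))
  # as.character() turns the NULL from an all-empty unlist() into character(0)
  as.character(unlist(kept, use.names = FALSE))
}
```

With the real function this would be called as filter_in_batches(my_snps, c("education", "body mass"), token = "XXXXXXXX"), where my_snps is a placeholder for your own SNP vector; the batches are sent to LDlink one after another.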
Copyright belongs to the original author, weixin_49320263. If this post infringes your rights, please contact us and it will be removed.