Many SNPs getting excluded during PRSice 2 analysis
7 weeks ago
m.shoaib • 0

Hello, I am running PRSice 2, but many variants are not included during the analysis. My command:

Rscript PRSice.R \
    --prsice PRSice_linux \
    --dir /home/projects/base-files/T2test/PRSice \
    --base T2baseNoBMI.uniq.txt \
    --target contn.analysis.PRS# \
    --pheno T2.train.control60.txt \
    --cov covariates.cont60.txt \
    --binary-target T \
    --snp rsID \
    --chr CHR \
    --bp BP \
    --A1 A1 \
    --A2 A2 \
    --stat BETA \
    --pvalue Pvalue \
    --extract PRSice.valid \
    --print-snp \
    --score sum \
    --out PRSice-res

My log file suggests that 1476197 SNPs are not being included:

5965221 variant(s) observed in base file, with:
1310476 variant(s) excluded based on user input
4654745 total variant(s) included from base file

Loading Genotype info from target

187786 people (86770 male(s), 101016 female(s)) observed
187786 founder(s) included

1476197 variant(s) not found in previous data
226 variant(s) with mismatch information
4654745 variant(s) included

In my base file, the first allele column is the effect allele (A1) and the second is the reference allele (A2). In the target data, which is in bed/bim/fam format, the first allele column is the reference allele and the second is the effect allele. Could this be causing the exclusion of so many SNPs? If so, how do I fix it?

Or is there something else I am missing?

Thanks a lot

linux 2 UKBB PRSice • 317 views
7 weeks ago
Sam ★ 4.2k

Simply put, the rsID or SNP ID in your GWAS was not found in your target. There’s still a very healthy amount of overlap, so I wouldn’t be too concerned.


Ok, so you think excluding 1.5 million variants won't make much difference, since PRSice already includes about 4.65 million variants. Actually, I used an external GWAS that has only chromosome and base-position columns. To get rsIDs, I matched its base positions against my target data (UKBB) and took the rsIDs from there. So the base and target should have the same set of matching rsIDs, and I am still confused about why PRSice is unable to process more than a million SNPs in my analysis.
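The position-to-rsID matching described above can be sketched roughly as follows. This is only an illustration, not the procedure actually used in the thread: the function name `annotate_rsid` is hypothetical, the base file is assumed to be whitespace-delimited with a header containing `CHR` and `BP` columns, and the `.bim` file is assumed to follow the usual six-column PLINK layout.

```shell
#!/bin/sh
# annotate_rsid BIM BASE
#   BIM  : PLINK .bim file (chr, SNP ID, cM, bp, A1, A2), no header
#   BASE : summary statistics with a header containing CHR and BP (assumed layout)
# Prints BASE with a leading rsID column; positions absent from BIM get "NA".
annotate_rsid() {
    awk '
        NR == FNR { id[$1 ":" $4] = $2; next }     # first file: map chr:bp -> SNP ID
        FNR == 1 {                                 # base header: locate CHR and BP
            for (i = 1; i <= NF; i++) {
                if ($i == "CHR") c = i
                if ($i == "BP")  b = i
            }
            print "rsID", $0
            next
        }
        {
            key = $c ":" $b
            print ((key in id) ? id[key] : "NA"), $0
        }
    ' "$1" "$2"
}
```

One caveat of this sketch: if several target variants share the same chromosome and position (for example an indel and a SNP at the same site), only the last ID read for that position is kept, so a position-only match may assign an ID that belongs to a different variant.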


So your base data has chr:bp format, and your target has rsIDs. You can check the overlap between the two using R:

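library(data.table)
base <- fread("T2baseNoBMI.uniq.txt")
# Read the per-chromosome .bim files (V1 = chromosome, V2 = SNP ID, V4 = position)
bim_files <- paste0("contn.analysis.PRS", 1:22, ".bim")
target <- rbindlist(lapply(bim_files, fread))
target[, chrID := paste(V1, V4, sep = ":")]

# table() on two vectors of different lengths fails, so count membership instead
table(target[, chrID] %in% base[, rsID])   # target variants found in base
table(base[, rsID] %in% target[, chrID])   # base variants found in target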

and see what numbers you are getting. I am assuming your rsID column in your base is the original ID.
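To separate the two explanations raised in the question (IDs that are simply absent versus allele columns given in the opposite order), a rough shell sketch like the one below may help. It is an illustration under stated assumptions, not part of PRSice: `compare_alleles` is a hypothetical helper, the base file is assumed to have a header with `rsID`, `A1` and `A2` columns (the names passed to `--snp`, `--A1` and `--A2` above), and strand flips are not resolved, so they fall into the `other` bucket.

```shell
#!/bin/sh
# compare_alleles BIM BASE
#   BIM  : PLINK .bim file (chr, SNP ID, cM, bp, A1, A2), no header
#   BASE : summary statistics with a header containing rsID, A1 and A2 (assumed)
# For every target variant, reports whether its ID is missing from BASE, or
# whether the allele pair is identical, swapped, or something else.
compare_alleles() {
    awk '
        NR == FNR {
            if (FNR == 1) {                        # base header: index columns by name
                for (i = 1; i <= NF; i++) col[$i] = i
                next
            }
            a1[$(col["rsID"])] = toupper($(col["A1"]))
            a2[$(col["rsID"])] = toupper($(col["A2"]))
            next
        }
        {
            t1 = toupper($5); t2 = toupper($6)
            if (!($2 in a1))                        n["missing"]++
            else if (a1[$2] == t1 && a2[$2] == t2)  n["same"]++
            else if (a1[$2] == t2 && a2[$2] == t1)  n["swapped"]++
            else                                    n["other"]++   # includes strand flips
        }
        END {
            printf "missing=%d same=%d swapped=%d other=%d\n",
                   n["missing"], n["same"], n["swapped"], n["other"]
        }
    ' "$2" "$1"
}
```

It could be run once per chromosome, for example `compare_alleles contn.analysis.PRS1.bim T2baseNoBMI.uniq.txt`. If nearly all of the unmatched variants land in `missing`, the ID mismatch is the more likely explanation; a large `swapped` count would instead point at the allele column order.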
