Can I remove all variants in a vcf file that have 0/0 for PRS calculation?
4 months ago
Patrick • 0

I am calculating the PRS from a single sample in a vcf file.

I got the sample by extracting it from a large 1000 Genomes dataset, so many of its variants have the genotype 0/0.

My question is: can I remove all variants with 0/0, and does removing them affect the PRS I calculate? Even though there is only 1 sample in the file, the file is very large, which makes it difficult to work with and may even be responsible for some errors I am encountering during PRS calculation. (That's why I want to remove the 0/0 variants.)


Dear Patrick, please elaborate on what your 'PRS calculation' involves. Only then can we assist. Please share the relevant code and/or the programs that you are using.


I am using the tool 'pgsc_calc' to calculate PRS for individuals using scores from the PGS Catalog.

I don't really have any code, since it's just a command-line tool: you input a vcf file and the PGS Catalog score you want to use, and you get the PRS for all samples in that vcf. In my case, there will only be 1 sample in the vcf file, so 1 PRS should be calculated.


If all of your cases/controls are homozygous reference at a site, removing that site won't affect the summed scores, since its contribution is 0. However, all the scores will shift to larger magnitudes if you divide the sum by the number of non-missing sites, as Plink does.

I made this table that you can play around with:

https://docs.google.com/spreadsheets/d/1Vm3fAb4TDFOMJOEobX-Ou-tjGliA2Sh65lw1xWQ3RKQ/edit#gid=0
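
A minimal Python sketch of the point above, with made-up dosages and weights (the numbers and function names are illustrative, not taken from the spreadsheet or from any tool). It assumes a simplified score: effect-allele dosage times weight, either summed or divided by the number of sites present in the file. Real tools may normalise differently (for example by allele count rather than site count).

```python
# Toy single-sample example: dosage = number of effect alleles (0/0 -> 0).
# All values are hypothetical and only illustrate the arithmetic.
dosages = [0, 1, 2, 0, 0]
weights = [0.2, 0.5, -0.1, 0.3, 0.4]


def prs_sum(dosages, weights):
    """Plain weighted sum of dosages."""
    return sum(d * w for d, w in zip(dosages, weights))


def prs_avg(dosages, weights):
    """Weighted sum divided by the number of sites present (simplified)."""
    return prs_sum(dosages, weights) / len(dosages)


# Drop every site where the sample is 0/0 (dosage 0).
kept = [(d, w) for d, w in zip(dosages, weights) if d != 0]
kept_dosages = [d for d, _ in kept]
kept_weights = [w for _, w in kept]

print("sum, all sites:     ", prs_sum(dosages, weights))
print("sum, 0/0 removed:   ", prs_sum(kept_dosages, kept_weights))
print("average, all sites: ", prs_avg(dosages, weights))
print("average, 0/0 removed:", prs_avg(kept_dosages, kept_weights))
```

The two sums agree, while the average grows because the denominator drops from 5 sites to 2.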

3 months ago

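Use bcftools view -i 'COUNT(GT="RR")=0' input.vcf.gz to keep only the sites where no sample is 0/0 (with -e, the same expression would instead drop every site that has no 0/0 genotype). Here input.vcf.gz is a placeholder for your file.

(not tested)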
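
For readers who want to see the same filter spelled out, here is a rough pure-Python sketch. It assumes an uncompressed, tab-separated, single-sample VCF whose FORMAT column contains GT; the function name `drop_hom_ref` and the example records are hypothetical. It only drops diploid 0/0 or 0|0 records and keeps missing genotypes, and it is not a substitute for bcftools on real data.

```python
def drop_hom_ref(lines):
    """Yield header lines and records whose single sample is not 0/0 or 0|0."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and the column header
            continue
        fields = line.rstrip("\n").split("\t")
        fmt_keys = fields[8].split(":")      # FORMAT column
        sample_vals = fields[9].split(":")   # first (only) sample column
        gt = sample_vals[fmt_keys.index("GT")] if "GT" in fmt_keys else "."
        alleles = gt.replace("|", "/").split("/")
        if alleles == ["0", "0"]:
            continue  # homozygous reference: skip this record
        yield line


# Hypothetical three-record VCF used only for illustration.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n",
    "1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0/0\n",
    "1\t200\trs2\tC\tT\t.\tPASS\t.\tGT\t0|1\n",
    "1\t300\trs3\tG\tA\t.\tPASS\t.\tGT\t./.\n",
]
filtered = list(drop_hom_ref(example))
for record in filtered:
    print(record, end="")
```

To apply it to a file, the same generator could be fed an open file handle and its output written line by line to a new file.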
