Distribution of assayed SNPs per sample
5 days ago
am29 ▴ 60

I received a PLINK/VCF file containing many samples genotyped on many different SNP chips, which vary both in size (from 50K to 1.5 million SNPs) and in platform (different companies).

I need to find common SNPs across samples.

The SNP IDs in the file are in CHR:BP format, so I cannot use them to infer which SNP comes from which platform. According to my logic, one could do this by filtering out SNPs with missing genotype calls (./.); however, when I did this, I ended up with a very small number of SNPs in common. Also, some individuals might have been genotyped for a SNP but ended up with a missing call, so I think this is not a good way to do it. I tried PLINK's --missing command, which reports the missing-call rate per individual and per SNP. This is informative; however, I need to know exactly which SNPs are common across individuals.
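To make the contrast concrete, here is a minimal Python sketch of the strict filter versus a call-rate threshold. It assumes the genotypes have already been loaded into a dict mapping sample ID to a dict of SNP ID to genotype string; that layout, the function names `called_snps` and `common_snps`, and the toy data are all hypothetical and not something PLINK produces directly.

```python
from collections import Counter

# Genotype strings treated as "no call" (assumption: unphased, phased, or bare dot).
MISSING = {"./.", ".|.", "."}

def called_snps(genotypes):
    """Return the set of SNP IDs with a non-missing call for one sample."""
    return {snp for snp, gt in genotypes.items() if gt not in MISSING}

def common_snps(samples, min_call_rate=1.0):
    """Return SNP IDs called in at least `min_call_rate` of the samples."""
    counts = Counter()
    for genotypes in samples.values():
        counts.update(called_snps(genotypes))
    n = len(samples)
    return {snp for snp, c in counts.items() if c / n >= min_call_rate}

# Toy data in the hypothetical layout described above.
samples = {
    "SAMPLE_1": {"1:5234": "./.", "1:6000": "0/1", "1:7000": "1/1"},
    "SAMPLE_2": {"1:5234": "0/0", "1:6000": "0/1", "1:7000": "./."},
    "SAMPLE_3": {"1:5234": "0/0", "1:6000": "1/1", "1:7000": "0/1"},
}

strict = common_snps(samples)                      # called in every sample
relaxed = common_snps(samples, min_call_rate=0.6)  # called in at least 60% of samples
```

With `min_call_rate=1.0` a single missing call in any sample removes the SNP, which is why the strict filter shrinks so quickly; lowering the threshold tolerates sporadic missing calls but still cannot tell them apart from SNPs that were never assayed.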

Is there a way to find this out?

distribution PLINK missingness • 412 views

If you can share part of the data, along with what you have tried and the final output you expect, that would help people troubleshoot.


The vcf file looks like this:

CHR   BP      SNP ID   GENOTYPE    SAMPLE ID
1     5234    1:5234    ./.        SAMPLE_1

According to my logic, one could do this by filtering out SNPs with missing genotype calls (./.); however, when I did this, I ended up with a very small number of SNPs in common.

It sounds like you already have the answer: there are very few SNPs that are genotyped in all samples. Or perhaps I am missing something?


I want to know whether the missing calls arise because individuals were genotyped on different SNP chips, or because calls failed for SNPs that were actually assayed. In both cases I would see ./. in the genotype column, as I do now, right? How can I tell whether a missing call (./.) means the sample was never genotyped for that SNP, or whether the sample was genotyped for it but the call is missing for another reason (low genotyping quality, for example)? Is there a way to find this out from the VCF file?
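As far as I know, a plain ./. in a VCF carries no record of why the call is missing, so any separation has to be inferred. One possible heuristic, sketched below in Python, is to group samples whose sets of called SNPs are very similar (presumably the same chip) and then label a missing call as a likely failed call if other samples in the same group do have that SNP, or as likely not on the chip if nobody in the group has it. The function names, the Jaccard threshold of 0.7, the greedy grouping, and the toy data are all assumptions made for illustration, not an established method.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def group_by_profile(called, threshold=0.7):
    """Greedy grouping: a sample joins the first group whose seed sample
    has a similar enough set of called SNPs; otherwise it starts a new group."""
    groups = []  # list of (seed SNP set, [sample IDs])
    for sample, snps in called.items():
        for seed, members in groups:
            if jaccard(seed, snps) >= threshold:
                members.append(sample)
                break
        else:
            groups.append((snps, [sample]))
    return [members for _, members in groups]

def classify_missing(called, members, all_snps):
    """Label each missing (sample, SNP) pair within one group of samples."""
    group_union = set().union(*(called[s] for s in members))
    labels = {}
    for s in members:
        for snp in all_snps - called[s]:
            # Called by someone else in the group: probably assayed but failed.
            labels[(s, snp)] = "failed call?" if snp in group_union else "not on chip?"
    return labels

# Toy data: sample ID -> set of SNP IDs with a non-missing call (hypothetical).
called = {
    "A1": {"1:100", "1:200", "1:300", "1:400", "1:500"},
    "A2": {"1:100", "1:200", "1:300", "1:400"},
    "B1": {"1:100", "1:200"},
    "B2": {"1:100", "1:200"},
}
all_snps = set().union(*called.values())

groups = group_by_profile(called)
labels_a = classify_missing(called, groups[0], all_snps)
labels_b = classify_missing(called, groups[1], all_snps)
```

The obvious weakness is that a SNP which fails in every sample of a group gets labelled as not on the chip, and the grouping depends on sample order and on the threshold, so the labels are only a starting point for checking against the manufacturers' manifest files if those can be obtained.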
