Selecting variants from a VCF file
9.8 years ago

I have generated a VCF file as output from the GATK UnifiedGenotyper. However, my data set has quite a lot of missing data. Does anyone know a way of selecting only the SNPs that are called (have a non-missing genotype) in at least, say, 80% of my samples, and excluding any that fall below this threshold?
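A minimal Python sketch of the call-rate filter. It assumes an uncompressed VCF where a missing genotype is written as `./.`, `.|.` or `.` in the GT field. The function names `is_called` and `filter_by_call_rate` are hypothetical, and 0.8 is just the 80% example from the question. Dedicated tools may already cover this. For instance, vcftools has a `--max-missing` option for site-level missingness.

```python
from typing import Iterable, Iterator


def is_called(sample_field: str, gt_index: int) -> bool:
    """Return True if the sample's GT has no missing allele ('.')."""
    values = sample_field.split(":")
    if gt_index >= len(values):
        return False  # GT subfield absent for this sample: treat as missing
    alleles = values[gt_index].replace("|", "/").split("/")
    return all(a not in (".", "") for a in alleles)


def filter_by_call_rate(lines: Iterable[str], min_call_rate: float = 0.8) -> Iterator[str]:
    """Yield header lines unchanged, plus records called in >= min_call_rate of samples."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and #CHROM header lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            continue  # no FORMAT/sample columns, so nothing to count
        fmt = cols[8].split(":")
        if "GT" not in fmt:
            continue  # cannot judge missingness without a GT field
        gt_index = fmt.index("GT")
        samples = cols[9:]
        called = sum(is_called(s, gt_index) for s in samples)
        if called / len(samples) >= min_call_rate:
            yield line
```

For example: `with open("in.vcf") as src, open("out.vcf", "w") as dst: dst.writelines(filter_by_call_rate(src, 0.8))`. Header lines pass through untouched, so the output should still be a valid VCF.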

Also, I have multiple SNPs per contig. To reduce the effects of linkage, I would like a set with only one SNP per contig. Does anyone know a way of selecting one SNP per contig, either at random or, for example, by picking the SNP with the best coverage/quality score, so that each SNP in the final dataset comes from a separate contig?
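A sketch of the one-SNP-per-contig step in Python. It assumes the input VCF already contains only the SNPs you want to consider (for example, after a call-rate filter) and that it fits in memory. `one_snp_per_contig` is a hypothetical helper: `mode="random"` picks one record per contig uniformly at random, and `mode="qual"` keeps the record with the highest QUAL (records with QUAL `.` rank lowest).

```python
import random
from typing import Dict, Iterable, List, Optional


def _qual(record: str) -> float:
    """QUAL (column 6) as a float; a missing '.' sorts below every real value."""
    q = record.split("\t")[5]
    return float("-inf") if q == "." else float(q)


def one_snp_per_contig(lines: Iterable[str], mode: str = "random",
                       seed: Optional[int] = None) -> List[str]:
    """Return header lines plus exactly one record per contig (CHROM)."""
    if mode not in ("random", "qual"):
        raise ValueError("mode must be 'random' or 'qual'")
    rng = random.Random(seed)  # fixed seed -> reproducible random picks
    header: List[str] = []
    by_contig: Dict[str, List[str]] = {}  # dicts keep insertion order (Python 3.7+)
    for line in lines:
        if line.startswith("#"):
            header.append(line)
        elif line.strip():
            contig = line.split("\t", 1)[0]
            by_contig.setdefault(contig, []).append(line)
    chosen: List[str] = []
    for records in by_contig.values():
        if mode == "random":
            chosen.append(rng.choice(records))
        else:
            chosen.append(max(records, key=_qual))  # first record wins ties
    return header + chosen
```

To rank by coverage instead, you could swap `_qual` for a key that parses `DP` from the INFO column (an assumption: UnifiedGenotyper normally writes a site-level `DP` there, but check your header). Passing a fixed `seed` makes the random selection reproducible.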

Thanks

Tags: variant, SNP, VCF, next-gen

Thanks Pierre


Sorry, but I'm curious: did you eventually find a way to randomly sample one SNP per ID (contig) from a VCF file?

Thanks

9.8 years ago
Pablo

"SnpSift filter" (from the SnpEff package) is quite powerful for this kind of filtering.
