Question: Selecting variants from a VCF file
matt.christmas85 (Australia) wrote, 6.7 years ago:

I have generated a VCF file with the GATK UnifiedGenotyper, but my data set has quite a lot of missing data. Does anyone know a way of selecting only the SNPs that are genotyped in at least 80% of my samples, excluding any that fall below this threshold?

Also, I have multiple SNPs per contig, and I would like a set with only one SNP per contig to reduce the effects of linkage. Does anyone know a way of randomly selecting one SNP per contig or, for example, selecting the SNP with the best coverage/quality score per contig, so that each SNP in the final dataset comes from a separate contig?

Thanks

snp variant next-gen vcf • 4.3k views
modified 3.7 years ago by panatheod • written 6.7 years ago by matt.christmas85

See the thread "Reliable Tools To Filter Vcf Format Files".

written 6.7 years ago by Pierre Lindenbaum

Thanks Pierre

written 6.7 years ago by matt.christmas85

Sorry, but I am curious: did you ever find a way to randomly sample one SNP per ID (contig) from a VCF file?

Thanks

written 3.7 years ago by panatheod
Pablo (Canada) wrote, 6.6 years ago:

"SnpSift filter" is quite powerful

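For anyone who would rather script the two steps from the question directly, here is a rough plain-Python sketch. It is not SnpSift and not a method confirmed by anyone in this thread, just one way it could be done. The function names `call_rate` and `select_variants` are made up for illustration. It assumes an uncompressed, tab-delimited VCF in which GT is the first FORMAT subfield, treats any genotype containing "." as missing, and uses the QUAL column as the quality score.

```python
import random


def call_rate(fields):
    """Fraction of samples with a fully called genotype on one VCF data line."""
    samples = fields[9:]  # sample columns follow the 9 fixed VCF columns
    # Assumption: GT is the first colon-separated subfield of each sample.
    # Any genotype containing "." (e.g. "./." or "0/.") is counted as missing.
    called = sum(1 for s in samples if "." not in s.split(":")[0])
    return called / len(samples) if samples else 0.0


def select_variants(lines, min_call_rate=0.8, pick="best", seed=None):
    """Drop sites below the call-rate threshold, then keep one site per contig.

    pick="best" keeps the site with the highest QUAL on each contig;
    pick="random" keeps one randomly chosen site per contig.
    Returns the header lines followed by the selected data lines.
    """
    rng = random.Random(seed)
    header = []
    by_contig = {}  # contig name -> list of passing sites (split into fields)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.split("\t")
        if call_rate(fields) < min_call_rate:
            continue
        by_contig.setdefault(fields[0], []).append(fields)

    kept = []
    for sites in by_contig.values():  # contigs in order of first appearance
        if pick == "random":
            chosen = rng.choice(sites)
        else:
            # A missing QUAL (".") is ranked as 0.
            chosen = max(sites, key=lambda f: float(f[5]) if f[5] != "." else 0.0)
        kept.append("\t".join(chosen))
    return header + kept
```

To use it, pass an open file handle (the file name is up to you), e.g. `select_variants(open("input.vcf"), 0.8, pick="random", seed=1)`, and write the returned lines to a new file. Everything is held in memory, so for very large call sets a dedicated tool is likely a better fit.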
modified 16 months ago by Ram • written 6.6 years ago by Pablo