Question: Is there a quick method to extract regularly spaced features/SNPs from a VCF file?
mmats010 wrote, 2.7 years ago:

I am attempting to build a linkage map using a list of variants from a sexual population that we have whole-genome sequenced. Marker density is the least of our problems, so I would like to reduce the ~400,000 SNPs I have high confidence in to ~2,000. However, I would like them to be regularly spaced.

In my head, this means: select a SNP, move 100 kb down the contig, select the next SNP after the 100 kb interval, move another 100 kb, select another SNP, and repeat. We aren't quite sure of the recombination rate, and some of my contigs are shorter than 100 kb (and I'd like at least one, maybe two, markers from them).
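The procedure described above is essentially greedy thinning along each contig. A minimal Python sketch, assuming the SNPs are already available as (contig, position) tuples sorted the way a coordinate-sorted VCF is; `thin_snps` is a hypothetical helper, the 100 kb default just mirrors the question, and treating a gap of exactly `min_dist` as "far enough" is an arbitrary choice:

```python
from itertools import groupby


def thin_snps(snps, min_dist=100_000):
    """Greedy per-contig thinning (illustrative helper, not a library API).

    snps: iterable of (contig, position) tuples, sorted by contig and then
    position. groupby only merges *adjacent* items, so a contig whose records
    are not contiguous would be treated as two separate contigs.
    The first SNP on every contig is always kept, so contigs shorter than
    min_dist still contribute one marker. After that, a SNP is kept only if
    it lies at least min_dist bp past the last kept SNP on the same contig.
    """
    kept = []
    for contig, group in groupby(snps, key=lambda s: s[0]):
        last = None  # position of the last kept SNP on this contig
        for _, pos in group:
            if last is None or pos - last >= min_dist:
                kept.append((contig, pos))
                last = pos
    return kept
```

For example, with SNPs at 100, 50,000, 100,100, 150,000 and 250,000 on `ctg1` and at 500 and 900 on a short `ctg2`, this keeps `ctg1` positions 100, 100,100 and 250,000 plus `ctg2` position 500. Keeping a second marker on short contigs (for example each contig's last SNP) would be a small extension.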

I know I could tell GATK to give me a random subset of my markers, but I'd like to do something more methodical. The window-based clustering criteria in VariantFiltration also don't seem very useful, since they use a sliding window rather than a binned one. Does anyone know of a function in a common toolset that can do this? I'm not proficient in Perl or Python, so I'm unsure how to write my own script.
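For comparison, a binned selection (fixed, non-overlapping windows rather than a sliding one) can be sketched as follows, again assuming (contig, position) tuples. `first_snp_per_bin` is a hypothetical helper, and counting bins from position 0 on each contig is an assumption:

```python
def first_snp_per_bin(snps, bin_size=100_000):
    """Keep the first SNP in each fixed bin on each contig (illustrative).

    Bins are [0, bin_size), [bin_size, 2*bin_size), ... on every contig, so
    the bin edges never move, unlike a sliding window. Input order does not
    matter; the result is sorted by contig, then position.
    """
    first = {}  # (contig, bin index) -> smallest position seen in that bin
    for contig, pos in snps:
        key = (contig, pos // bin_size)
        if key not in first or pos < first[key]:
            first[key] = pos
    return sorted((contig, pos) for (contig, _), pos in first.items())
```

The trade-off is that, because bin edges are fixed, two kept SNPs can sit almost side by side across a boundary (e.g. positions 99,999 and 100,000 with 100 kb bins), whereas greedy thinning guarantees a minimum gap between kept SNPs.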

I have a .intervals file and a multisample VCF file, as well as a .txt table of everything in the VCF file, and I intend to load the data into JoinMap to perform the actual mapping.
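With a plain-text VCF, the (contig, position) pairs can be read out and the file filtered down to a chosen set using only the standard library. A sketch assuming an uncompressed VCF (a bgzipped one could be opened with `gzip.open(path, "rt")` first); `read_positions` and `filter_vcf_lines` are hypothetical names:

```python
def read_positions(lines):
    """Return (CHROM, POS) for every data record, in file order.

    Header lines (starting with '#') and blank lines are skipped; POS is the
    second tab-separated column and is converted to int.
    """
    positions = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t", 2)[:2]
        positions.append((chrom, int(pos)))
    return positions


def filter_vcf_lines(lines, keep):
    """Yield all header lines plus the records whose (CHROM, POS) is in keep.

    keep: a set of (contig, position) tuples with integer positions.
    Records are copied unchanged, so every sample column is preserved.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        if not line.strip():
            continue  # drop blank lines
        chrom, pos = line.split("\t", 2)[:2]
        if (chrom, int(pos)) in keep:
            yield line
```

In practice `lines` would be an open file handle, and the filtered output could be written with `writelines` to a new VCF before converting it for JoinMap.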



Tags: sequencing, snp, mapping, gatk, vcf
christopher medway (Cardiff, UK) wrote, 2.7 years ago:

I'm not sure I fully understand why you want to arbitrarily select SNPs at a given distance. But I think you may be looking for the "--thin" flag in VCFtools. Also take a look at this. I have never used it, but I hope it helps.


Thanks, this did the trick. I mostly want to trim the data down because JoinMap 4.1 can't handle a dataset as large as mine (JoinMap 5.0 will, however). So I want to give it a manageable dataset that it can then trim down further internally. I also want my markers to be at least somewhat evenly spaced.

Reply written 2.7 years ago by mmats010.
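To land near the ~2,000 markers mentioned in the question instead of fixing 100 kb up front, the spacing itself can be searched for. A sketch, assuming greedy per-contig thinning as the selection rule (VCFtools `--thin` may differ in details such as how a gap of exactly the given distance is treated); `count_thinned` and `spacing_for_target` are hypothetical helpers:

```python
from itertools import groupby


def count_thinned(snps, min_dist):
    """Count SNPs kept by greedy per-contig thinning (illustrative).

    The first SNP on each contig is kept, then any SNP at least min_dist bp
    past the last kept one. snps must be sorted by contig, then position.
    """
    n = 0
    for _, group in groupby(snps, key=lambda s: s[0]):
        last = None
        for _, pos in group:
            if last is None or pos - last >= min_dist:
                n += 1
                last = pos
    return n


def spacing_for_target(snps, target, lo=1, hi=10_000_000):
    """Smallest min_dist in [lo, hi] whose thinned count is <= target.

    Binary search is valid because the kept count never increases as
    min_dist grows: greedy thinning keeps the largest possible set of SNPs
    spaced >= min_dist apart, and any set valid for a larger spacing is also
    valid for a smaller one. Returns hi if even hi keeps more than target.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if count_thinned(snps, mid) <= target:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For instance, `spacing_for_target(snps, 2000)` would give the smallest spacing that keeps at most 2,000 SNPs. If the assembly has more contigs than the target, no spacing can reach it, since every contig keeps at least one SNP.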

