Vcftools: Filtering By Multiple Regions (--positions Flag?)
8.5 years ago
Matt W ▴ 250

Is there a way to filter multiple positions from my VCF files? I am trying to use vcftools, which gives me two options:

1. --chr $(chrom) --from-bp $(start) --to-bp $(stop)

The problem with this approach is I need multiple regions. So do I just reuse these flags multiple times? Specifically, there are 2192 regions I would like to extract.

2. --positions pos.txt

According to the docs, the input file requires a "chromosome and position", but I need multiple regions. This would work if I could specify regions.

Am I misinterpreting how to use these flags? Or is there an easier way to extract multiple regions from VCF files?

Thanks!

vcftools snps filtering • 12k views
8.5 years ago


I don't actually have a second input file. I only have a list of regions that I would like to extract. Does bedtools support an input that isn't BED/GFF/VCF?


" I only have a list of regions": means you have a BED (chrom/chromStart/chromEnd) https://genome.ucsc.edu/FAQ/FAQformat.html#format1


Ah, silly question. Thanks for the reply. I should have read the docs before making an assumption about the format. Thanks!
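
If the region list is not already in BED form, the conversion is short. Here is a minimal sketch, assuming a hypothetical `regions.txt` with one `chrom:start-end` region per line in 1-based inclusive coordinates (the thread does not say what the list actually looks like). BED uses a 0-based start and an exclusive end, so only the start is shifted by one.

```shell
# Hypothetical input: one region per line as chrom:start-end (1-based, inclusive).
printf 'chr1:1000-2000\nchr2:500-800\n' > regions.txt

# BED is tab-separated with a 0-based start and an exclusive end,
# so the start is decremented and the end is kept as-is.
# Splitting on ':' and '-' assumes chromosome names contain neither character.
awk -F'[-:]' 'BEGIN { OFS = "\t" } { print $1, $2 - 1, $3 }' regions.txt > regions.bed

cat regions.bed
```

The resulting `regions.bed` can then be passed to the `-b` option of `bedtools intersect`.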


This works for me :) with bedtools v2.25.0:

bedtools intersect -a myfile.vcf.gz -b myref.bed -header > output.vcf


Thank you very much, it works for me.

8.5 years ago
Erik Garrison ★ 2.3k

vcfintersect in vcflib will do this.

vcfintersect -b regions.bed variants.vcf


You can also use another VCF file, but you'll need a reference (it checks the haplotypes to be sure that alleles are the same even if they are aligned differently).

vcfintersect -f ref.fa -i known.vcf new.vcf > results.vcf


Note that intersecting variants will remove alleles which don't overlap even if they are at the same position as variants which do. The records are all adjusted to reflect the fact that an allele has been removed to maintain semantic consistency in the file. Specifically, all Number=A and Number=G fields in INFO and in the sample fields are adjusted.

8.5 years ago

You could use the --bed option in vcftools (or use bedtools as Pierre suggests).
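
A rough sketch of the vcftools route, with hypothetical file names (`myfile.vcf.gz`, `regions_with_header.bed`, and the `filtered` output prefix are placeholders). As far as I recall from the vcftools manual, the file given to `--bed` is expected to start with a header line, so one is written here. The vcftools call is guarded so that the snippet only runs it when the tool and the input file are present.

```shell
# Write a BED file with a header line, which vcftools expects for --bed.
printf 'chrom\tchromStart\tchromEnd\n' > regions_with_header.bed
printf 'chr1\t999\t2000\nchr2\t499\t800\n' >> regions_with_header.bed

# Keep only sites inside the BED intervals. --recode writes a new VCF
# (filtered.recode.vcf) and --recode-INFO-all carries the INFO fields over.
if command -v vcftools >/dev/null 2>&1 && [ -f myfile.vcf.gz ]; then
    vcftools --gzvcf myfile.vcf.gz --bed regions_with_header.bed \
        --recode --recode-INFO-all --out filtered
fi
```

For an uncompressed input, `--vcf myfile.vcf` replaces `--gzvcf myfile.vcf.gz`.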