vcftools intersect vs bedtools intersect
1
0
Entering edit mode
4 weeks ago
gpen • 0

So I am dealing with a merged vcf file containing samples sequenced by both a panel as well as exome sequencing. I have tried both vcftools and bedtools for obtaining a subset of SNPs based on the regions described in a bed file. Both worked as far as I can tell but vcftools only included half the number of SNPs that bedtools did. The number of SNPs in the bedtools subset was much closer to what I was expecting. Here are the methods I ran for both bedtools and vcftools:

vcftools --gzvcf merged.vcf.gz --bed panel.bed --out subset --recode --keep-INFO-all

bedtools intersect -a merged.vcf.gz -b panel.bed |  bgzip > subset.vcf.gz


What I am hoping someone can tell me is the difference between the intersecting strategies performed by bedtools and vcftools and if there is an alternative for getting vcf subsets from a bed file that would be better than these options. Thank you!

bedtools vcf vcftools • 345 views
1
Entering edit mode
4 weeks ago
bedops --element-of 1 <(gunzip -c merged.vcf.gz | vcf2bed -) <(sort-bed panel.bed) > subset.bed

0
Entering edit mode

Hello Alex,

Bedops looks like a great tool thank you for the suggestion. Why do you opt for the --element-of arg you shared over --intersect? Does this have something to do with converting to bed and sorting first?

0
Entering edit mode

--intersect means something different in bedops than it does in other tools. With --element-of, this is a set membership test: does this interval in set A overlap another interval in set B by so-and-so bases. The --intersect operation is different in that it calculates a new set of genomic intervals where there are overlaps, i.e. where intervals actually intersect.