vcftools intersect vs bedtools intersect
3.0 years ago
gpen • 0

I am working with a merged VCF file containing samples sequenced with both a targeted panel and exome sequencing. I tried both vcftools and bedtools to extract the subset of SNPs that fall within the regions described in a BED file. Both ran without errors as far as I can tell, but the vcftools subset contained only half as many SNPs as the bedtools subset. The number of SNPs in the bedtools subset was much closer to what I was expecting. Here are the commands I ran for vcftools and bedtools:

vcftools --gzvcf merged.vcf.gz --bed panel.bed --out subset --recode --keep-INFO-all

bedtools intersect -a merged.vcf.gz -b panel.bed | bgzip > subset.vcf.gz

What I am hoping someone can tell me is how the intersection strategies used by bedtools and vcftools differ, and whether there is a better alternative to these two options for subsetting a VCF with a BED file. Thank you!

bedtools vcf vcftools • 2.3k views
3.0 years ago
bedops --element-of 1 <(gunzip -c merged.vcf.gz | vcf2bed -) <(sort-bed panel.bed) > subset.bed

Hello Alex,

BEDOPS looks like a great tool, thank you for the suggestion. Why do you use the --element-of option rather than --intersect? Does this have something to do with converting to BED and sorting first?

--intersect means something different in bedops than it does in other tools. --element-of is a set membership test: does this interval in set A overlap an interval in set B by at least the specified amount (here 1, meaning one base)? Elements of A that pass are reported unchanged. The --intersect operation is different in that it calculates a new set of genomic intervals covering only the regions where the inputs overlap, i.e. where intervals actually intersect.
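
To make the distinction concrete, here is a small Python sketch of the two ideas. It is only an illustration of the semantics described above, not how bedops is implemented. The function names and the toy coordinates are made up for this example, intervals are assumed to be 0-based half-open as in BED, and each input set is assumed to contain no overlapping intervals of its own.

```python
from typing import List, Tuple

# (chrom, start, end), 0-based half-open, as in BED. Illustrative type only.
Interval = Tuple[str, int, int]


def overlap_len(a: Interval, b: Interval) -> int:
    """Number of bases shared by a and b (0 if none or different chromosomes)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def element_of(ref: List[Interval], other: List[Interval], min_bases: int = 1) -> List[Interval]:
    """Membership test: keep each ref element, unchanged, if it overlaps
    some element of `other` by at least `min_bases` bases."""
    return [a for a in ref if any(overlap_len(a, b) >= min_bases for b in other)]


def intersect(ref: List[Interval], other: List[Interval]) -> List[Interval]:
    """Compute new intervals covering only the bases shared by both sets."""
    shared = set()
    for a in ref:
        for b in other:
            if overlap_len(a, b) > 0:
                shared.add((a[0], max(a[1], b[1]), min(a[2], b[2])))
    return sorted(shared)


# Toy data: a SNP, a deletion straddling a panel boundary, and an off-target SNP.
variants = [("chr1", 100, 101), ("chr1", 198, 205), ("chr1", 400, 401)]
panel = [("chr1", 50, 150), ("chr1", 200, 300)]

print(element_of(variants, panel))  # whole variant records that touch the panel
print(intersect(variants, panel))   # only the overlapping coordinates
```

With the membership test, the deletion that straddles the panel boundary comes back as the original record, which is what you want when the records are variants. With the intersection, it is clipped to the part inside the panel region, so the result is a new interval and no longer the variant itself.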

The following links: https://bedops.readthedocs.io/en/latest/content/reference/set-operations/bedops.html#element-of-e-element-of and https://bedops.readthedocs.io/en/latest/content/reference/set-operations/bedops.html#intersect-i-intersect show a graphical depiction of the differences.
