I am trying to do an overlap analysis between 200 danish exomes (VCF courtsey: Zev) and 10 different gene regions.
I would like to know what percentage overlaps between my region of interest (in mygenes.bed total of 36
lines representing the region) and a VCF file (Danish_*.flt.vcf.gz).
I have tried this command and got result: intersectBed -a Danish1.flt.vcf.gz -b mygenes.bed > D1result.txt
Danish1.flt.vcf.gz: here mygenes.bed: here D1overlapped.txt: here
My assumption is that the output should have lines <= the total number of lines in the mygenes.bed file. But in many instances I am getting more than 36 lines as output. May be am missing something important or may be another tool / option in bedtools can do this task more efficiently. Please let me know your thoughts.
Thanks Sukhdeep for your point on header lines, that's helpful. When I used "intersectBed -a stdin -b mygenes.bed -wb" am getting 38 and with 'intersectBed -a stdin -b mygenes.bed -wb' am getting 12. My idea is to quantify the unique overlap between the mygenes.bed and *.vcfs so am now using -u option. My average overlap is around 34.286, I think this is very low for genes of interest, all of them are very well known genes.