I've been comparing 9 BED files with the bedtools multiinter command in order to find the genomic regions common to all of them, those specific to each one, and those shared by every possible combination of 2, 3, 4, ... of them. Unfortunately, some overlaps are missing from the output file I got. As a result, I overestimate the proportion of genomic regions specific to a single BED file and, consequently, underestimate the proportion shared by some or all of them.
I ran this command:
bedtools multiinter -i IND1.bed IND2.bed IND3.bed IND4.bed IND5.bed IND6.bed IND7.bed IND8.bed IND9.bed -g XX.genome -header > Sharedregions.out
Here is an excerpt of the output I got:
chrom start end num list
...
Contigs_21043 6 9 3 1,4,5 1 0 0 1 1 0 0 0 0
Contigs_21043 9 10 4 1,4,5,8 1 0 0 1 1 0 0 1 0
Contigs_21043 10 24 5 1,3,4,5,8 1 0 1 1 1 0 0 1 0
Contigs_21043 24 43 6 1,3,4,5,6,8 1 0 1 1 1 1 0 1 0
Contigs_21043 43 110 7 1,3,4,5,6,7,8 1 0 1 1 1 1 1 1 0
Contigs_21043 110 134 6 1,3,4,5,6,7 1 0 1 1 1 1 1 0 0
Contigs_21043 134 188 5 1,4,5,6,7 1 0 0 1 1 1 1 0 0
Contigs_21043 188 193 4 1,4,5,7 1 0 0 1 1 0 1 0 0
Contigs_21043 193 203 3 1,5,7 1 0 0 0 1 0 1 0 0
Contigs_21043 0 205 1 2 0 1 0 0 0 0 0 0 0
Contigs_21043 1 3 1 4 0 0 0 1 0 0 0 0 0
Contigs_21043 3 6 2 4,5 0 0 0 1 1 0 0 0 0
Contigs_21043 203 205 2 5,7 0 0 0 0 1 0 1 0 0
Contigs_21043 24 198 1 9 0 0 0 0 0 0 0 0 1
...
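As you can see, for Contigs_21043 the overlaps between the BED files of the second and ninth individuals and the other BED files are missing: IND2 covers 0-205 and IND9 covers 24-198, yet both intervals are reported on their own lines instead of being intersected with the other files. Consequently, I cannot "see" the regions these individuals share with the rest of the studied individuals.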
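The coordinates in the excerpt restart within the same contig (the line starting at 0 follows the line ending at 203), which hints that the input files were not sorted consistently. A minimal sketch to scan the whole output file for such restarts, assuming the multiinter layout shown above (chrom, start, end, ... with an optional `-header` line); `check_multiinter_order` is a hypothetical helper, and the rule it applies (starts should not decrease along a contig when inputs are sorted) is an assumption rather than a documented bedtools guarantee:

```shell
#!/bin/sh
# Flag lines in bedtools multiinter output (chrom, start, end, ...) where the
# start coordinate goes backwards within the same contig. With consistently
# sorted inputs, starts are assumed to only increase along each contig.
check_multiinter_order() {
    awk 'BEGIN { OFS = "\t" }
         $1 == "chrom" { next }                      # skip the -header line
         $1 == prev_chrom && $2 + 0 < prev_start {
             print "restart at line " NR ": " $1, $2, $3
             bad = 1
         }
         { prev_chrom = $1; prev_start = $2 + 0 }
         END { exit bad + 0 }' "$1"
}

# Usage (file name from the command above): check_multiinter_order Sharedregions.out
```

On the excerpt above, this would flag the `0 205` line for IND2 and the `24 198` line for IND9.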
Do you have any idea why I get such results? And if so, what should I do to get the information I want (with bedtools multiinter or another tool)?
Thanks for the suggestions.
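bedtools multiinter expects each input file to be sorted by chromosome and start, and merging collapses overlapping records within a file into single intervals. With bedtools itself this is, for example, `sort -k1,1 -k2,2n IND1.bed > IND1.sorted.bed` followed by `bedtools merge -i IND1.sorted.bed > IND1.merged.bed`. The sketch below does the equivalent with plain POSIX `sort` and `awk`, so it runs without bedtools; the helper `sort_and_merge_bed` and the `.merged.bed` output names are hypothetical, and `LC_ALL=C` is there so every file gets the same contig order regardless of the locale:

```shell
#!/bin/sh
# Sort a BED file by contig (C-locale byte order) and then by start, and
# merge overlapping or book-ended records into one interval per block.
# Output: 3 columns (chrom, start, end), similar to bedtools merge defaults.
sort_and_merge_bed() {
    LC_ALL=C sort -k1,1 -k2,2n "$1" |
    awk 'BEGIN { OFS = "\t" }
         NF == 0 || /^(track|browser|#)/ { next }    # skip blank/header lines
         {
             s = $2 + 0; e = $3 + 0
             if (n && $1 == cur_chrom && s <= cur_end) {
                 if (e > cur_end) cur_end = e        # extend current block
             } else {
                 if (n) print cur_chrom, cur_start, cur_end
                 cur_chrom = $1; cur_start = s; cur_end = e; n = 1
             }
         }
         END { if (n) print cur_chrom, cur_start, cur_end }'
}

# Clean each of the nine input files (hypothetical output names).
for f in IND[1-9].bed; do
    [ -e "$f" ] || continue                          # glob matched nothing
    sort_and_merge_bed "$f" > "${f%.bed}.merged.bed"
done
```

The multiinter command can then be rerun on the `IND1.merged.bed` ... `IND9.merged.bed` files. Sorting every file with the same command and locale is the point: a contig order that differs between files is one plausible way for overlaps to be missed.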
So I did what you proposed: I sorted and merged the BED files before running multiinter.
It is getting better, but some overlapping regions are still missed, especially for one BED file (the regions the other BED files share with each other seem to be detected).
Do you have any idea what could explain this?
Can you post a few lines of the BED file whose overlaps are not being detected by the program?
Actually, on closer inspection of this BED file, I suspect that something went wrong during the sorting step. I will sort it again, rerun multiinter, and let you know...
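Since the problem seems to come from how one file was sorted, it may help to check every file before passing it to multiinter. A sketch, assuming plain BED input: `check_bed_ready` (a hypothetical helper) reports contigs whose records are not grouped together, starts that go backwards within a contig, and records that overlap an earlier one on the same contig, none of which should remain after sorting and merging:

```shell
#!/bin/sh
# Check that a BED file is ready for bedtools multiinter: records for each
# contig are grouped together, starts never go backwards within a contig,
# and no record overlaps an earlier one (as expected after sort + merge).
# Prints one message per problem and returns non-zero if any is found.
check_bed_ready() {
    awk -v f="$1" '
        NF == 0 || /^(track|browser|#)/ { next }      # skip blank/header lines
        {
            s = $2 + 0; e = $3 + 0
            if ($1 != chrom) {                        # first record of a contig
                if ($1 in seen) {
                    print f ": line " NR ": contig " $1 " appears again"
                    bad = 1
                }
                seen[$1] = 1; chrom = $1; last_start = s; max_end = e
                next
            }
            if (s < last_start) {
                print f ": line " NR ": start " s " is before previous start " last_start
                bad = 1
            } else if (s < max_end) {
                print f ": line " NR ": overlaps an earlier record on " $1
                bad = 1
            }
            last_start = s
            if (e > max_end) max_end = e
        }
        END { exit bad + 0 }' "$1"
}

# Check every file given on the command line.
for f in "$@"; do
    check_bed_ready "$f" && echo "$f: OK"
done
```

Saved as, say, `check_bed.sh` (hypothetical name), it can be run as `sh check_bed.sh IND1.bed IND2.bed ... IND9.bed`. A file that passes these checks must still use the same contig order as the other files.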