Bedtools multiinter to compare multiple BED files?
9.3 years ago
Sana63 • 0

I have been comparing 9 BED files with the bedtools multiinter command in order to find the genomic regions common to all of them and those specific to each one (or shared by 2, 3, 4, ... of them, in every possible combination). Unfortunately, some overlaps are missing from the output file. As a result, the regions specific to a single BED file are overestimated and, consequently, the regions shared by some or all of the files are underestimated.

I ran this command:

bedtools multiinter -i IND1.bed IND2.bed IND3.bed IND4.bed IND5.bed IND6.bed IND7.bed IND8.bed IND9.bed -g XX.genome -header > Sharedregions.out

For instance, I got this kind of output:

chrom    start    end    num    list
...
Contigs_21043    6    9    3    1,4,5    1    0    0    1    1    0    0    0    0
Contigs_21043    9    10    4    1,4,5,8    1    0    0    1    1    0    0    1    0
Contigs_21043    10    24    5    1,3,4,5,8    1    0    1    1    1    0    0    1    0
Contigs_21043    24    43    6    1,3,4,5,6,8    1    0    1    1    1    1    0    1    0
Contigs_21043    43    110    7    1,3,4,5,6,7,8    1    0    1    1    1    1    1    1    0
Contigs_21043    110    134    6    1,3,4,5,6,7    1    0    1    1    1    1    1    0    0
Contigs_21043    134    188    5    1,4,5,6,7    1    0    0    1    1    1    1    0    0
Contigs_21043    188    193    4    1,4,5,7    1    0    0    1    1    0    1    0    0
Contigs_21043    193    203    3    1,5,7    1    0    0    0    1    0    1    0    0
Contigs_21043    0    205    1    2    0    1    0    0    0    0    0    0    0
Contigs_21043    1    3    1    4    0    0    0    1    0    0    0    0    0
Contigs_21043    3    6    2    4,5    0    0    0    1    1    0    0    0    0
Contigs_21043    203    205    2    5,7    0    0    0    0    1    0    1    0    0
Contigs_21043    24    198    1    9    0    0    0    0    0    0    0    0    1
...

As you can see, for Contigs_21043, the overlaps between the BED files of the second and ninth individuals and the other BED files are missing. As a consequence, I cannot "see" the regions these individuals share with the rest of the studied individuals.

Do you have any idea what causes such results? If so, what should I do to get the information I want (with bedtools multiinter or other tools)?

alignment genome • 5.3k views
9.3 years ago

First, make sure each BED file is sorted. Then merge the overlapping regions within each BED file using bedtools merge. Then run multiinter.
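The three steps above can be sketched roughly as follows. This is only an illustration, not a tested pipeline for the data in the question: the function name `sort_merge_multiinter` and the `.sorted.merged.bed` suffix are made up here, and the `-g XX.genome` option from the original command is left out because it only matters when empty regions are requested.

```shell
# Hypothetical helper: sort and merge each BED file, then run multiinter.
# usage: sort_merge_multiinter OUTPUT A.bed B.bed ...
sort_merge_multiinter() {
    local out=$1
    shift
    local prepared=() bed tmp
    for bed in "$@"; do
        # Made-up naming scheme for the intermediate files.
        tmp="${bed%.bed}.sorted.merged.bed"
        # Sort by chromosome, then numerically by start position;
        # bedtools merge then collapses overlapping intervals within the file.
        sort -k1,1 -k2,2n "$bed" | bedtools merge -i stdin > "$tmp" || return 1
        prepared+=("$tmp")
    done
    # multiinter reports, for each sub-interval, which input files cover it.
    bedtools multiinter -header -i "${prepared[@]}" > "$out"
}
```

With the file names from the question, it would be called as `sort_merge_multiinter Sharedregions.out IND1.bed IND2.bed IND3.bed IND4.bed IND5.bed IND6.bed IND7.bed IND8.bed IND9.bed`.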

multiinter outputs overlapping regions, but it creates separate bins for short overlapping regions. If you think the output is not correct, load the BED files into a genome browser and inspect the regions of interest; this will give you an exact idea of what multiinter is doing.

Thanks for the suggestions.

So, I did what you proposed: I sorted and merged the BED files before running multiinter.

It is getting better, but some overlapping regions are still missed, especially for one BED file (the regions shared among the other BED files seem to be detected by the program).

Do you have any idea what could explain this?

Can you post a few lines of the BED file that are not being detected by the program?

Actually, after taking a closer look at this BED file, I suspect that an error occurred during its sorting step. I will sort it again, rerun multiinter, and let you know...
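A suspected sorting problem like this can be checked before rerunning multiinter. The sketch below assumes a plain tab-separated BED file with no header or track lines. The function name `check_bed_sorted` is made up for illustration. It only verifies that each chromosome forms one contiguous block and that start positions never decrease within that block.

```shell
# Hypothetical helper: exit 0 if FILE looks sorted by chromosome and start,
# otherwise print the first offending line number and exit 1.
check_bed_sorted() {
    awk -F'\t' '
        $1 != chrom {
            # A chromosome seen earlier must not reappear after another one.
            if ($1 in seen) {
                printf "line %d: %s appears in more than one block\n", NR, $1
                bad = 1
                exit
            }
            seen[$1] = 1
            chrom = $1
            start = -1
        }
        $2 + 0 < start {
            printf "line %d: start %s is smaller than the previous start\n", NR, $2
            bad = 1
            exit
        }
        { start = $2 + 0 }
        END { exit bad }
    ' "$1"
}
```

For example, `check_bed_sorted IND2.bed || echo "IND2.bed needs sorting"` would flag a file in which an interval starting at 0 follows one starting at 24 on the same contig.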
