bedtools intersect mistakes
6.4 years ago
schelarina ▴ 30

Hello,

I am using the following command

bedtools intersect -wb -a file1.bed -b file2.gff3 > output.txt

The output contains more entries than file1.bed, and some of them are not even present in file1.bed!

I have tried sorting the files and also changing the extension of file2.gff3 to .bed, but the output is the same.

What is the problem?

Is there another tool I can use to do the same, or can it be done with awk?

Thank you

6.4 years ago
bedtools intersect -wb -a file1.bed -b file2.gff3 > output.txt

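This writes one line for every overlap between A and B: the part of the A interval that overlaps, followed by the full B entry. Because the A coordinates are clipped to the overlap, they can differ from the intervals in file1.bed.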
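To illustrate how one line of A can turn into several output lines with coordinates that never appear in file1.bed, here is a minimal awk sketch that approximates the -wb report on toy data. The file names, intervals, and IDs are made up for the example, and the sketch makes no attempt to reproduce bedtools' output order or its other options; it only assumes that BED starts are 0-based and GFF3 starts are 1-based.

```shell
#!/bin/sh
# Toy inputs (hypothetical data): one BED interval and two GFF3 features.
tmp=$(mktemp -d)
printf 'chr1\t100\t200\n' > "$tmp/a.bed"
printf 'chr1\t.\tgene\t151\t300\t.\t+\t.\tID=g1\n' > "$tmp/b.gff3"
printf 'chr1\t.\texon\t151\t180\t.\t+\t.\tID=e1\n' >> "$tmp/b.gff3"

# For every overlapping A/B pair, print the overlapping part of A
# followed by the full B line (an approximation of the -wb report).
out=$(awk -F'\t' -v OFS='\t' '
    # First file: remember every A interval (BED, 0-based start).
    NR == FNR { chrom[NR] = $1; s[NR] = $2 + 0; e[NR] = $3 + 0; n = NR; next }
    # Second file: skip GFF3 comment and directive lines.
    /^#/ { next }
    {
        bs = $4 - 1      # GFF3 start is 1-based, so shift to 0-based
        be = $5 + 0
        for (i = 1; i <= n; i++)
            if (chrom[i] == $1 && s[i] < be && bs < e[i]) {
                lo = (s[i] > bs) ? s[i] : bs
                hi = (e[i] < be) ? e[i] : be
                print $1, lo, hi, $0
            }
    }' "$tmp/a.bed" "$tmp/b.gff3")

printf '%s\n' "$out"
rm -rf "$tmp"
```

The single interval chr1 100 200 yields two lines, whose first three columns are chr1 150 200 and chr1 150 180. Adding -wa to the real bedtools command reports the original A entry instead of the clipped one.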

If you want each overlapping B entry reported without duplicate lines, use:

bedtools intersect -wb -a file1.bed -b file2.gff3 | sort | uniq > output.txt

If you are interested in A and want each A entry that overlaps B reported only once, use:

bedtools intersect -wa -a file1.bed -b file2.gff3 | sort | uniq > output.txt

If you want the number of overlapping base pairs between each A entry and each B entry, use:

bedtools intersect -wao -a file1.bed -b file2.gff3 | sort | uniq > output.txt
6.4 years ago

That doesn't sound like a mistake, but rather that you're getting the correct output. You'll get one line of output for every overlap, so a line in file1.bed that overlaps multiple entries in file2.gff3 is reported once for each of them. Since you're intersecting with a GFF file, where the same region is typically annotated by several features (gene, transcript, exon, CDS), it'd be surprising not to see this sort of behaviour, and any overlap tool should act like this.

Perhaps you just want to intersect with unique exons.
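
One way to do that, sketched here under assumptions: keep only the rows whose feature type (column 3) is "exon", and keep a single row per chromosome, start, end, and strand, so that exons shared between transcripts are not counted twice. The toy GFF3 content and the exons.gff3 file name are hypothetical, and your annotation may use a different feature name or need a different definition of "unique".

```shell
#!/bin/sh
# Toy annotation (hypothetical data): one gene and three exon rows,
# two of which describe the same coordinates for different transcripts.
tmp=$(mktemp -d)
printf 'chr1\t.\tgene\t151\t300\t.\t+\t.\tID=g1\n' > "$tmp/file2.gff3"
printf 'chr1\t.\texon\t151\t180\t.\t+\t.\tID=e1;Parent=t1\n' >> "$tmp/file2.gff3"
printf 'chr1\t.\texon\t151\t180\t.\t+\t.\tID=e2;Parent=t2\n' >> "$tmp/file2.gff3"
printf 'chr1\t.\texon\t250\t300\t.\t+\t.\tID=e3;Parent=t1\n' >> "$tmp/file2.gff3"

# Keep exon rows only, and only the first row seen for each
# (chromosome, start, end, strand) combination.
exons=$(awk -F'\t' '$3 == "exon" && !seen[$1, $4, $5, $7]++' "$tmp/file2.gff3")

# Save the filtered annotation; this file would replace file2.gff3 as -b.
printf '%s\n' "$exons" > "$tmp/exons.gff3"
printf '%s\n' "$exons"
```

The filtered file can then be passed to bedtools in place of the full annotation:

bedtools intersect -wa -wb -a file1.bed -b exons.gff3 > output.txt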