Bedtools give different output
1
0
Entering edit mode
14 days ago

Hello,

I have converted a gene annotation file using bedtools and used another bed file containing regions under selective sweeps. My goal is the find how many genes are inside the interval of the selective sweep regions.

I have a converted gene annotation file called gene_annot.bed and another file containing intervals in each scaffold called sweep_interval.bed. I first sorted both the annotation and the selective sweep interval files in bedtools and then used the following code to find how many genes are overlapping with the intervals.

bedtools intersect -a gene_annot.bed -b sweep_interval.bed -wa > results.txt


Using this code, I got 915 genes. But if I used the following code:

bedtools intersect -a gene_annot.bed -b sweep_interval.bed -u -wa > results_1.txt


Then I got 905 genes.

Can you please tell me if the code I am using especially -a and -b is right and if should I use the -u option or not?

Thank you.

bedtools annotation genome SNP bash • 342 views
1
Entering edit mode
14 days ago
barslmn ★ 1.3k

Using this code, I got 915 genes. But if I used the following code:

Then I got 905 genes.

How are you counting the genes?

My goal is the find how many genes are inside the interval of the selective sweep regions.

If your gene_annot.bed contains the unique gene intervals, and you just want to count the genes use the -u which returns unique overlapping regions on the -a file.

0
Entering edit mode

I am counting with the following command:

bedtools intersect -a gene_annot.bed -b sweep_interval.bed -wa | sort | uniq -c > results_new.txt


So, each line in the output text file is a different gene. Am I doing something wrong?

0
Entering edit mode
bedtools intersect -a gene_annot.bed -b sweep_interval.bed -wa | sort | uniq -c > results_new.txt


This command would result something similar to running bedtools with u. But these commands just write output the file. How are you getting the numbers?

Also, If you like to see what is different between these files diff command would help.

0
Entering edit mode

So, I just count the lines of the output file. Because I see every line is a unique gene ID. Isn't it the approach? or there is any other method?

1
Entering edit mode

There will be lots of uncertainties. No one can say you're doing it wrong or not without seeing the data. If you're unsure about it or have questions, create small positive and negative control data for yourself and see the output. Compare the differences and try to produce examples to reproduce these differences. Check if you're not seeing what you're not supposed to see, and you're seeing what you supposed to see.

0
Entering edit mode

Thank you. I will do that.