Question

Can or should we perform feature counting with `bedtools intersect -c`?

2

Entering edit mode

6.6 years ago

gundalav ▴ 380

I have the following BAM file and a annotation file. The BAM is a paired end alignment. What I want to do is to calculate reads (as fragment) count in I can do it with the following three methods.

Method1: bedtools intersect
Method2: bedtools multicov
Method3: featureCounts.

Below is the code

#!/bin/bash
INPUT_BAM=my_paired_end_alignment.bam
ANNNOT_BED=peak_annot.bed
ANNNOT_GTF=peak_annot.gtf

# Temporary file
ALIGNMENT_BED=my_paired_end_alignment.bed

#-----------
# Method 1
#-----------
FINAL_COUNT_METHOD1=output_method1.txt
# convert BAM to BED
bedtools bamtobed -bedpe -i $INPUT_BAM | cut -f1,2,6,7,8,9 > $ALIGNMENT_BED
# Count fragments from BAM on ANNOT_BED
bedtools intersect -a $ANNOT_BED -b $ALIGNMENT_BED -c > $FINAL_COUNT_METHOD1

#-----------
# Method 2
#-----------
FINAL_COUNT_METHOD2=output_method2.txt
bedtools multicov -bams $INPUT_BAM -bed $ANNOT_BED > $FINAL_COUNT_METHOD2

#-----------
# Method 3
#-----------
FINAL_COUNT_METHOD3=output_method3.txt
featureCounts  -a $ANNOT_GTF -o $FINAL_COUNT_METHOD3 $INPUT_BAM

The result given by Method1 is so different from Method2 (totally uncorrelated), whereas Method2 is highly correlated with Method3.

My question Is Method1 the right way to count feature? If not why? Should we use Method2 or Method3 instead?

Speed is less of a concern at the moment, but rather the correctness of Method1. I need to be convinced if Method1 is correct or not.

bedtools rna-seq featureCounts • 3.4k views

ADD COMMENT • link 6.6 years ago by gundalav ▴ 380

score 3 · Answer 1 · 2017-09-01

Method1: Not advisable, as it simply takes too much time to make a BED and consumes unnecessary space as the BED is not binary. Still, if you want to do that, pipe the bedpe output directly into:

cut -f1,2,6,7,8,9

to get a proper representation of your fragment that adequately covers the interval between the reads (the ISIZE basically). The cut will make a normal bed with the start of the fwd and the end of the rev read. Still as I said, not really efficient for your purpose.

Method 2: I think multicov still cannot handle paired-end reads and counts both mates independently.

Method 3: Method of choice, fast and efficient with many parameters to finetune for your purpose, but use the -p option for paired-end!