Question: Can or should we perform feature counting with `bedtools intersect -c`?
2
gravatar for gundalav
2.1 years ago by
gundalav290
La La Land
gundalav290 wrote:

I have the following BAM file and a annotation file. The BAM is a paired end alignment. What I want to do is to calculate reads (as fragment) count in I can do it with the following three methods.

Below is the code

#!/bin/bash
INPUT_BAM=my_paired_end_alignment.bam
ANNNOT_BED=peak_annot.bed
ANNNOT_GTF=peak_annot.gtf

# Temporary file
ALIGNMENT_BED=my_paired_end_alignment.bed

#-----------
# Method 1
#-----------
FINAL_COUNT_METHOD1=output_method1.txt
# convert BAM to BED
bedtools bamtobed -bedpe -i $INPUT_BAM | cut -f1,2,6,7,8,9 > $ALIGNMENT_BED
# Count fragments from BAM on ANNOT_BED
bedtools intersect -a $ANNOT_BED -b $ALIGNMENT_BED -c > $FINAL_COUNT_METHOD1

#-----------
# Method 2
#-----------
FINAL_COUNT_METHOD2=output_method2.txt
bedtools multicov -bams $INPUT_BAM -bed $ANNOT_BED > $FINAL_COUNT_METHOD2

#-----------
# Method 3
#-----------
FINAL_COUNT_METHOD3=output_method3.txt
featureCounts  -a $ANNOT_GTF -o $FINAL_COUNT_METHOD3 $INPUT_BAM

The result given by Method1 is so different from Method2 (totally uncorrelated), whereas Method2 is highly correlated with Method3.

My question Is Method1 the right way to count feature? If not why? Should we use Method2 or Method3 instead?

Speed is less of a concern at the moment, but rather the correctness of Method1. I need to be convinced if Method1 is correct or not.

rna-seq featurecounts bedtools • 1.3k views
ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by gundalav290
2
gravatar for ATpoint
2.1 years ago by
ATpoint24k
Germany
ATpoint24k wrote:

Method1: Not advisable, as it simply takes too much time to make a BED and consumes unnecessary space as the BED is not binary. Still, if you want to do that, pipe the bedpe output directly into:

cut -f1,2,6,7,8,9

to get a proper representation of your fragment that adequately covers the interval between the reads (the ISIZE basically). The cut will make a normal bed with the start of the fwd and the end of the rev read. Still as I said, not really efficient for your purpose.

Method 2: I think multicov still cannot handle paired-end reads and counts both mates independently.

Method 3: Method of choice, fast and efficient with many parameters to finetune for your purpose, but use the -p option for paired-end!

ADD COMMENTlink written 2.1 years ago by ATpoint24k

@ATPoint Thanks for your reply. Speed is less of a concern at the moment, but rather the correctness of Method1. I tried using cut -f1,2,6,7,8,9. But the Method1 is totally uncorrelated (different) from Method2 and Method3. I need to be convinced why Method1 is incorrect. Your enlightenment will be greatly appreciated.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by gundalav290
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1727 users visited in the last hour