I have the following BAM file and a annotation file. The BAM is a paired end alignment. What I want to do is to calculate reads (as fragment) count in I can do it with the following three methods.
Below is the code
#!/bin/bash INPUT_BAM=my_paired_end_alignment.bam ANNNOT_BED=peak_annot.bed ANNNOT_GTF=peak_annot.gtf # Temporary file ALIGNMENT_BED=my_paired_end_alignment.bed #----------- # Method 1 #----------- FINAL_COUNT_METHOD1=output_method1.txt # convert BAM to BED bedtools bamtobed -bedpe -i $INPUT_BAM | cut -f1,2,6,7,8,9 > $ALIGNMENT_BED # Count fragments from BAM on ANNOT_BED bedtools intersect -a $ANNOT_BED -b $ALIGNMENT_BED -c > $FINAL_COUNT_METHOD1 #----------- # Method 2 #----------- FINAL_COUNT_METHOD2=output_method2.txt bedtools multicov -bams $INPUT_BAM -bed $ANNOT_BED > $FINAL_COUNT_METHOD2 #----------- # Method 3 #----------- FINAL_COUNT_METHOD3=output_method3.txt featureCounts -a $ANNOT_GTF -o $FINAL_COUNT_METHOD3 $INPUT_BAM
The result given by Method1 is so different from Method2 (totally uncorrelated), whereas Method2 is highly correlated with Method3.
My question Is Method1 the right way to count feature? If not why? Should we use Method2 or Method3 instead?
Speed is less of a concern at the moment, but rather the correctness of Method1. I need to be convinced if Method1 is correct or not.