Question

featureCounts fails to reorder BAM and duplicates read counts

0

Entering edit mode

7.0 years ago

samuel.marsh • 0

Hi,

Sorry for long post but I'm new to all of this. I have been (slowly) figuring out how to run preprocessing for RNA-seq using pipeline of fastq > STAR > featureCounts before proceeding to analysis. I've recently noticed a seemingly random error keeps happening using featureCounts from Subread 1.60 running on my university's HPC cluster.

I have have paired end reads processed in STAR into single BAM file which I then provide to featureCounts. I have been keeping each lane of the flow cell separate until after featureCounts before combining them. The output read counts for each lane are usually (as expected) very close in value to each other. But I started to notice occasionally one lane would output a value ~2x the other lanes in terms of aligned reads. When I would rerun the samples I found that sometimes it gave output ~2x the other lanes and other times in gave output ~equal to the other lanes.

To try and find solution to the error I wrote script which literally contained identical code copied and pasted 10 times to simply perform featureCounts on the same single BAM file. The code I used is below

featureCounts -p -T 6 \ 
   -a /home/Genomes/gencode_m17_genome/gencode_vM17_grcm38_p6.gtf \
   -o /home/exp1_20180705/test_featureCount_errors/sample27_lane3/sample27_lane3_error_testing1.txt \
   LIB03_S27_L003_Aligned.sortedByCoord.out.bam

Two of the times the output from featureCounts counted 3,875,038 reads Eight of the times the output from featureCounts counted 7,261,716 (See example output text at bottom of this post).

The only difference appears to be that when the counts end up nearly duplicated that featureCounts has failed to reorder the BAM file. But this seems to happen randomly and to random BAM files when I run featureCounts on multiple BAM files at the same time. Even when I repeat the identical code for multiple BAM files new files randomly show up duplicated.

Does anyone know why this is happening and how I can prevent it? It gets really annoying to sort through the outputs to find which files this occurred to and rerun them until the issue clears up.

Thanks!!!

/

/================================= Running ==================================\\
||                                                                            ||
|| Load annotation file /home/Genomes/gencode_m17_genome/genc ... ||
||    Features : 801623                                                       ||
||    Meta-features : 53801                                                   ||
||    Chromosomes/contigs : 45                                                ||
||                                                                            ||
|| Process BAM file LIB03_S27_L003_Aligned.sortedByCoord. ... ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||    Total fragments : 17288816                                              ||
||    Successfully assigned fragments : 3875038 (22.4%)                       ||
||    Running time : 1.15 minutes                                             ||
||                                                                            ||
||                         Read assignment finished.                          ||
||                                                                            ||
|| Summary of counting results can be found in file "/home/exp1_  ||
|| 20180705/test_featureCount_errors/sample27_lane3/sample27_lane3_err  ||
|| or_testing1.txt.summary"                                                   ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

***

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file /home/Genomes/gencode_m17_genome/genc ... ||
||    Features : 801623                                                       ||
||    Meta-features : 53801                                                   ||
||    Chromosomes/contigs : 45                                                ||
||                                                                            ||
|| Process BAM file LIB03_S27_L003_Aligned.sortedByCoord. ... ||
||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||    Total fragments : 34577632                                              ||
||    Successfully assigned fragments : 7261716 (21.0%)                       ||
||    Running time : 1.12 minutes                                             ||
||                                                                            ||
||                         Read assignment finished.                          ||
||                                                                            ||
|| Summary of counting results can be found in file "/home/exp1_  ||
|| 20180705/test_featureCount_errors/sample27_lane3/sample27_lane3_err  ||
|| or_testing2.txt.summary"                                                   ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

RNA-Seq featureCounts Subread • 2.9k views

ADD COMMENT • link updated 7.0 years ago by Devon Ryan 105k • written 7.0 years ago by samuel.marsh • 0

score 0 · Answer 1 · 2018-07-06

When featureCounts fails to reorder things it's likely that it's treating your PE reads as SE. What would be interesting is to catch the return code ($?) from those runs and see if it returned a reasonable (not 0) value.

Of course this then leads to the question of WHY featureCounts can't reorder the file. When it does this it write a bunch of temp files and I expect that it runs out of space. See if you can tweak where it does the sorting such that each run is sorted in a different directory. Perhaps that'll solve things.

BTW, STAR can also produce counts directly (--quantMode I think).