Hi,
Sorry for long post but I'm new to all of this. I have been (slowly) figuring out how to run preprocessing for RNA-seq using pipeline of fastq > STAR > featureCounts before proceeding to analysis. I've recently noticed a seemingly random error keeps happening using featureCounts from Subread 1.60 running on my university's HPC cluster.
I have have paired end reads processed in STAR into single BAM file which I then provide to featureCounts. I have been keeping each lane of the flow cell separate until after featureCounts before combining them. The output read counts for each lane are usually (as expected) very close in value to each other. But I started to notice occasionally one lane would output a value ~2x the other lanes in terms of aligned reads. When I would rerun the samples I found that sometimes it gave output ~2x the other lanes and other times in gave output ~equal to the other lanes.
To try and find solution to the error I wrote script which literally contained identical code copied and pasted 10 times to simply perform featureCounts on the same single BAM file. The code I used is below
featureCounts -p -T 6 \
-a /home/Genomes/gencode_m17_genome/gencode_vM17_grcm38_p6.gtf \
-o /home/exp1_20180705/test_featureCount_errors/sample27_lane3/sample27_lane3_error_testing1.txt \
LIB03_S27_L003_Aligned.sortedByCoord.out.bam
Two of the times the output from featureCounts counted 3,875,038 reads Eight of the times the output from featureCounts counted 7,261,716 (See example output text at bottom of this post).
The only difference appears to be that when the counts end up nearly duplicated that featureCounts has failed to reorder the BAM file. But this seems to happen randomly and to random BAM files when I run featureCounts on multiple BAM files at the same time. Even when I repeat the identical code for multiple BAM files new files randomly show up duplicated.
Does anyone know why this is happening and how I can prevent it? It gets really annoying to sort through the outputs to find which files this occurred to and rerun them until the issue clears up.
Thanks!!!
/
/================================= Running ==================================\\
|| ||
|| Load annotation file /home/Genomes/gencode_m17_genome/genc ... ||
|| Features : 801623 ||
|| Meta-features : 53801 ||
|| Chromosomes/contigs : 45 ||
|| ||
|| Process BAM file LIB03_S27_L003_Aligned.sortedByCoord. ... ||
|| Paired-end reads are included. ||
|| Assign fragments (read pairs) to features... ||
|| ||
|| WARNING: reads from the same pair were found not adjacent to each ||
|| other in the input (due to read sorting by location or ||
|| reporting of multi-mapping read pairs). ||
|| ||
|| Read re-ordering is performed. ||
|| ||
|| Total fragments : 17288816 ||
|| Successfully assigned fragments : 3875038 (22.4%) ||
|| Running time : 1.15 minutes ||
|| ||
|| Read assignment finished. ||
|| ||
|| Summary of counting results can be found in file "/home/exp1_ ||
|| 20180705/test_featureCount_errors/sample27_lane3/sample27_lane3_err ||
|| or_testing1.txt.summary" ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//
***
//================================= Running ==================================\\
|| ||
|| Load annotation file /home/Genomes/gencode_m17_genome/genc ... ||
|| Features : 801623 ||
|| Meta-features : 53801 ||
|| Chromosomes/contigs : 45 ||
|| ||
|| Process BAM file LIB03_S27_L003_Aligned.sortedByCoord. ... ||
|| Paired-end reads are included. ||
|| Assign fragments (read pairs) to features... ||
|| Total fragments : 34577632 ||
|| Successfully assigned fragments : 7261716 (21.0%) ||
|| Running time : 1.12 minutes ||
|| ||
|| Read assignment finished. ||
|| ||
|| Summary of counting results can be found in file "/home/exp1_ ||
|| 20180705/test_featureCount_errors/sample27_lane3/sample27_lane3_err ||
|| or_testing2.txt.summary" ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//