Difference between read counts STAR vs featureCounts
1
1
Entering edit mode
4.6 years ago
sg197 ▴ 40

I aligned some paired-end RNA-seq data using STAR and then used featureCounts on the output .bam to assign them to genes. However I'm really confused as the output of featureCounts shows that I have more fragments (read-pairs) than the number of total input reads (I'm assuming also read-pairs/fragments) in the STAR output.

STAR output:

                      Number of input reads |   45661688
                  Average input read length |   199
                                UNIQUE READS:
               Uniquely mapped reads number |   41983291
                    Uniquely mapped reads % |   91.94%

featureCounts output:

||    Paired-end reads are included.                                          ||
||    Assign fragments (read pairs) to features...                            ||
||                                                                            ||
||    WARNING: reads from the same pair were found not adjacent to each       ||
||             other in the input (due to read sorting by location or         ||
||             reporting of multi-mapping read pairs).                        ||
||                                                                            ||
||    Read re-ordering is performed.                                          ||
||                                                                            ||
||    Total fragments : 49756690                                              ||
||    Successfully assigned fragments : 36826889 (74.0%)                      ||

45661688 vs 49756690... where have these extra ~4 million reads come from? Also, aren't multimapped reads excluded from the .bam STAR output anyway so really it is 41983291 vs 49756690? Anyone any ideas?? Thanks!

STAR featureCounts multimapped RNA-Seq • 7.3k views
ADD COMMENT
1
Entering edit mode

By default maximum number of multiple alignments allowed for a read is set to 10 in STAR. If a read exceeds this number there is no alignment output. If you did not change the value of --outFilterMultimapNmax then you should have reads that map 10 times or less in your data.

ADD REPLY
0
Entering edit mode

Ok if I understand right then featureCounts is counting multi mapped fragments each as unique fragments (with different genomic co-ords) even though they are the same original read-pair? Would you recommend leaving the STAR default as 10 for aligning to the genome? On a side note I've also been reading a bit about mmquant, would you recommend using that instead of feature counts for assigning features? I have one dataset (single-end) that had 55% assigned reads to features with featureCount as many were lost due to multi-mapping. I'm new to all this so for me 55% seems very low?...

ADD REPLY
2
Entering edit mode
4.6 years ago
h.mon 35k

You are mixing two things. STAR's "Log.Final.out" reports mapping statistics in reference to the number of input reads in the fastq file. So Number of input reads | 45661688 refers to the total number of reads from the fastq file. Down the log file, STAR outputs several statistics which account for every input read:

  • Uniquely mapped reads number | 41983291 (which you showed)
  • Number of reads mapped to multiple loci
  • Number of reads mapped to too many loci
  • Number of reads unmapped: too many mismatches
  • Number of reads unmapped: too short
  • Number of reads unmapped: other
  • Number of chimeric reads

Summing all these, you should recover the total number of input reads, that is, the number of reads from the fastq file.

featureCount log reports statistics in reference to the number of mapped reads, which can be different from the number of input reads from the fastq file for two reasons:

  • unmapped reads may be left out of the bam file
  • multi-mapping reads will increase the number of "fragments" reported by featureCounts

To get the total number of fragments, featureCounts counts multi-mapped reads. However, with default settings, featureCounts doesn't assign those counts to any feature - that is, multi-mappers results in zero counts to the features they map.

Regarding the questions from your comment:

Would you recommend leaving the STAR default as 10 for aligning to the genome?

If you are using featureCounts for read summarization, this setting doesn't make any difference, as featureCounts will assign zero counts for multi-mappers. So, yeas, leave it at the default.

On a side note I've also been reading a bit about mmquant, would you recommend using that instead of feature counts for assigning features?

No, I would stick to using featureCounts, or use a method which accounts for multi-mappers with a proper statistical model, such as RSEM, Salmon or kallisto.

ADD COMMENT
0
Entering edit mode

Note that Salmon and Kallisto, being pseudo mappers, require you to start over with fastqs and trasncriptomes as import, RSEM can be applied to the Aligned.toTranscriptome.out.bam file that STAR might have already outputted for you.

ADD REPLY

Login before adding your answer.

Traffic: 1482 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6