Comparing mapping statistics on S. pombe alignment (STAR vs. TopHat)
4.6 years ago
dho322 • 0

Hi all,

I am hoping to get some insight into what is happening here or any suggestions.

I am aligning total RNA-seq, single-end data to S. pombe with STAR and TopHat, but I am getting two very different mapping statistics:

STAR:

    Number of input reads |   20416529
    Average input read length |   47
    Uniquely mapped reads number |   2622002
    Uniquely mapped reads % |   12.84%
    Average mapped length |   48.83


TopHat:

    Reads:
        Input     :  20416529
        Mapped   :  19397908 (95.0% of input)


12.84% vs. 95.0% is a pretty big difference.

Any ideas?

RNA-Seq pombe star alignment tophat • 1.1k views

Can you please post the command lines you used?


Because this is total RNA-seq, most of your data is likely rRNA reads, which would likely be multi-mapping and hence not counted as uniquely mapped by STAR.

Please don't use TopHat for any current projects.


Hmm, rRNA contamination could be the cause. Looking back at the library prep, there doesn't seem to be an rRNA depletion step.


If the prep was for total RNA-seq then that is expected. If the prep was supposed to be for mRNA-seq with ribo-depletion then ..

4.6 years ago

Besides the low number of uniquely mapping reads, which is suspicious, you are confusing "Uniquely mapped" and "Mapped".

Note the "uniquely" in the STAR output, whereas TopHat gives you the total number of mapped reads. STAR does not report the sum of all mapped reads in Log.final.out; instead, look for

    Uniquely mapped reads number  |     96821769
    Uniquely mapped reads %  |     92.13%


and

    Number of reads mapped to multiple loci  |     6264908
    % of reads mapped to multiple loci  |     5.96%


The sum of these is the total aligned number/percentage.

In addition, if you filter on the maximum number of multimapping locations, you could also include the reads that exceed that limit:

    Number of reads mapped to too many loci  |     62660
    % of reads mapped to too many loci  |     0.06%
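
To make the arithmetic concrete, here is a minimal Python sketch that sums those fields from the text of a Log.final.out. It is only an illustration, not part of STAR: the helper names `parse_star_log` and `total_aligned` are made up for this post, and it assumes every statistic sits on its own line in the `name | value` form shown above. The example values are the ones quoted in this answer.

```python
# Hypothetical helpers for illustration; they are not part of STAR.
EXAMPLE_LOG = """
    Uniquely mapped reads number  |     96821769
    Uniquely mapped reads %  |     92.13%
    Number of reads mapped to multiple loci  |     6264908
    % of reads mapped to multiple loci  |     5.96%
    Number of reads mapped to too many loci  |     62660
    % of reads mapped to too many loci  |     0.06%
"""


def parse_star_log(text):
    """Return {field name: value string} for every 'name | value' line."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip blank lines and section headers
        name, value = line.split("|", 1)
        stats[name.strip()] = value.strip()
    return stats


def total_aligned(stats, include_too_many=False):
    """Sum unique and multi-mapped reads; optionally add 'too many loci'."""
    count_fields = [
        "Uniquely mapped reads number",
        "Number of reads mapped to multiple loci",
    ]
    pct_fields = [
        "Uniquely mapped reads %",
        "% of reads mapped to multiple loci",
    ]
    if include_too_many:
        count_fields.append("Number of reads mapped to too many loci")
        pct_fields.append("% of reads mapped to too many loci")
    count = sum(int(stats[f]) for f in count_fields)
    pct = sum(float(stats[f].rstrip("%")) for f in pct_fields)
    return count, round(pct, 2)


stats = parse_star_log(EXAMPLE_LOG)
print(total_aligned(stats))                         # unique + multiple loci
print(total_aligned(stats, include_too_many=True))  # also "too many loci"
```

For a real run you would pass the contents of your own Log.final.out instead of `EXAMPLE_LOG`, and compare the first result with the "Mapped" line from TopHat.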


Yes. I used

    tophat -g 1

when running TopHat, which from my understanding should only give me reads that map to one locus in the resulting BAM file.

I do think that the number I am seeing in STAR reflects rRNA contamination, which would map to multiple locations, but it is surprising that TopHat is giving something else.


This likely does not explain the difference, but do not confuse 'uniquely mapped' with 'report a single locus per read' (which is what you specify for TopHat), as mentioned by @Michael Dondrup. Even with TopHat's -g 1 option you will still get reads that map to several loci; only one location is reported for each of them, which is not the same as uniquely mapped!