Question: STAR mapping results
14 months ago by
mxlsherry199230 wrote:

Hi,

I used STAR to map my RNA-seq data to the genome. Here is the output file with the mapping statistics, but I am having a hard time understanding it.

                                 Started job on |   Aug 09 16:35:19
                             Started mapping on |   Aug 09 16:35:48
                                    Finished on |   Aug 09 17:28:18
       Mapping speed, Million of reads per hour |   21.41

                          Number of input reads |   18734806
                      Average input read length |   249
                                    UNIQUE READS:
                   Uniquely mapped reads number |   10373363
                        Uniquely mapped reads % |   55.37%
                          Average mapped length |   242.88
                       Number of splices: Total |   8480091
            Number of splices: Annotated (sjdb) |   7969501
                       Number of splices: GT/AG |   8357360
                       Number of splices: GC/AG |   77588
                       Number of splices: AT/AC |   6734
               Number of splices: Non-canonical |   38409
                      Mismatch rate per base, % |   0.28%
                         Deletion rate per base |   0.03%
                        Deletion average length |   2.76
                        Insertion rate per base |   0.02%
                       Insertion average length |   2.49
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |   1489776
             % of reads mapped to multiple loci |   7.95%
        Number of reads mapped to too many loci |   6179
             % of reads mapped to too many loci |   0.03%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |   0
       % of reads unmapped: too many mismatches |   0.00%
            Number of reads unmapped: too short |   6856355
                 % of reads unmapped: too short |   36.60%
                Number of reads unmapped: other |   9133
                     % of reads unmapped: other |   0.05%
                                  CHIMERIC READS:
                       Number of chimeric reads |   0
                            % of chimeric reads |   0.00%

I just want to know whether the mapping rate is simply the "Uniquely mapped reads %" (55.37%). When I used Hisat2, the overall mapping rate was the sum of several categories. With STAR, do I also need to add some of these numbers together?

rna-seq
modified 13 months ago by Biostar ♦♦ 20 • written 14 months ago by mxlsherry199230

Depends on what you're looking for.

If you are interested only in the uniquely mapped reads (which is fine for most use cases, e.g. expression analysis), then the number is what it is: 55.37%, which is rather on the low end judging from the information we have.

If, on the other hand, you want an idea of how many reads mapped in total, you will need to add the uniquely mapped reads to the multi-mapped ones to get the final number (here, 55.37% + 7.95% = 63.32%).

Different aligners will report different numbers of aligned reads, but they should all be in roughly the same range.
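
For reference, the sum described above can be computed directly from the log. This is a minimal Python sketch, not part of STAR: the function names `parse_star_log` and `mapping_rates` are made up for illustration, `SAMPLE` is copied from the log in the question, and the parser assumes the `name | value` layout shown there.

```python
# Excerpt of the Log.final.out posted in the question.
SAMPLE = """
                          Number of input reads |   18734806
                        Uniquely mapped reads % |   55.37%
             % of reads mapped to multiple loci |   7.95%
             % of reads mapped to too many loci |   0.03%
                 % of reads unmapped: too short |   36.60%
"""


def parse_star_log(text):
    """Map each 'name | value' field of a STAR Log.final.out to its value string."""
    fields = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" carry no value
        name, value = line.split("|", 1)
        fields[name.strip()] = value.strip()
    return fields


def percent(fields, name):
    """Convert a percentage field such as '55.37%' to a float."""
    return float(fields[name].rstrip("%"))


def mapping_rates(fields):
    """Return the unique rate and the two sums discussed in this thread."""
    unique = percent(fields, "Uniquely mapped reads %")
    multi = percent(fields, "% of reads mapped to multiple loci")
    too_many = percent(fields, "% of reads mapped to too many loci")
    return {
        "unique": unique,
        "unique_plus_multi": round(unique + multi, 2),
        "unique_multi_too_many": round(unique + multi + too_many, 2),
    }
```

To run it on a real file, pass `open("Log.final.out").read()` to `parse_star_log`. STAR reports "mapped to too many loci" as its own category, so the sketch returns the sum both with and without it; which one to quote depends on what you want the number to mean.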

written 14 months ago by lieven.sterck9.0k

Got it, thank you! I will use the sum of "Uniquely mapped reads %", "% of reads mapped to multiple loci", and "% of reads mapped to too many loci".

written 14 months ago by mxlsherry199230

Looking at this high proportion of unmapped reads: were you expecting this? You can get such high amounts of unmapped reads if, for example, you forgot to trim the reads or the paired-end read files are not sorted in the same order, assuming this is just normal sequencing data from, for example, human cells.

written 14 months ago by caggtaagtat1.3k

Hi, thanks for the reply. I already trimmed the reads. I thought that STAR would have a relatively low mapping rate compared to Hisat2, stringtie...?

written 14 months ago by mxlsherry199230

I have never worked with Hisat2, but I would guess the mapping rates should be about the same. Out of curiosity, did you check the rRNA content of the samples, for example with the tool SortMeRNA? When analysing relatively old data, I came across rRNA contents of around 20-40% of total reads even after mRNA enrichment. So that's theoretically possible here as well.

written 14 months ago by caggtaagtat1.3k

What sort of trimming did you perform? Don't use hard cutoffs; they will alter downstream analysis: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0956-2#Fig5

IMO, you should not do any trimming on RNA-seq data; STAR will handle low-quality bases. At most, you can do very soft trimming of the ends. I think this is your problem: you've trimmed too much, so STAR has ignored those reads. You could alter the STAR parameters to accept shorter reads, but I think trimming is the issue.
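
One way to check whether trimming shortened the reads substantially is to look at the read-length distribution of the trimmed FASTQ files. Below is a minimal Python sketch, assuming uncompressed 4-line FASTQ records (no wrapped sequences). The function names and the `short_cutoff` of 100 bases are illustrative choices, not STAR or Trimmomatic settings.

```python
from statistics import median


def read_lengths(fastq_lines):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of every record is the sequence
            yield len(line.strip())


def summarize_lengths(lengths, short_cutoff=100):
    """Summarize read lengths; short_cutoff is an arbitrary illustrative threshold."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no reads found")
    n_short = sum(1 for n in lengths if n < short_cutoff)
    return {
        "reads": len(lengths),
        "min": min(lengths),
        "median": median(lengths),
        "max": max(lengths),
        "fraction_short": n_short / len(lengths),
    }
```

For a real file this could be called as `summarize_lengths(read_lengths(open(path)))`, where `path` points to one of the trimmed FASTQ files. Comparing the summary before and after trimming shows how much sequence the trimming step removed.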

written 13 months ago by Mark800

Hi, here is the script I used, in case that is where the error is:

java -jar /tools/trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 1 -phred33 \
  /home/Chan9-1_R1_001.fastq /home/Chan9-1_R2_001.fastq \
  /home/clean_data/Chan9-1_R1_left_paired_trimmed.fq /home/Chan9-1_R1_left_unpaired_trimmed.fq \
  /home/Chan9-1_R2_right_paired_trimmed.fq /home/Chan9-1_R2_right_unpaired_trimmed.fq \
  ILLUMINACLIP:/tools/trimmomatic-0.36/adapters/TruSeq3-PE.fa:2:30:10 \
  LEADING:3 TRAILING:3 SLIDINGWINDOW:4:25 MINLEN:36

written 12 months ago by mxlsherry199230

This high number of "too short" reads could also result from paired-end read files that are not sorted in the same order.
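
A quick way to test for this is to compare the read names of the two files record by record. The following is a minimal Python sketch, assuming 4-line FASTQ records and read names that match between mates once an optional `/1` or `/2` suffix is removed. The function names `read_id` and `first_unsynced_pair` are made up for illustration.

```python
from itertools import zip_longest


def read_id(header):
    """Extract the read name from a FASTQ header line, without '@' or '/1', '/2'."""
    parts = header.split()
    name = parts[0] if parts else ""
    if name.startswith("@"):
        name = name[1:]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def headers(fastq_lines):
    """Yield the header line (first of every four lines) of each record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0 and line.strip():
            yield line


def first_unsynced_pair(r1_lines, r2_lines):
    """Return the 1-based record number of the first mismatching pair, or None."""
    pairs = zip_longest(headers(r1_lines), headers(r2_lines))
    for n, (h1, h2) in enumerate(pairs, start=1):
        # A missing header means one file has fewer records than the other.
        if h1 is None or h2 is None or read_id(h1) != read_id(h2):
            return n
    return None
```

With real data, the two arguments would be open file handles for the R1 and R2 files (for gzipped files, `gzip.open(path, "rt")`). A return value of `None` means every record pairs up by name; any number points to the first record where the files diverge.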

written 13 months ago by caggtaagtat1.3k