Why is my read mapping rate extremely low?
12 months ago
LeeLee ▴ 20

I am using a script to analyze a large amount of ribo-seq data collected from different studies. Since the data come from different studies, I use trim_galore to remove the adapters, bowtie2 to remove rRNA reads, and STAR for mapping. After mapping, I found that the mapping rate is normal for some datasets but extremely low for others, such as the following one:

Number of reads unmapped: too many mismatches  | 9953452
      % of reads unmapped: too many mismatches | 6.06%
Number of reads unmapped: too short            | 153984837
      % of reads unmapped: too short           | 93.77%
Number of reads unmapped: other                | 177470
      % of reads unmapped: other               | 0.11%
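A minimal sketch for screening many samples at once, assuming each STAR run left a Log.final.out file with the "key | value" layout shown above. The script name and the 50% cutoff for flagging a sample are hypothetical choices for illustration, not STAR defaults.

```python
import sys
from typing import Dict, Iterable


def parse_star_final_log(lines: Iterable[str]) -> Dict[str, str]:
    """Parse the 'key | value' lines of a STAR Log.final.out into a dict."""
    stats: Dict[str, str] = {}
    for line in lines:
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" and blank lines
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(stats: Dict[str, str], key: str) -> float:
    """Convert a percentage field such as '93.77%' to a float."""
    return float(stats[key].rstrip("%"))


# Hypothetical cutoff for flagging a sample; tune it for your own data.
TOO_SHORT_CUTOFF = 50.0

if __name__ == "__main__":
    # Usage (hypothetical paths): python star_qc.py run1/Log.final.out run2/Log.final.out
    for path in sys.argv[1:]:
        with open(path) as fh:
            stats = parse_star_final_log(fh)
        unique = percent(stats, "Uniquely mapped reads %")
        too_short = percent(stats, "% of reads unmapped: too short")
        flag = "\tCHECK" if too_short > TOO_SHORT_CUTOFF else ""
        print(f"{path}\tunique={unique:.2f}%\ttoo_short={too_short:.2f}%{flag}")
```

Collecting these numbers for every study makes it easier to see whether the low-mapping samples share a library preparation.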

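At first I tried relaxing STAR's filters by adding the parameters

--outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMatchNmin 0

but it still didn't solve the problem. Now almost all reads are reported as mapped to too many loci (the "multi+" column):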

          Time    Speed        Read     Read   Mapped   Mapped   Mapped   Mapped Unmapped Unmapped Unmapped Unmapped
                   M/hr      number   length   unique   length   MMrate    multi   multi+       MM    short    other
Oct 09 13:54:14   224.7     3931595       54     3.3%     34.2     1.1%     0.0%    96.7%     0.0%     0.0%     0.0%
Oct 09 13:54:14   474.6    16216411       54     3.3%     34.2     1.0%     0.0%    96.7%     0.0%     0.0%     0.0%
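For intuition about the "too short" category, here is a sketch of the length check STAR applies by default: an alignment is kept only if its matched bases are at least --outFilterMatchNminOverLread (0.66 by default) times the read length, and a similar 0.66 cutoff applies to the alignment score via --outFilterScoreMinOverLread. The 30-nt footprint below is an assumed typical ribosome footprint length, not a value measured from this dataset.

```python
def passes_match_filter(matched_bases: int, read_length: int,
                        min_frac: float = 0.66) -> bool:
    """Mimic STAR's --outFilterMatchNminOverLread check for a single-end read."""
    return matched_bases >= min_frac * read_length


read_length = 54  # read length reported in the Log.progress.out above
footprint = 30    # assumed footprint length; the rest of the read would be adapter

print(f"matched-base threshold: {0.66 * read_length:.2f}")
print(f"30-nt footprint passes: {passes_match_filter(footprint, read_length)}")
```

Under that assumption, a 30-nt footprint inside a 54-nt read cannot reach the default threshold of 35.64 matched bases. Setting the thresholds to 0, as tried above, lets such partial alignments through, but short alignments are often placed at many loci, which may explain why most reads then land in the multi+ column.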

Running hisat2 with default parameters gives a similarly low alignment rate:

Time loading forward index: 00:00:02
Time loading reference: 00:00:00
Multiseed full-index search: 00:00:04
1000000 reads; of these:
  1000000 (100.00%) were unpaired; of these:
    999572 (99.96%) aligned 0 times
    169 (0.02%) aligned exactly 1 time
    259 (0.03%) aligned >1 times
0.04% overall alignment rate
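The HISAT2 summary above uses the same layout as Bowtie2's, so it can be parsed with a few regular expressions when comparing many runs. A minimal sketch, assuming unpaired reads and that the summary was saved from HISAT2's stderr (e.g. `2> sample.hisat2.log`, a hypothetical file name).

```python
import re
import sys
from typing import Dict

# Patterns for the unpaired-read summary lines printed by HISAT2/Bowtie2.
COUNT_PATTERNS = {
    "aligned_0": r"(\d+) \([\d.]+%\) aligned 0 times",
    "aligned_1": r"(\d+) \([\d.]+%\) aligned exactly 1 time",
    "aligned_multi": r"(\d+) \([\d.]+%\) aligned >1 times",
}


def parse_summary(text: str) -> Dict[str, float]:
    """Extract read counts and the overall alignment rate from a summary."""
    result: Dict[str, float] = {}
    for name, pattern in COUNT_PATTERNS.items():
        match = re.search(pattern, text)
        result[name] = int(match.group(1)) if match else 0
    rate = re.search(r"([\d.]+)% overall alignment rate", text)
    result["overall_rate"] = float(rate.group(1)) if rate else float("nan")
    return result


if __name__ == "__main__":
    # Usage (hypothetical paths): python hisat2_qc.py sample1.hisat2.log sample2.hisat2.log
    for path in sys.argv[1:]:
        with open(path) as fh:
            print(path, parse_summary(fh.read()))
```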

I double-checked the data and confirmed that I used the reference for the correct species. Why does this happen, and how can I solve it?

Tags: STAR, Bioinformatics, hisat2

It looks like most reads are "too short". Have you checked the read length distribution with FastQC or a similar tool?

% of reads unmapped: too short | 93.77%
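If FastQC is not at hand, the read length distribution can also be computed directly. A minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); plain and gzip-compressed files are both accepted.

```python
import gzip
import sys
from collections import Counter
from typing import Dict, Iterable


def length_distribution(lines: Iterable[str]) -> Dict[int, int]:
    """Count reads per length; the sequence is line 2 of each 4-line record."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:
            counts[len(line.strip())] += 1
    return dict(sorted(counts.items()))


def open_text(path: str):
    """Open a plain or .gz text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__":
    # Usage (hypothetical path): python read_lengths.py sample.fastq.gz
    for path in sys.argv[1:]:
        with open_text(path) as fh:
            dist = length_distribution(fh)
        total = sum(dist.values())
        for length, n in dist.items():
            print(f"{path}\t{length}\t{n}\t{100 * n / total:.2f}%")
```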

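STAR's "too short" doesn't usually mean the reads are literally too short. It means the best alignment covered too small a fraction of the read (below --outFilterMatchNminOverLread / --outFilterScoreMinOverLread, 0.66 of the read length by default), which in practice means the reads didn't map. I'd pull out the most common unmapped reads and see what they are.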
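Following that suggestion, a minimal sketch that counts the most frequent sequences among unmapped reads, for example in the Unmapped.out.mate1 file that STAR writes when run with --outReadsUnmapped Fastx. It assumes four-line FASTQ records; n=20 is an arbitrary choice.

```python
import gzip
import sys
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4) from FASTQ text lines."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def top_sequences(lines: Iterable[str], n: int = 20) -> List[Tuple[str, int]]:
    """Return the n most frequent sequences with their counts."""
    return Counter(fastq_sequences(lines)).most_common(n)


def open_text(path: str):
    """Open a plain or .gz text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__":
    # Usage (hypothetical path): python top_unmapped.py Unmapped.out.mate1
    for path in sys.argv[1:]:
        with open_text(path) as fh:
            for seq, count in top_sequences(fh, n=20):
                print(f"{count}\t{seq}")
```

Highly repeated sequences that share a common tail are a hint that something other than genomic sequence, such as a linker, is still attached to the reads.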


Yes, I checked one of the files, SRR9971635 (the fastq.gz file after rRNA removal), with FastQC and found that almost all reads are 40-60 bp long, which is too long for ribo-seq footprints. This is another point that confuses me.
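One possible explanation (an assumption, not confirmed in this thread) is that trim_galore's adapter auto-detection, which looks for common adapters such as the standard Illumina, Nextera and small-RNA sequences, missed a study-specific ribo-seq linker, so 40-60 nt reads may still carry adapter sequence. A quick sketch that counts how many reads contain the first 12 bases of a few candidate adapters. The CTGTAGGCACCATCAAT entry is the NEB Universal miRNA Cloning Linker used in some ribo-seq protocols; check each study's methods for the adapter it actually used.

```python
import gzip
import sys
from typing import Dict, Iterable

# Candidate 3' adapters. The first three are the sequences trim_galore
# auto-detects; the last is an assumed ribo-seq linker (NEB miRNA cloning linker).
ADAPTERS = {
    "Illumina": "AGATCGGAAGAGC",
    "Nextera": "CTGTCTCTTATA",
    "small_RNA": "TGGAATTCTCGG",
    "miRNA_linker": "CTGTAGGCACCATCAAT",
}


def adapter_fractions(lines: Iterable[str], adapters: Dict[str, str] = ADAPTERS,
                      prefix_len: int = 12) -> Dict[str, float]:
    """Fraction of reads containing an exact copy of each adapter's first bases.

    Exact matching misses adapters cut short at the read end or carrying
    sequencing errors, so these fractions are lower bounds.
    """
    probes = {name: seq[:prefix_len] for name, seq in adapters.items()}
    hits = {name: 0 for name in adapters}
    total = 0
    for i, line in enumerate(lines):
        if i % 4 != 1:
            continue  # only the sequence line of each 4-line record
        total += 1
        seq = line.strip().upper()
        for name, probe in probes.items():
            if probe in seq:
                hits[name] += 1
    return {name: hits[name] / total if total else 0.0 for name in adapters}


def open_text(path: str):
    """Open a plain or .gz text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__":
    # Usage (hypothetical path): python adapter_check.py SRR9971635_trimmed.fq.gz
    for path in sys.argv[1:]:
        with open_text(path) as fh:
            for name, frac in adapter_fractions(fh).items():
                print(f"{path}\t{name}\t{100 * frac:.2f}%")
```

If one adapter shows up in a large fraction of reads, passing it explicitly to trim_galore with its -a option would be the natural next thing to try.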


Select one high-quality sequence that you can BLAST or BLAT but that doesn't align with STAR, and copy-paste it here.
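A minimal sketch for picking such candidates automatically: it ranks reads (for example from STAR's unmapped-read output) by mean base quality and prints the best few as FASTA, ready to paste into BLAST or BLAT. It assumes Phred+33 quality encoding and four-line FASTQ records; the 25-nt minimum length and n=5 are arbitrary.

```python
import gzip
import sys
from typing import Iterable, List, Tuple


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 by default)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def best_reads(lines: Iterable[str], n: int = 5,
               min_len: int = 25) -> List[Tuple[str, str]]:
    """Return (read_id, sequence) for the n reads with the highest mean quality."""
    scored = []
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, _, qual = record
            record = []
            if len(seq) >= min_len and qual:
                read_id = header[1:].split()[0]  # drop '@' and any description
                scored.append((mean_phred(qual), read_id, seq))
    scored.sort(key=lambda r: r[0], reverse=True)
    return [(read_id, seq) for _, read_id, seq in scored[:n]]


def open_text(path: str):
    """Open a plain or .gz text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__":
    # Usage (hypothetical path): python pick_reads.py Unmapped.out.mate1
    for path in sys.argv[1:]:
        with open_text(path) as fh:
            for read_id, seq in best_reads(fh):
                print(f">{read_id}\n{seq}")
```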

