Why is my read mapping rate extremely low?
6 weeks ago
LeeLee • 0

I am using a script to analyze a large amount of ribo-seq data from different studies. Because the datasets come from different studies, I use trim_galore to remove the adapters, bowtie2 to remove rRNA reads, and STAR for mapping. After mapping, the mapping rate is normal for some datasets but extremely low for others, such as this one:

UNMAPPED READS:
Number of reads unmapped: too many mismatches  | 9953452
% of reads unmapped: too many mismatches | 6.06%
Number of reads unmapped: too short            | 153984837
% of reads unmapped: too short           | 93.77%
Number of reads unmapped: other                | 177470
% of reads unmapped: other               | 0.11%


First I tried adding these parameters:

--outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --outFilterMatchNmin 0

but that still did not solve the problem.

           Time    Speed        Read     Read   Mapped   Mapped   Mapped   Mapped Unmapped Unmapped Unmapped Unmapped
                    M/hr      number   length   unique   length   MMrate    multi   multi+       MM    short    other
Oct 09 13:54:14    224.7     3931595       54     3.3%     34.2     1.1%     0.0%    96.7%     0.0%     0.0%     0.0%
Oct 09 13:54:14    474.6    16216411       54     3.3%     34.2     1.0%     0.0%    96.7%     0.0%     0.0%     0.0%


Running the same data through hisat2 with default parameters gives a similarly low mapping rate.

Time loading forward index: 00:00:02
Time loading reference: 00:00:00
Multiseed full-index search: 00:00:04
1000000 reads; of these:
  1000000 (100.00%) were unpaired; of these:
    999572 (99.96%) aligned 0 times
    169 (0.02%) aligned exactly 1 time
    259 (0.03%) aligned >1 times
0.04% overall alignment rate


I checked the data again and confirmed that I am mapping to the correct species. Why does this happen, and how can I solve it?

STAR Bioinformatics hisat2 • 409 views

It looks like many reads are "too short". Have you checked the read length distribution with fastqc or a similar tool?

% of reads unmapped: too short | 93.77%
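If fastqc is not at hand, the length distribution can also be tallied directly from the FASTQ file. The following is only a minimal sketch, assuming a standard 4-line-per-record FASTQ (plain or gzipped); the function names are made up for illustration and are not part of any tool.

```python
import gzip
import sys
from collections import Counter


def fastq_sequences(lines):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def length_histogram(sequences):
    """Return a Counter mapping read length -> number of reads."""
    return Counter(len(seq) for seq in sequences)


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python read_lengths.py reads.fastq.gz
    with open_fastq(sys.argv[1]) as handle:
        hist = length_histogram(fastq_sequences(handle))
    total = sum(hist.values())
    for length in sorted(hist):
        print(f"{length}\t{hist[length]}\t{100 * hist[length] / total:.2f}%")
```

Running it on the file before and after trimming shows whether trimming actually shortened the reads.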


STAR's "too short" doesn't usually mean the read itself is too short. It means the aligned portion of the read was too short to pass the filters, so in practice the read just didn't map. I'd pull out the most common unmapped reads and see what they are.
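A rough sketch of how the most common unmapped reads could be pulled out is below. It assumes the unmapped reads are already in a 4-line FASTQ file (for example, STAR can write them with --outReadsUnmapped Fastx); the function name and the default of 10 sequences are arbitrary choices for illustration.

```python
import gzip
import sys
from collections import Counter


def most_common_reads(lines, n=10):
    """Return the n most frequent read sequences as (sequence, count) pairs.

    `lines` is any iterable over the lines of a 4-line-per-record FASTQ.
    """
    counts = Counter(
        line.strip() for i, line in enumerate(lines) if i % 4 == 1
    )
    return counts.most_common(n)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python top_unmapped.py Unmapped.out.mate1
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for seq, count in most_common_reads(handle):
            print(f"{count}\t{seq}")
```

The top few sequences can then be pasted into BLAST to see whether they are rRNA, adapter, or something from another organism.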


Yes, I checked one of the files, SRR9971635 (the fastq.gz file after rRNA removal), with fastqc and found that almost all reads are 40-60 bp long, which is too long for ribo-seq. (This is another point that confuses me.)


Select one high-quality sequence that you can align with BLAST or BLAT but that does not align with STAR, and copy-paste it here.
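One possible way to pick such a read is to take the one with the highest mean base quality from the unmapped FASTQ. This is just a sketch: it assumes Phred+33 quality encoding and a 4-line FASTQ, and the helper names and the 20-base minimum length are illustrative choices, not anything STAR or BLAST requires.

```python
import gzip
import sys


def fastq_records(lines):
    """Yield (name, sequence, quality) for each 4-line FASTQ record."""
    record = []
    for line in lines:
        record.append(line.strip())
        if len(record) == 4:
            # Drop the leading '@' from the header line.
            yield record[0][1:], record[1], record[3]
            record = []


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed by default)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def best_quality_read(lines, min_length=20):
    """Return (name, sequence, mean quality) of the best read, or None."""
    best = None
    for name, seq, qual in fastq_records(lines):
        if len(seq) < min_length or not qual:
            continue  # skip reads too short to be worth BLASTing
        score = mean_phred(qual)
        if best is None or score > best[2]:
            best = (name, seq, score)
    return best


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python best_read.py Unmapped.out.mate1
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        print(best_quality_read(handle))
```

The printed sequence is the one to try in BLAST or BLAT and to paste into the thread.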