Low Mapping Rate with Kallisto on RNA-seq Data
7 hours ago
meetmet ▴ 10

Hello,

In my current job, I am working with perturbation RNA-seq data from cancer biopsies. The sequencing libraries are prepared using heat lysis and poly(A) enrichment to select only mRNA.

Here is my problem: after trimming and assessing sequencing quality (which is good), I see a very low mapping rate on my data (between 10% and 30%). I use kallisto for mapping with a k-mer size of 31. I checked my read length distribution, and most reads are longer than the required 31 nt.
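The read length check described above can be sketched in a few lines of Python. This is a minimal illustration, not the poster's actual procedure: the function name `fraction_reads_at_least` is hypothetical, and it assumes a plain FASTQ with exactly four lines per record (no wrapped sequences).

```python
from typing import Iterable


def fraction_reads_at_least(fastq_lines: Iterable[str], k: int = 31) -> float:
    """Return the fraction of reads whose sequence is at least k bases long.

    Assumption: plain 4-line-per-record FASTQ (header, sequence, '+', qualities).
    """
    total = 0
    long_enough = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record holds the sequence
            total += 1
            if len(line.strip()) >= k:
                long_enough += 1
    return long_enough / total if total else 0.0


# Toy input with two reads: one of 35 nt and one of 20 nt.
example = [
    "@read1", "A" * 35, "+", "I" * 35,
    "@read2", "C" * 20, "+", "I" * 20,
]
print(fraction_reads_at_least(example))  # 0.5
```

For a real gzipped file, the same function could be fed an open handle, e.g. `gzip.open(path, "rt")`, where `path` is whatever trimmed FASTQ is being checked. Reads shorter than the k-mer size cannot contribute any k-mer, so a fraction well below 1 here would itself lower the mapping rate.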

I checked for contamination using fastq_screen, and most of the reads (>90%) mapped to the human genome on a subset of 2M reads.

Do you have any idea where these non-mapping reads come from? Is it possible that my reads are mostly from intronic regions, even with poly(A) purification? That would explain why fastq_screen, which uses bowtie2 against the genome, shows a high mapping rate.

RNAseq mapping ratio kallisto low • 489 views

Could be genomic DNA contamination. Enrichment is just that: enrichment, not perfect selection without noise. Poor RNA quality usually increases the noise. In the end it does not really matter, since no in silico magic can rescue a problematic library. If on-target counts are too low, you have to sequence deeper.
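As a rough illustration of what "sequence deeper" means in numbers, the sketch below estimates the total reads needed to reach a given number of on-target reads at a given mapping rate. The function name `required_total_reads` and the 20M target are hypothetical and not taken from the thread. It also assumes the mapping rate stays the same at higher depth.

```python
import math


def required_total_reads(target_on_target: int, mapping_rate: float) -> int:
    """Total reads to sequence so that about `target_on_target` of them map.

    Assumption: the mapping rate is constant regardless of sequencing depth.
    """
    if not 0 < mapping_rate <= 1:
        raise ValueError("mapping_rate must be in (0, 1]")
    return math.ceil(target_on_target / mapping_rate)


# Hypothetical target of 20M on-target reads at the rates reported in the question.
for rate in (0.10, 0.30):
    print(f"mapping rate {rate:.0%}: {required_total_reads(20_000_000, rate):,} reads")
```

At a 10% mapping rate this is roughly ten times the depth that a clean library would need, which is why fixing the library preparation is usually preferable when that is an option.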


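Kallisto is a pseudoaligner that maps reads against the transcriptome, so reads originating from genomic DNA (intronic or intergenic sequence) will not map. As ATpoint suggested, it could be DNA contamination. You can confirm this by aligning the reads to a reference genome with an aligner.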
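Once reads are aligned to the genome, the check comes down to asking where each alignment falls relative to annotated genes. The sketch below shows that logic on toy coordinates. All names and intervals are hypothetical, it handles a single gene on a single chromosome, and in practice a dedicated read-distribution tool working from a BAM file and a gene annotation would do this job.

```python
from collections import Counter
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def overlaps(start: int, end: int, intervals: List[Interval]) -> bool:
    """True if [start, end) overlaps any of the given intervals."""
    return any(start < e and s < end for s, e in intervals)


def classify_read(read: Interval, exons: List[Interval], gene: Interval) -> str:
    """Label an aligned read as exonic, intronic, or intergenic."""
    start, end = read
    if overlaps(start, end, exons):
        return "exonic"      # touches at least one exon
    if overlaps(start, end, [gene]):
        return "intronic"    # inside the gene body but outside every exon
    return "intergenic"      # outside the gene entirely


# Toy annotation: one gene with three exons (hypothetical coordinates).
gene = (100, 1000)
exons = [(100, 200), (500, 600), (900, 1000)]
reads = [(150, 200), (190, 240), (300, 350), (1200, 1250)]

counts = Counter(classify_read(r, exons, gene) for r in reads)
print(dict(counts))
```

A large intronic plus intergenic share would be consistent with genomic DNA contamination or unspliced pre-mRNA, and would explain why the reads align to the genome but not to a transcriptome index.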


"I checked for contamination using fastq_screen, and most of the reads (>90%) mapped to the human genome on a subset of 2M reads."

Looks like OP has done that, assuming that the samples are human.
