Question: Very low mapping rate of mRNA seq data
Seq225 wrote (4 months ago):


I have ~100 paired-end mRNA-seq datasets (150 nt reads) sequenced on an Illumina MiSeq. I trimmed the adapters with cutadapt, but I am getting low mapping rates. I tried BWA, Bowtie2, and STAR: STAR maps 3-30% of reads, Bowtie2 ~40%, and BWA 40-45%. I am mapping against an assembled transcriptome of my organism.

At first I thought I had messed up the adapter trimming, but for paired-end data that should not be a big issue, I guess. What else could I be doing wrong?
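One quick way to rule out the trimming step is to count how many trimmed reads still contain an adapter. The sketch below assumes plain 4-line FASTQ records and checks for two commonly used Illumina adapter prefixes (TruSeq `AGATCGGAAGAGC`, Nextera `CTGTCTCTTATA`); which kit was actually used for this library is an assumption to confirm with the sequencing core. Function names are illustrative.

```python
import gzip
import io
from typing import Iterator, TextIO

# Common Illumina adapter prefixes (assumption: the library used one of these kits).
TRUSEQ_PREFIX = "AGATCGGAAGAGC"
NEXTERA_PREFIX = "CTGTCTCTTATA"


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fastq_sequences(handle: TextIO) -> Iterator[str]:
    """Yield the sequence line of each record (assumes unwrapped 4-line FASTQ)."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def adapter_fraction(handle: TextIO, adapter: str = TRUSEQ_PREFIX) -> float:
    """Fraction of reads that still contain the full `adapter` prefix somewhere."""
    total = hits = 0
    for seq in fastq_sequences(handle):
        total += 1
        if adapter in seq:
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    demo = io.StringIO(
        "@r1\nACGTAGATCGGAAGAGCACAC\n+\nIIIIIIIIIIIIIIIIIIIII\n"
        "@r2\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
    )
    print(adapter_fraction(demo))  # one of the two demo reads carries the adapter
```

Run it on both the R1 and R2 files, e.g. `adapter_fraction(open_fastq("sample_R1.fastq.gz"))` (hypothetical file name). A value close to zero suggests trimming worked; note that this exact-match check misses partial adapters shorter than the prefix at the 3' end, which cutadapt handles separately.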



Did you consider that the data could be of poor quality? They could be contaminated with genomic DNA (gDNA) or with material from other species. Is there a reference genome for the species? If so, align against that rather than the transcriptome to rule out gDNA contamination. Also, BLAST a good number of unmapped reads to see where they come from.

(reply by ATpoint, 4 months ago)
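For the suggestion to BLAST a good number of unmapped reads, here is a minimal Python sketch assuming the alignments are available as SAM text (for example from `samtools view -h aligned.bam`). It counts each read once (primary records only, skipping secondary and supplementary alignments via SAM flags 0x100 and 0x800), reports the fraction of those records without the unmapped flag 0x4, and writes a random subset of the unmapped reads as FASTA for BLAST. The function names, the `/1`/`/2` mate suffixes and the sample size are illustrative choices, and the rate counts mates separately, so it need not match an aligner's own summary.

```python
import random
from typing import Iterable, List, Tuple

UNMAPPED = 0x4
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def scan_sam(lines: Iterable[str]) -> Tuple[float, List[Tuple[str, str]]]:
    """Return (mapped fraction of primary records, [(name, seq)] of unmapped reads)."""
    primary = mapped = 0
    unmapped: List[Tuple[str, str]] = []
    for line in lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a valid alignment line
            continue
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra alignments of a read already counted
        primary += 1
        if flag & UNMAPPED:
            mate = "/1" if flag & FIRST_IN_PAIR else "/2" if flag & SECOND_IN_PAIR else ""
            if seq != "*":  # SEQ may be omitted; such reads cannot be BLASTed
                unmapped.append((name + mate, seq))
        else:
            mapped += 1
    rate = mapped / primary if primary else 0.0
    return rate, unmapped


def sample_fasta(reads: List[Tuple[str, str]], n: int, seed: int = 1) -> str:
    """Return up to n reads, chosen reproducibly at random, as FASTA text."""
    rng = random.Random(seed)
    chosen = reads if len(reads) <= n else rng.sample(reads, n)
    return "".join(f">{name}\n{seq}\n" for name, seq in chosen)


if __name__ == "__main__":
    demo = [
        "@HD\tVN:1.6\n",
        "r1\t99\tcontig1\t10\t60\t8M\t=\t50\t48\tACGTACGT\tIIIIIIII\n",
        "r1\t147\tcontig1\t50\t60\t8M\t=\t10\t-48\tTTGCATGC\tIIIIIIII\n",
        "r2\t77\t*\t0\t0\t*\t*\t0\t0\tGGGGCCCC\tIIIIIIII\n",
        "r2\t141\t*\t0\t0\t*\t*\t0\t0\tAAAATTTT\tIIIIIIII\n",
    ]
    rate, reads = scan_sam(demo)
    print(rate)
    print(sample_fasta(reads, 500), end="")
```

With samtools installed, `samtools fasta -f 4 aligned.bam` extracts the unmapped reads directly; the sketch only makes the flag logic explicit. The resulting FASTA can then be submitted to web BLAST or searched with `blastn -remote`.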

Thank you.

The reads are of good quality (almost all of them). I am not sure about any contamination. Unfortunately, I do not have a reference genome.

(reply by Seq225, 4 months ago)

In addition to ATpoint's suggestions, you can check for several kinds of contamination:

  1. Bacteria or viruses, using Kraken.
  2. rRNA contamination. You can run BLAST remotely to search the unmapped reads against your species' data in NCBI.
  3. If you isolated your species from an environmental sample, you can map its mRNA reads to species that share the same environment.

(reply by Mehmet, 4 months ago)

Agreed. By low-quality data I meant the quality of the library, i.e. its gDNA and rRNA content; sequencing data from Illumina machines are typically robust. Please give some details about the species and how the library was made. Does the species have polyadenylated mRNA? If so, was the library made with an mRNA (poly-A) enrichment kit, an rRNA depletion kit, or is it plain total-RNA-seq?

(reply by ATpoint, 4 months ago)

ATpoint and Mehmet, thank you very much. I talked with the sequencing core, and they never depleted the rRNA. That could be the actual problem.

Do you know a way to find out what percentage of my reads come from rRNA? As I said, I do not have a genome sequence for this organism. Is there an rRNA sequence database?

Thanks again!

(reply by Seq225, 4 months ago)

Sorry for the late reply. I would recommend the following:

  1. Search the unmapped reads with BLAST against an rRNA database built from a closely related species, which you can obtain from NCBI (check the BLAST manual for how to set this up). For instance, download all rRNA sequences of the closely related species and search against them, or run BLAST with the remote option.

  2. Map the mRNA reads of your species to the genome of a closely related species, ideally one from the same genus.

(reply by Mehmet, 4 months ago)
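For a rough first estimate of the rRNA fraction without a genome, a swapped-in alternative to the BLAST search in step 1 is exact k-mer matching: collect all k-mers (both strands) of rRNA sequences downloaded for a closely related species, then count the reads that share at least a few of them. This Python sketch is a simplification of what dedicated rRNA filters such as SortMeRNA do more carefully; the function names, `k` and the `min_hits` threshold are hypothetical choices (something like `k=31` is more realistic for 150 nt reads than the small demo value), and divergence between the two species will lower the apparent rRNA fraction.

```python
from typing import Dict, Iterable, Set

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {first word of header: uppercase sequence}."""
    parts: Dict[str, list] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


def kmer_index(refs: Iterable[str], k: int) -> Set[str]:
    """All k-mers of the reference rRNA sequences, on both strands."""
    index: Set[str] = set()
    for seq in refs:
        for strand in (seq, revcomp(seq)):
            for i in range(len(strand) - k + 1):
                index.add(strand[i:i + k])
    return index


def rrna_fraction(reads: Iterable[str], index: Set[str], k: int, min_hits: int = 2) -> float:
    """Fraction of reads sharing at least `min_hits` k-mers with the rRNA index."""
    total = hits = 0
    for read in reads:
        read = read.upper()
        total += 1
        shared = sum(1 for i in range(len(read) - k + 1) if read[i:i + k] in index)
        if shared >= min_hits:
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    refs = read_fasta([">rRNA_demo\n", "ACGTTGCAAGGC\n", "TTACCGATGGCA\n"])
    idx = kmer_index(refs.values(), k=8)
    reads = ["TTACCGATGGCA", "GCCTTGCAACGT", "AAAAAAAAAAAA"]
    print(rrna_fraction(reads, idx, k=8))
```

In the demo, the first read matches the reference directly and the second matches its reverse complement, so two of the three reads are flagged. For real data, pass the sequence lines of each FASTQ file as `reads`.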