low mapping rate when using SRA RNA-seq data
7.0 years ago
Pei ▴ 140

Hi:

I am interested in using the human tissue data from SRA dataset SRP007412.

However, after running fastq-dump and then TopHat, I found that the mapping rate is rather poor: most samples were below 70%, and some were as low as 45%.

What would you typically do when you encounter public data with such a low mapping rate?

Thanks in advance!

Best wishes

RNA-Seq • 2.2k views
7.0 years ago

How did you measure the mappability? For RNA-Seq it works a bit differently, because the reads do not come from genomic DNA; they come from sequenced exons and splice-junction boundaries. That's why TopHat remaps the initially unmapped reads against special exon-exon junction libraries (the transcriptome). The mappability is therefore the reads mapped in the first pass plus those that get mapped again to these special libraries.


Hi Sukhdeep:

I used the mapping rate reported in the align_summary.txt file, which is produced by TopHat.

I think this is the same as what you suggested.
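For comparing many samples at once, the rate in align_summary.txt can be pulled out with a few lines of Python. This is only a rough sketch: it assumes the file contains a line of the form `NN.N% overall read mapping rate.`, and the function names and the 70% cutoff are hypothetical choices for illustration, not anything provided by TopHat.

```python
import re


def overall_mapping_rate(summary_text):
    """Return the overall read mapping rate in percent, or None if not found.

    Assumes the text contains a line like '45.0% overall read mapping rate.'
    """
    match = re.search(r"([\d.]+)%\s+overall read mapping rate", summary_text)
    return float(match.group(1)) if match else None


def flag_low_samples(summaries, threshold=70.0):
    """Return (sample, rate) pairs below the threshold, lowest rate first.

    summaries: dict mapping a sample name to the text of its align_summary.txt.
    threshold: hypothetical cutoff in percent; adjust to taste.
    """
    low = []
    for sample, text in summaries.items():
        rate = overall_mapping_rate(text)
        if rate is not None and rate < threshold:
            low.append((sample, rate))
    return sorted(low, key=lambda item: item[1])
```

Reading each sample's align_summary.txt into the `summaries` dict and calling `flag_low_samples(summaries)` would then list the samples worth a closer look.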

Thanks.


Hey, that's right then. I don't think we can generalise that public data has lower mappability. You could try pulling some other recent datasets just to test that. It could also be that the library is over-sequenced and is therefore producing a lot of duplicates, or that some samples are contaminated. Run the downstream processing and see if you are happy with the results. If the saturation limit has been reached, you might not care, or might not be able to do anything about it anyway.
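One quick way to get a feel for the duplicate idea is to count exact repeated sequences in a FASTQ file. The snippet below is a minimal sketch, assuming plain 4-line FASTQ records; `duplicate_fraction` is a hypothetical helper name. Exact-sequence matching is a crude proxy, and highly expressed genes naturally give repeated reads in RNA-Seq, so treat the number as a hint rather than a verdict.

```python
from collections import Counter


def duplicate_fraction(fastq_lines):
    """Return the fraction of reads whose exact sequence was already seen.

    fastq_lines: iterable of lines from an uncompressed 4-line-per-record FASTQ.
    """
    counts = Counter()
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1:  # the second line of each record holds the sequence
            counts[line.strip()] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every read beyond the first copy of each distinct sequence is a duplicate.
    return (total - len(counts)) / total
```

It can be applied to an open file handle, for example `duplicate_fraction(open("sample.fastq"))`, where the file name is a placeholder. For large files this keeps every distinct sequence in memory, so running it on a subsample is safer.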

Also, this thread might help:

why low mapping rates for RNAseq?
