Question: Best alignment software for mapping short DNA reads to transcriptome?
5 weeks ago by
Joel Wallenius70 wrote:


I googled but only found discussions of RNA read quantification, which is fair enough but not the help I'm after. I'm just curious what percentage of my DNA reads fall within my organism's transcriptome.

I was going to do it with BWA but then I read that BWA is for short reads vs a large genome, and the transcriptome is obviously not as large as the corresponding genome...


Big thanks in advance!


dna alignment transcriptome • 89 views

If you are aligning to a transcriptome, why not use salmon or kallisto instead?

That said, when possible you should always align to the genome and then count the reads falling in the expressed part using a counting program like featureCounts.
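To make the "align to the genome, then count" idea concrete, here is a minimal Python sketch of what a counting step does in principle: given aligned read positions and annotated intervals, it reports the fraction of reads that overlap any annotated feature. This is not featureCounts or its algorithm. The function names (`build_index`, `overlaps`, `fraction_in_features`) and the 0-based, half-open coordinate convention are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(features):
    """Merge overlapping features per contig.

    features: iterable of (contig, start, end), 0-based half-open (assumed).
    Returns {contig: (starts, ends)} with disjoint, sorted intervals.
    """
    by_contig = defaultdict(list)
    for contig, start, end in features:
        by_contig[contig].append((start, end))

    index = {}
    for contig, intervals in by_contig.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                # Overlapping or touching: extend the current interval.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[contig] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index, contig, start, end):
    """True if the read [start, end) overlaps any merged feature."""
    if contig not in index:
        return False
    starts, ends = index[contig]
    # Last merged interval that begins before the read ends.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


def fraction_in_features(index, reads):
    """reads: iterable of (contig, start, end) for aligned reads."""
    total = hits = 0
    for contig, start, end in reads:
        total += 1
        hits += overlaps(index, contig, start, end)
    return hits / total if total else 0.0
```

Merging the features first means a read that overlaps two overlapping annotations is still counted once, which is what you want when the question is "what fraction of reads fall in annotated sequence" rather than per-gene counts.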

modified 5 weeks ago • written 5 weeks ago by genomax78k

There is no reference genome, I'm afraid. The dinoflagellate genome is enormous, and I can find only bits and pieces of it at NCBI. I have only the transcriptome.

The transcriptome is CDS, sadly. I suppose this introduces a risk of false positives as the organism is a eukaryote, with exons all over the place. I don't need exact numbers though, so maybe that's fine. Regardless I don't see what options I have, really. There is no other reference... (I might be able to get my hands on the reads that built the transcriptome though).

written 5 weeks ago by Joel Wallenius70

Have you looked to see if NCBI has any EST datasets you could potentially use as a stand-in?

written 5 weeks ago by genomax78k

How would that help? I'm unfamiliar with ESTs, but based on what Wikipedia says they're just fragments of cDNA, i.e. they map to transcripts, so they have the same problem as the cDNA sequences in my transcriptome. :( Am I missing something?

written 4 weeks ago by Joel Wallenius70

ESTs would be better than using single exons/CDSs to count, but that is about it. Ideally you should do an RNAseq project of your own and then assemble your own transcriptome to get more definitive answers. I am curious as to why you did only DNA sequencing. Or is this actually an RNA sequencing project (RNA --> DNA --> sequenced)?

modified 4 weeks ago • written 4 weeks ago by genomax78k

I joined this project late, so I can't explain the reasons why we have the data we have. We have RADseq DNA reads from all over the genome, and now we want to know approximately what percentage of those reads are within coding DNA. I can't think of a better analysis than mapping to CDS or ESTs, despite the flaws.
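For the "approximate percentage" itself, a rough sketch of the final step, assuming the reads have already been aligned to the CDS reference with some aligner and written out as SAM: count primary records and report how many are mapped. `samtools flagstat` reports a similar figure; the Python below only spells out the logic. The function name `percent_mapped` and the `min_mapq` parameter are hypothetical, and each SAM record is counted separately, so paired-end mates count as two reads.

```python
def percent_mapped(sam_lines, min_mapq=0):
    """Percentage of primary SAM records that are mapped.

    sam_lines: iterable of SAM text lines (headers allowed).
    min_mapq:  optional mapping-quality cutoff (hypothetical knob;
               the default of 0 keeps multi-mapping reads).
    """
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800)
        total += 1
        # 0x4 set means the read is unmapped; column 5 is MAPQ.
        if not flag & 0x4 and int(fields[4]) >= min_mapq:
            mapped += 1
    return 100.0 * mapped / total if total else 0.0
```

Usage would be along the lines of `with open("reads_vs_cds.sam") as fh: print(percent_mapped(fh))`, where the file name is a placeholder. A MAPQ cutoff is probably best left at 0 here: reads that hit several similar transcripts tend to get low mapping quality, yet they are still "within the transcriptome" for this purpose.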

written 4 weeks ago by Joel Wallenius70

That is a really unusual application of RADseq data, especially for an organism that has no reference genome/transcriptome available. "Do the best you can" is the only thing to say here.

written 4 weeks ago by genomax78k

I'll do that, then. Thanks :-]

written 4 weeks ago by Joel Wallenius70