Question

Best alignment software for mapping short DNA reads to transcriptome?

5.5 years ago

Joel Wallenius ▴ 220

Hello!

I googled but found only discussions of RNA read quantification, which is fair enough but not the help I'm after. I'm just curious what percentage of my DNA reads fall within my organism's transcriptome.

I was going to do it with BWA but then I read that BWA is for short reads vs a large genome, and the transcriptome is obviously not as large as the corresponding genome...

Suggestions?

Big thanks in advance!

Joel

alignment Transcriptome DNA • 1.7k views

ADD COMMENT • link 5.5 years ago by Joel Wallenius ▴ 220

If you are aligning to a transcriptome, why not use salmon or kallisto instead?

That said, when possible you should always align to the genome and then count the reads falling in expressed regions using a counting program like featureCounts.

ADD REPLY • link 5.5 years ago by GenoMax 152k

There is no reference genome, I'm afraid... the dinoflagellate genome is enormous, and I can find only bits and pieces of it at NCBI. I have the transcriptome only.

The transcriptome is CDS, sadly. I suppose this introduces a risk of false positives as the organism is a eukaryote, with exons all over the place. I don't need exact numbers though, so maybe that's fine. Regardless I don't see what options I have, really. There is no other reference... (I might be able to get my hands on the reads that built the transcriptome though).

ADD REPLY • link 5.5 years ago by Joel Wallenius ▴ 220

Have you looked to see if NCBI has any EST datasets you could potentially use as a stand-in?

ADD REPLY • link 5.5 years ago by GenoMax 152k

How would that help? I'm unfamiliar with ESTs, but based on what Wikipedia says they're just fragments of cDNA, i.e. they map to transcripts, so they have the same problem as the cDNA sequences in my transcriptome. :( Am I missing something?

ADD REPLY • link 5.5 years ago by Joel Wallenius ▴ 220

ESTs would be better than using single exons/CDSs to count, but that is about it. Ideally you should do an RNA-seq project of your own and then assemble your own transcriptome to get more definitive answers. I am curious as to why you did only DNA sequencing. Or is this actually an RNA sequencing project (RNA --> DNA --> sequenced)?

ADD REPLY • link 5.5 years ago by GenoMax 152k

I joined this project late, so I can't explain why we have the data we have. We have RADseq DNA reads from all over the genome, and now we want to know approximately what percentage of those reads fall within coding DNA. I can't think of a better analysis than mapping to CDS or ESTs, despite the flaws.

ADD REPLY • link 5.5 years ago by Joel Wallenius ▴ 220
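
For the "approximate percentage" described above, one rough way to get the number is to map the reads to the CDS set with any short-read aligner that writes SAM, then count how many primary records are flagged as mapped. The sketch below is only an illustration under those assumptions: the function name `fraction_mapped` and the toy records are made up, and the only facts it relies on are the standard SAM FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary). It does not correct for reads spanning exon boundaries, so treat the result as an estimate.

```python
def fraction_mapped(sam_lines):
    """Count primary SAM records and how many of them are mapped.

    Returns a (mapped, total) tuple. Header lines, secondary and
    supplementary records are skipped so each read is counted once.
    """
    mapped = 0
    total = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue  # secondary or supplementary alignment
        total += 1
        if not flag & 0x4:
            mapped += 1  # 0x4 unset means the read aligned somewhere
    return mapped, total


# Toy SAM content (hypothetical reads against a hypothetical CDS "cds1").
toy_sam = [
    "@SQ\tSN:cds1\tLN:1000",
    "r1\t0\tcds1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r3\t16\tcds1\t50\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r3\t2064\tcds1\t300\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
]

mapped, total = fraction_mapped(toy_sam)
print(f"{mapped}/{total} reads mapped ({100 * mapped / total:.1f}%)")
```

On a real file you would pass an open file handle instead of the list, e.g. `with open("aln.sam") as fh: fraction_mapped(fh)`, where `aln.sam` is a placeholder name.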

That is a really unusual application of RADseq data, especially for an organism that has no genome/transcriptome available. "Do the best you can" is the only thing to say here.