Question: Reference Guided Transcriptome Assembly
Asked 7.2 years ago by Biomonika (Noolean), State College, PA, USA:

I would like to assemble transcripts of several chromosomes. I do have sequences of these chromosomes from related species and would therefore like to do reference-guided transcriptome assembly.

I have two concerns:

1. Which programs should I use? I have read about Cufflinks and have very little experience with it, but it seems to provide only a GTF file rather than the sequences of all my isoforms. Since my reference is a related species, I think the differences will be too big to be expressed with just a GTF/BED file.

Is Velvet's Columbus an option? What program would you recommend?

2. If I use sequences from my related species as references, their gene content will overlap extensively. I suppose this will make many of my reads impossible to align uniquely. Should I then use only one species' reference at a time?

Thanks a lot for advice.

Answer written 7.2 years ago by Adrian Pelin:

The first thing I always try is a de novo assembly. Once you have your transcripts, you can use blastn or perhaps tblastx to see which ones are contaminants and which ones come from your chromosomes.
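
To illustrate the BLAST step, here is a minimal sketch that reads tabular BLAST output (the 12 default columns of `-outfmt 6`) and splits transcripts into those with a good hit on the reference chromosomes and those without. The e-value cutoff, the function names, and the example rows are made up for illustration. A transcript without a hit is only a candidate contaminant; it could equally be a gene missing from the related species.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def best_hits(handle, max_evalue=1e-10):
    """Return {query: (subject, bitscore)} for the top-scoring hit per query.

    max_evalue is a hypothetical cutoff; choose one that suits your data.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) > max_evalue:
            continue  # hit too weak to count
        score = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (hit["sseqid"], score)
    return best

def split_transcripts(all_ids, hits):
    """Partition transcript IDs into (with reference hit, without hit)."""
    with_hit = sorted(t for t in all_ids if t in hits)
    without_hit = sorted(t for t in all_ids if t not in hits)
    return with_hit, without_hit

# Made-up example rows: tx1 has two hits, tx2 only a weak one, tx3 none.
example = (
    "tx1\tchrA\t95.0\t300\t15\t0\t1\t300\t1000\t1299\t1e-120\t450\n"
    "tx1\tchrB\t80.0\t100\t20\t0\t1\t100\t50\t149\t1e-12\t90\n"
    "tx2\tchrA\t70.0\t40\t12\t0\t1\t40\t5\t44\t0.5\t30\n"
)
hits = best_hits(io.StringIO(example))
on_ref, no_hit = split_transcripts(["tx1", "tx2", "tx3"], hits)
print(on_ref, no_hit)
```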

If your reference assembly is well annotated, then just extract your predicted genes and use Cufflinks to calculate FPKMs.
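
For readers unfamiliar with the unit, FPKM means fragments per kilobase of transcript per million mapped fragments. The sketch below shows only that basic definition, not how Cufflinks estimates isoform abundances internally. The gene names, counts, library size, and the 1.0 threshold are made-up values.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)

# Hypothetical data: gene -> (fragments mapped to the gene, gene length in bp).
counts = {"geneA": (500, 2000), "geneB": (0, 1500), "geneC": (20, 1000)}
total = 10_000_000  # hypothetical number of mapped fragments in the library

values = {gene: fpkm(n, length, total) for gene, (n, length) in counts.items()}

# A hypothetical cutoff for calling a predicted gene "expressed".
min_fpkm = 1.0
expressed = sorted(g for g, v in values.items() if v >= min_fpkm)
print(values, expressed)
```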


Thanks, I did both: a de novo assembly, and blastn and tblastx to see how well my newly assembled transcripts align to the chromosomes of my closely related species. Now I want to see whether a reference-guided approach could improve my transcripts.

Reply written 7.2 years ago by Biomonika (Noolean)

By looking at calculated FPKMs, you can see which predicted genes are expressed and which ones are not.

Reply written 7.2 years ago by Adrian Pelin

Answer written 7.2 years ago by Prakki Rama:

As far as I know, the GTF file that you have generated contains coordinates on your reference species. Even if you extract the regions using those coordinates, you would not get the transcripts of your own species; you would get the corresponding regions of the reference sequence.
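
A small sketch of why this happens, assuming a standard 9-column GTF with 1-based inclusive coordinates and a `transcript_id` attribute. The function names, the toy genome, and the GTF lines are made up. Whatever the reads looked like, the spliced sequence is built purely from reference bases.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def transcripts_from_gtf(gtf_lines, genome):
    """Splice exon sequences out of `genome` ({chrom: sequence}) per transcript."""
    exons = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if f[2] != "exon":
            continue
        chrom, start, end, strand, attrs = f[0], int(f[3]), int(f[4]), f[6], f[8]
        tid = attrs.split('transcript_id "')[1].split('"')[0]
        exons.setdefault(tid, []).append((chrom, start, end, strand))
    seqs = {}
    for tid, parts in exons.items():
        parts.sort(key=lambda p: p[1])  # order exons by genomic start
        # GTF is 1-based inclusive, Python slices are 0-based half-open.
        seq = "".join(genome[c][s - 1:e] for c, s, e, _ in parts)
        seqs[tid] = revcomp(seq) if parts[0][3] == "-" else seq
    return seqs

# Toy reference and annotation (made up).
genome = {"chr1": "AAACCCGGGTTTACGT"}
gtf = [
    'chr1\tCufflinks\texon\t1\t3\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tCufflinks\texon\t7\t9\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tCufflinks\texon\t10\t13\t.\t-\t.\tgene_id "g2"; transcript_id "t2";',
]
seqs = transcripts_from_gtf(gtf, genome)
print(seqs)
```

Every base in the output comes from `genome`, so any substitutions or indels in the sequenced species are lost.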

When assembling a transcriptome for a species that has no genome of its own but only a reference species' transcriptome, you could try a consensus reference assembly, where your reads are mapped onto the reference and you get consensus sequences for your species.
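
The idea of a consensus call can be sketched as a per-position majority vote. This toy version assumes reads are already placed on the reference as ungapped (0-based start, sequence) pairs, so it ignores indels, base qualities, and splicing; in practice an aligner and a dedicated consensus tool would do this work. The function name, the `min_depth` value, and the example reads are made up.

```python
from collections import Counter

def consensus(reference, alignments, min_depth=2):
    """Majority-vote consensus from ungapped alignments.

    alignments: iterable of (0-based start on reference, read sequence).
    Positions covered by fewer than min_depth reads keep the reference
    base, written in lowercase to flag it as unsupported.
    """
    piles = [Counter() for _ in reference]
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(reference):
                piles[pos][base] += 1
    out = []
    for ref_base, pile in zip(reference, piles):
        depth = sum(pile.values())
        if depth < min_depth:
            out.append(ref_base.lower())
        else:
            out.append(pile.most_common(1)[0][0])
    return "".join(out)

# Made-up example: the reads carry a T where the reference has G at position 2.
reference = "ACGTACGT"
reads = [(0, "ACTTA"), (1, "CTTAC"), (2, "TTACG")]
result = consensus(reference, reads)
print(result)
```

Unlike extraction by GTF coordinates, the well-covered positions here reflect the bases seen in the reads of the sequenced species.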

You can check the thread "Reference Assembly - Mapping Reads To A Reference Genome" for much more info on how to do it.

