Question: Optimizing de novo transcriptome assemblies for non-model organisms
3.8 years ago by
United States
giorgiocasaburi90 wrote:

Hi all,

I have to analyze 24 transcriptomes (TRM) in order to compare gene expression under different conditions in an animal whose genome has not been annotated. I am thinking of running multiple assemblies followed by a co-assembly in order to build the "main" TRM. After quality filtering, my plan is to:

1. Assemble the 24 libraries (they come from different treatments) using X different assemblers (e.g. Trinity, Velvet, ... / multiple k-mer sizes, etc.). This will give me X × 24 assemblies.

2. Merge the X × 24 assemblies with a co-assembly tool (e.g. CD-HIT-EST, Corset, or CAP3). I will then end up with one main transcriptome (Main-TRM) representing the animal under study.

3. Perform functional annotation of the Main-TRM against Swiss-Prot, KEGG, GO, etc. using blastx and Blast2GO.

4. Take the unassembled, quality-filtered reads from the 24 libraries (as they were before step 1, in order to retain the condition variable) and BLAST them library by library against the annotated Main-TRM, obtaining the expression information this way.

What do you guys think about this approach? Is it theoretically correct, if not what should I change?

Thanks a lot in advance,

~Giorgio

rna-seq assembly • 1.8k views
modified 3.8 years ago • written 3.8 years ago by giorgiocasaburi90
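
To make the redundancy that step 2 has to deal with concrete, here is a minimal Python sketch. It is not what CD-HIT-EST, Corset, or CAP3 do (CD-HIT-EST, for instance, clusters by a sequence identity threshold); it is a deliberately simplified stand-in that only drops contigs that are exactly contained, on either strand, in a longer contig. The contig names and sequences are made up for illustration.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def collapse_contained(contigs):
    """Drop contigs exactly contained in a longer kept contig (either strand).

    contigs: {name: sequence}. Longest contigs are considered first, so a
    shorter duplicate or fragment is always compared against longer ones.
    """
    kept = {}
    ordered = sorted(contigs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, seq in ordered:
        seq = seq.upper()
        rc = reverse_complement(seq)
        if any(seq in longer or rc in longer for longer in kept.values()):
            continue  # redundant: already represented by a longer contig
        kept[name] = seq
    return kept


# Hypothetical pooled contigs from several assemblers (toy sequences).
pooled = {
    "trinity_c1": "ATGGCGTACGTTAGC",
    "velvet_c7": "GCGTACGTTA",    # fragment of trinity_c1
    "cap3_c2": "CGTACGCCAT",      # reverse complement of the start of trinity_c1
    "trinity_c2": "CCATTGACCTGA", # unrelated contig
}
non_redundant = collapse_contained(pooled)
print(sorted(non_redundant))
```

Real assemblies differ by mismatches, indels, and alternative isoforms, which is why a dedicated clustering tool is still needed; the sketch only shows why pooling X × 24 assemblies inflates the contig count.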

Thanks Reema for your suggestions. Yes, I was planning to diversify the Trinity output by using different parameters.

written 3.8 years ago by giorgiocasaburi90

Also, this should have been a comment, not an "answer". I don't think I can move it for you, just saying.

written 3.8 years ago by Madelaine Gogol5.0k

Hi Madelaine,

Thanks for your input. Yes, for step 4 I was thinking of using Bowtie and then using the output (.bam) file to estimate transcript-level abundance for each library with RSEM (RSEM: accurate transcript quantification from RNA-Seq data). I just have to figure out the best way to combine the 24 results for statistical purposes.

written 3.8 years ago by giorgiocasaburi90
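
One possible way to combine the per-library results is to merge them into a single transcripts-by-libraries count matrix. The Python sketch below assumes each library has an RSEM `isoforms.results` table that is tab-separated and has `transcript_id` and `expected_count` columns; check the header of your own files, since this is written from memory of the RSEM output format. The sample names and numbers are made up.

```python
import csv
import io


def read_expected_counts(handle):
    """Return {transcript_id: expected_count} from one tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["transcript_id"]: float(row["expected_count"]) for row in reader}


def build_count_matrix(samples):
    """Merge per-library tables into one matrix.

    samples: {sample_name: open text handle}.
    Returns (sample_names, {transcript_id: [count per sample, same order]}).
    Transcripts missing from a library get a count of 0.0.
    """
    names = sorted(samples)
    per_sample = {name: read_expected_counts(samples[name]) for name in names}
    transcripts = sorted(set().union(*(c.keys() for c in per_sample.values())))
    matrix = {t: [per_sample[n].get(t, 0.0) for n in names] for t in transcripts}
    return names, matrix


# Toy tables standing in for two of the 24 libraries (assumed column layout).
HEADER = "transcript_id\tgene_id\tlength\teffective_length\texpected_count\tTPM\tFPKM\tIsoPct\n"
demo = {
    "control_rep1": io.StringIO(
        HEADER
        + "t1\tg1\t1000\t850.0\t12.00\t5.0\t4.0\t100.00\n"
        + "t2\tg2\t500\t350.0\t3.50\t2.0\t1.5\t100.00\n"
    ),
    "treated_rep1": io.StringIO(
        HEADER + "t1\tg1\t1000\t850.0\t30.00\t9.0\t8.0\t100.00\n"
    ),
}
names, matrix = build_count_matrix(demo)
for transcript, counts in matrix.items():
    print(transcript, dict(zip(names, counts)))
```

With real data you would pass open file handles instead of `io.StringIO` objects. A matrix in this shape, together with a table saying which library belongs to which condition, is the usual starting point for count-based differential expression testing.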
Good point Pyperl, thank you! So you are suggesting that I merge all the libraries into one file, run the assembly with different assemblers, and then merge those assemblies together, is that right? I would still prefer to do the co-assembly, both to make use of the multiple assemblers and to reduce redundancy.
written 3.8 years ago by giorgiocasaburi90

Here I would prefer Trinity for the assembly, as it is a de novo assembler. There is no need to use different assemblers, as that will cost you time and effort. But if you are still curious to compare the output of different assemblers, you can go ahead and compare the diagnostics of the different assemblies.

written 3.8 years ago by Renesh1.4k

It's not really about comparison; it's more about having a co-assembly derived from multiple assemblies (e.g. using Trinity but with different k-mer sizes). Several papers suggest a co-assembly step after generating different assemblies, either with several tools or with the same tool run with different parameters (e.g. k-mer size). Hope that makes sense.

written 3.8 years ago by giorgiocasaburi90

I think Trinity has a fixed k-mer size (25), and according to the Trinity developers this is optimal across different transcriptomes.

modified 3.8 years ago • written 3.8 years ago by Renesh1.4k
Yes, I will be using an HPC, obviously.
written 3.8 years ago by giorgiocasaburi90

Hi, is there any update on this? Have you reached the annotation part? I have a similar kind of data and would like to know if you have a summary of this.

written 3.7 years ago by geek_y8.8k

Hi, not yet. I'm waiting for other data and will update when I finish some of the initial steps.

written 3.7 years ago by giorgiocasaburi90

Hi. I need to do something similar. Let me know how your analysis goes. In my case I have a draft genome and very few annotations.

written 3.8 years ago by geek_y8.8k
3.8 years ago by
Reema Singh150
United Kingdom
Reema Singh150 wrote:

Hello Giorgio,

In my view, you should give this approach a try. You can also start with a simpler approach: generate multiple assemblies with different parameters first, then compare them on the basis of quality, completeness (by comparing them with an existing or similar genome), and CEGMA score. Just a suggestion from my own experience: if you use Trinity, try different k-mer and coverage parameters.

Best,

Reema


written 3.8 years ago by Reema Singh150
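
As a rough illustration of comparing assemblies on basic quality numbers, the Python sketch below computes contig count, total bases, longest contig, and N50 from FASTA text. N50 is not mentioned in the answer above; it is used here only as an assumed example of a simple contiguity statistic, and it says nothing about completeness, so it does not replace a CEGMA-style check. The toy FASTA records are made up.

```python
def parse_fasta_lengths(text):
    """Return the sequence length of every record in a FASTA-formatted string."""
    lengths = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def summarize(lengths):
    """Collect a few basic statistics for one assembly."""
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


# Toy assembly; in practice read the text from each assembler's output file.
toy_assembly = ">a\nACGTACGT\n>b\nACGTAC\n>c\nACGT\n>d\nAC\n"
summary = summarize(parse_fasta_lengths(toy_assembly))
print(summary)
```

Running the same summary over each assembly (for example, one per parameter setting) gives a quick side-by-side table, which can then be read alongside the completeness scores.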
3.8 years ago by
Madelaine Gogol5.0k
Kansas City
Madelaine Gogol5.0k wrote:

I don't know that much about assembly, but your first few steps sound reasonable. On the last step, though:

"4. Take the unassembled, quality-filtered reads from the 24 libraries (as they were before step 1, in order to retain the condition variable) and BLAST them library by library against the annotated Main-TRM, obtaining the expression information this way."

I would just use all the reads for each transcriptome and align them to your main assembly using TopHat or something similar.

Also, I would probably just stick with Trinity. Seems like combining the results of multiple assemblers could be confusing and introduce more errors.

modified 3.8 years ago • written 3.8 years ago by Madelaine Gogol5.0k
3.8 years ago by
Renesh1.4k
United States
Renesh1.4k wrote:

If you want to build one combined transcriptome from the different libraries, you should not assemble each library separately, because that will create a lot of duplicate transcripts, as all of the libraries come from the same organism. You should combine all libraries into one file and perform a single assembly, which saves you complicated downstream tasks. Obviously this will be computationally expensive, and you will need an HPC to do it.

written 3.8 years ago by Renesh1.4k