Question: Seeking help with transcriptome assembly
3.6 years ago by
Svishwan0
Canada
Svishwan0 wrote:

Hello Biostar members,

I am new to NGS data analysis and would like some advice on transcriptome assembly for a non-model plant species. I have both 454 and Illumina data for a Brassica plant. I am using both reference-guided and de novo assembly approaches. My questions are as follows (please excuse me if they have already been asked on the forum):

  1. Is word size optimisation helpful for getting the best CLC de novo assemblies? Does the auto option (word size 20) suffice? How do I choose the best word size for a plant species?
  2. How do I evaluate the quality of the de novo assembly? So far I have mostly looked at the N50 value and at the number of contigs with ORFs. Is that the right approach? What other factors should I look into?
  3. For reference genome-guided transcriptome assembly, which program is best? How can genes from the different genomes of a polyploid be distinguished when one of the diploid parent species is used as the reference?

I really appreciate your advice and help.

Thanks.

rna-seq • 1.3k views
modified 3.5 years ago by seta1.1k • written 3.6 years ago by Svishwan0

Hey,

I can't give you any advice on your first question, since I haven't used CLC. How to evaluate an assembly is a pretty hard question, and the answer depends on what you want to do with the assembly. If you are only interested in recovering as many possible/probable ORFs as you can, the N50 shouldn't bother you too much, imho. However, a higher N50 tends to make more ORFs predictable, so it may still be worth keeping an eye on it even if you are not aiming to present the perfect assembly. In general, the criteria you already listed are complete enough to get a good, acceptable assembly. For further information on "How to judge the quality of an assembly", have a look here or search for things like the Assemblathon...
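To make the "number of contigs with ORFs" criterion concrete, here is a minimal sketch in plain Python. It is only an illustration under some assumptions of mine, not what CLC or any ORF-prediction tool does: an ORF is taken to be an ATG followed in frame by a stop codon, all six frames are scanned, and the `min_codons` threshold of 100 is an arbitrary choice. ORFs that run off the end of a contig without a stop codon are ignored. The function names are hypothetical.

```python
# Sketch: count contigs that contain at least one ORF of a minimum length.
# Assumptions (not from any specific tool): ORF = ATG ... in-frame stop codon,
# six-frame scan, ORFs truncated at the contig end are not counted.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_codons(seq):
    """Longest ORF in codons (start codon included, stop excluded) over six frames."""
    best = 0
    for strand in (seq.upper(), reverse_complement(seq).upper()):
        for frame in range(3):
            start = None  # position of the currently open ATG, if any
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, (i - start) // 3)
                    start = None
    return best


def count_contigs_with_orf(contigs, min_codons=100):
    """Count contigs (dict of name -> sequence) whose longest ORF meets the threshold."""
    return sum(1 for seq in contigs.values()
               if longest_orf_codons(seq) >= min_codons)
```

Calling `count_contigs_with_orf(contigs)` on a dict of contig sequences gives a number you can compare between assemblies. For real work a dedicated ORF predictor is likely the better choice; this only shows what the metric measures.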

Your third question is also a pretty hard one, because all the tools have pros and cons. Since you are new to the field, I would suggest just trying some of them, probably the ones that sound most promising to you...

modified 3.6 years ago • written 3.6 years ago by Phil S.660
3.5 years ago by
seta1.1k
Sweden
seta1.1k wrote:

Firstly, your questions are general ones that are addressed in almost every paper on transcriptome assembly. Word size in CLC is the same thing as k-mer size, and of course it affects the quality of your assembly. It is better to run several assemblies at different k-mer sizes and select the best one based on criteria such as N50, maximum contig length, percentage of reads that map back to the assembly, etc.
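As a rough illustration of comparing assemblies made at different word sizes, the sketch below computes N50 and maximum contig length from lists of contig lengths. The function names and the example numbers are made up for illustration. The percentage of mapped-back reads is not computed here, since it needs a read aligner run against each assembly.

```python
# Sketch: summarise assemblies built at different word sizes (k-mer sizes).
# Input is assumed to be a dict mapping word size -> list of contig lengths.


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def summarize(lengths):
    """Basic per-assembly statistics."""
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "n50": n50(lengths),
        "max_len": max(lengths, default=0),
    }


def summarize_by_word_size(assemblies):
    """Return {word_size: stats} for every assembly, sorted by word size."""
    return {k: summarize(assemblies[k]) for k in sorted(assemblies)}


# Hypothetical contig lengths, only to show the call.
example = {20: [100, 200, 300], 30: [50, 50, 500]}
for word_size, stats in summarize_by_word_size(example).items():
    print(word_size, stats)
```

No single number decides which assembly is best. A larger N50 can also come from chimeric contigs, so it is worth looking at these values side by side with the mapped-back read percentage before choosing a word size.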

Good luck
