Hello Biostar members,
I am new to the field of NGS data analysis and would like some advice on transcriptome assembly for a non-model plant species. I have both 454 and Illumina data for a Brassica species, and I am using both reference-guided and de novo assembly approaches. My questions are as follows (please excuse me if they have already been asked on the forum):
- Is word size optimisation helpful for getting the best CLC de novo assemblies? Does the auto option (word size 20) suffice? How should I choose the best word size for a plant species?
- How should I evaluate the quality of the de novo assembly? So far I have mostly looked at the N50 value and at the number of contigs containing ORFs. Is that the right approach? What other factors should I look into?
- For reference genome-guided transcriptome assembly, which program is best? How can I differentiate between genes from the different genomes of a polyploid when one of the diploid parent species is used as the reference?
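To be clear about what I mean by N50 in the second question, here is a minimal Python sketch of the usual definition: the contig length at which the running total, taken from the longest contig downwards, first reaches half of the total assembly length. The function names `contig_lengths` and `n50` and the toy sequences are only illustrative; this is not how CLC reports the value.

```python
def contig_lengths(fasta_lines):
    """Yield the sequence length of each record in FASTA-formatted lines."""
    length = None  # None until the first header line is seen
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the N50 of an iterable of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for contig_len in ordered:
        running += contig_len
        # First contig at which the cumulative length covers half the assembly
        if running >= half_total:
            return contig_len


# Toy example with made-up sequences (not real data)
toy_fasta = [">contig1", "ACGTAC", ">contig2", "ACG"]
toy_n50 = n50(contig_lengths(toy_fasta))
```

With a real assembly, the lines would come from an open FASTA file, e.g. `n50(contig_lengths(open(path)))`, where `path` is whatever the assembler wrote out.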
I would really appreciate your advice and help.