Question: Why could the cd-hit-est tool not recognize identical contigs in the assembled transcriptome input file?
4.6 years ago by
seta1.2k
Sweden
seta1.2k wrote:

Hi everybody,

I have fairly high-quality Illumina reads that were trimmed and assembled using the CLC Genomics Workbench software. I tried different k-mer sizes and got the best assembly at a k-mer size of 64, judged by basic parameters such as N50, the number of contigs, and the percentage of reads mapping back (about 90%). This assembly (44.8 MB in size) was run through the cd-hit tool with a strict 100% identity threshold for the alignments, since plants have many paralogues with high sequence identity, but the size of the output file did not change. I also tried a threshold of 0.9, and the output file size (43.5 MB) did not change significantly compared with the original input file (44.8 MB). Could anybody please let me know whether these results are usual or whether something is wrong? Thanks for any comments.

blast rna-seq assembly • 1.5k views

Why would you expect to have identical contigs in your assembly? The point of an assembler is to collapse and extend overlapping sequences. 

written 4.6 years ago by Istvan Albert
seta1.2k wrote:

I have no experience in this field and expected some redundancy based only on similar published works. So this is normal in your view?


I don't think any assembler should ever generate exactly identical contigs. I can understand how similar contigs could be present but it would not be reasonable to expect identical contigs.

written 4.6 years ago by Istvan Albert
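
For anyone who wants to check this directly rather than compare file sizes, here is a minimal sketch that counts contigs in a FASTA file and reports how many are exact duplicates of another contig on either strand. The function names (`read_fasta`, `canonical`, `duplicate_summary`) are made up for illustration and are not part of cd-hit or CLC. It only detects full-length exact duplicates, so it will not find contigs that are merely contained in, or similar to, a longer one.

```python
import sys
from collections import Counter

# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def canonical(seq):
    """Return a strand-independent key: the smaller of seq and its reverse complement."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    return min(seq, reverse_complement)


def duplicate_summary(lines):
    """Return (total contigs, contigs that exactly duplicate an earlier one)."""
    counts = Counter(canonical(seq) for _, seq in read_fasta(lines))
    total = sum(counts.values())
    redundant = total - len(counts)
    return total, redundant


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python count_dups.py assembly.fasta
    with open(sys.argv[1]) as handle:
        total, redundant = duplicate_summary(handle)
    print(f"{total} contigs, {redundant} exact duplicates (either strand)")
```

If this reports zero exact duplicates, an unchanged output at 100% identity is what you would expect. Counting sequences before and after clustering is also a more direct measure than comparing file sizes in MB.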