Why could the cd-hit-est tool not recognize identical contigs in the assembled transcriptome input file?
9.2 years ago
seta ★ 1.9k

Hi everybody,

I have fairly high-quality Illumina reads that were trimmed and assembled using the CLC Genomics Workbench software. I tried different k-mer sizes and got the best assembly at a k-mer size of 64, judged by some basic metrics: N50, the number of contigs, and the percentage of reads mapped back (about 90%). This assembly (44.8 MB in size) was run through the cd-hit tool with a strict 100% identity threshold for the alignments, since plants have many paralogues that have high sequence identity, but the size of the output file did not change. I also tried the analysis with an identity threshold of 0.9, and the output file size (43.5 MB) did not change significantly compared with the original input file (44.8 MB). Could anybody please let me know whether these results are usual or whether something is wrong? Thanks for any comments.

RNA-Seq Assembly blast • 2.2k views

Why would you expect to have identical contigs in your assembly? The point of an assembler is to collapse and extend overlapping sequences.

9.2 years ago
seta ★ 1.9k

I have no experience in this field and expected some redundancy based only on similar published work. So, is this normal in your view?


I don't think any assembler should ever generate exactly identical contigs. I can understand how similar contigs could be present but it would not be reasonable to expect identical contigs.
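
If you want to confirm this on your own assembly, here is a minimal sketch that groups contigs with exactly the same full-length sequence. It is not how cd-hit clusters sequences; it only looks for complete exact duplicates. The function names (`read_fasta`, `find_exact_duplicates`) and the `both_strands` option are my own choices for illustration, and treating a contig and its reverse complement as the same sequence is an assumption you can switch off.

```python
from collections import defaultdict

# Translation table for complementing nucleotides (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_exact_duplicates(lines, both_strands=True):
    """Return lists of contig names that share an identical sequence.

    Assumption: with both_strands=True, a contig and its reverse
    complement count as the same sequence.
    """
    groups = defaultdict(list)
    for header, seq in read_fasta(lines):
        seq = seq.upper()
        key = min(seq, reverse_complement(seq)) if both_strands else seq
        groups[key].append(header)
    return [names for names in groups.values() if len(names) > 1]


# Toy input: c2 is c1 split over two lines, c3 is the reverse complement of c1.
example = [">c1", "ACGTTG", ">c2", "ACGT", "TG", ">c3", "CAACGT", ">c4", "GGGA"]
print(find_exact_duplicates(example))  # [['c1', 'c2', 'c3']]
```

For a real file you could pass an open file handle, for example `find_exact_duplicates(open("assembly.fasta"))`, where `assembly.fasta` is a placeholder name. An empty list would be consistent with the point above that an assembler should not emit exactly identical contigs.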
