Question: How to merge similar transcripts together?
4.9 years ago, satshil.r (United States) wrote:


I have 8 transcriptomes which I assembled. 4 are from a gonad and 4 are from an ovary. There are also 2 groups within each tissue. I want to create a reference transcriptome containing the contigs from all 8 of my samples, annotate the reference, and use it for differential gene expression (DGE) analysis.

I cat'd the 8 files together and now have a 2 GB file with 1.8 million contigs. I ran cd-hit-est with the following options:

"cd-hit-est -i reference.fa -o reference_90.fa -c 0.9 -T 0 -M 0"

The resulting file was still large, at around 1.3 million contigs and 1.6 GB. How can I merge the remaining contigs further? When I annotated this 1.3M-contig FASTA, I got multiple redundant hits for almost all genes, so there has to be a more effective way to merge similar transcripts. Can someone help me out with this problem?

cd-hit rna-seq assembly fasta • 1.8k views
modified 4.9 years ago by Brian Bushnell • written 4.9 years ago by satshil.r

1.6 GB for a transcriptome? 1.3 million contigs? Either this is not a good assembly, or you are working with some crazy organism. What kind of organism are you working with?

written 4.9 years ago by h.mon

It's a multi-kmer approach, so it's technically multiple assemblies combined together. That's why I want to reduce the redundancy. The assembly is good.

written 4.9 years ago by satshil.r
4.9 years ago, Brian Bushnell (Walnut Creek, USA) wrote:

The BBMap package has a program called Dedupe that collapses exact or contained duplicate sequences. Usage: dedupe.sh in=contigs.fa out=nodupes.fa

If you want to avoid absorbing different-length transcripts, you can use the flag minlengthpercent=90 or similar; if you want to allow up to 3 substitutions difference, you can use the flag s=3; etc.

modified 10 months ago by RamRS • written 4.9 years ago by Brian Bushnell
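
For readers unsure what "collapsing exact or contained duplicates" means in practice, here is a toy Python sketch of the idea. It is not Dedupe's actual algorithm and will not scale to 1.8 million contigs. The function names and the `min_length_percent` parameter are hypothetical; the parameter only loosely mimics the minlengthpercent flag mentioned above, and substitutions (the s flag) are not modelled at all.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def collapse_contained(records, min_length_percent=0):
    """Drop sequences that are exact copies of, or fully contained in,
    a longer (or equal-length) sequence that has already been kept.

    records: dict mapping contig name -> sequence.
    min_length_percent: hypothetical analogue of minlengthpercent; a shorter
        sequence is only absorbed if it is at least this percent of the
        length of the sequence that contains it.
    """
    # Process longest first, so a sequence can only be absorbed by one
    # that is at least as long and already kept.
    ordered = sorted(records.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    kept = {}
    for name, seq in ordered:
        fwd = seq.upper()
        rev = reverse_complement(fwd)
        absorbed = False
        for container in kept.values():
            if len(fwd) * 100 < min_length_percent * len(container):
                continue  # too short relative to this container
            if fwd in container or rev in container:
                absorbed = True
                break
        if not absorbed:
            kept[name] = fwd
    return kept


records = {
    "a": "ACGTACGTAC",
    "b": "ACGTACGTAC",  # exact duplicate of a
    "c": "GTACG",       # contained in a, but only half its length
    "d": "GTACGTACGT",  # reverse complement of a
    "e": "TTTTGGGG",    # unrelated
}
print(sorted(collapse_contained(records)))
print(sorted(collapse_contained(records, min_length_percent=90)))
```

With the default setting the short contained sequence is absorbed; with `min_length_percent=90` it is kept as a separate transcript, which is the behaviour the length-percent option is meant to give.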