Question: Assembly for a single cDNA
17 months ago, karpet34 wrote:

Dear all,

I performed an RT-PCR that gave me a 1500 bp product. I sequenced it with Illumina technology (2×250 paired-end reads). For several weeks now I have been trying, without success, to assemble the reads into the full-length sequence. I have tried many of the classic assemblers (cap3, ssake, arapan, minimus2, ...), but all of them produce multiple contigs, some of which exceed 5 kb!

I checked that all the reads map well to the reference.

I am looking for an assembler that can do the job. Does anyone have an idea?

Thank you!


Hello, GenoMax was right: I had too many reads. Normalization with bbnorm followed by assembly with cap3 gave me a very good result. Now I need to find the right option values to get a perfect assembly. Thank you all!

written 17 months ago by karpet34

A small educational note: if an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.


written 17 months ago by lieven.sterck

What organism are you working on? You mention both aligning to a reference and assembling. Do you wish to do both, and if so, why?

written 17 months ago by _r_am

Thank you for your answers. I will rework this according to your advice. I removed low-quality reads (almost none were removed), removed the Illumina adapters, merged reads 1 and 2, and then proceeded to assembly. I align to a reference because only part of my cDNA is known. The 3' part is unknown, so I need to perform de novo assembly of that part.

written 17 months ago by karpet34

Is the reference you're talking about genomic, or is it also transcriptomic?

written 17 months ago by lieven.sterck
17 months ago, GenoMax (United States) wrote:

You probably have far more coverage than you need, and that is likely causing the assembly problems, so consider downsampling the data. You can use bbnorm.sh from the BBMap suite (guide here). You can also take a look at tadpole.sh (guide here) as an alternative k-mer-based assembler.
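
To get a feel for how much excess coverage a single amplicon can carry, here is a minimal Python sketch. It is only an illustration: the read-pair count is a made-up number, the function names are hypothetical, and plain random subsampling is a simpler stand-in for the k-mer-based normalization that bbnorm.sh performs, not a reimplementation of it.

```python
import random


def mean_coverage(n_pairs, read_len, amplicon_len):
    """Approximate mean depth: total sequenced bases / amplicon length."""
    return n_pairs * 2 * read_len / amplicon_len


def pairs_for_target(target_cov, read_len, amplicon_len):
    """Number of read pairs that gives roughly target_cov mean depth."""
    return max(1, round(target_cov * amplicon_len / (2 * read_len)))


def subsample_pairs(pairs, n_keep, seed=0):
    """Randomly keep n_keep read pairs; mates always stay together."""
    pairs = list(pairs)
    if n_keep >= len(pairs):
        return pairs
    return random.Random(seed).sample(pairs, n_keep)


# Hypothetical run: 1,000,000 pairs of 2x250 reads on a 1500 bp amplicon.
depth = mean_coverage(1_000_000, 250, 1500)
print(f"approximate mean depth: {depth:.0f}x")

# Pairs needed for about 100x (an arbitrary illustrative target).
n_keep = pairs_for_target(100, 250, 1500)
print(f"pairs to keep for ~100x: {n_keep}")

# Toy read pairs standing in for real (R1, R2) records.
toy_pairs = [(f"r{i}/1", f"r{i}/2") for i in range(1000)]
kept = subsample_pairs(toy_pairs, n_keep)
print(f"kept {len(kept)} of {len(toy_pairs)} toy pairs")
```

With numbers like these, only a few hundred read pairs are needed for about 100x depth, which is why reducing the data set first can make classic assemblers behave much better.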

17 months ago, h.mon (Brazil) wrote:

Illumina data are noisy: because of the sheer volume of data generated, even low-frequency errors and contaminants end up represented by a good number of reads. When assembling an amplicon, you will want to filter your contigs by coverage, as your "true" amplicon will have much higher coverage than the errors and contaminants. There are also several pipelines specialized for targeted-sequencing assembly; you may try one (or several) of them. Two I remember are ARC and HybPiper.

By the way, did you remove Illumina adapters before assembling?
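
The coverage-based contig filtering mentioned above could look roughly like the following Python sketch. Everything in it is an assumption for illustration: the function name, the toy contigs, and the 10% cutoff are hypothetical, and the per-contig coverage values are assumed to come from mapping the reads back to the contigs with whatever tool you prefer.

```python
def filter_contigs_by_coverage(contigs, min_fraction=0.1):
    """Keep contigs whose mean coverage is at least min_fraction of the
    best-covered contig.

    contigs maps a contig name to a (sequence, mean_coverage) tuple.
    The 0.1 default is an arbitrary illustrative threshold.
    """
    if not contigs:
        return {}
    top = max(cov for _, cov in contigs.values())
    return {
        name: (seq, cov)
        for name, (seq, cov) in contigs.items()
        if cov >= min_fraction * top
    }


# Toy example with made-up contigs and coverage values.
toy_contigs = {
    "contig1": ("ACGTACGT", 5000.0),  # likely the true amplicon
    "contig2": ("GGCCGGCC", 12.0),    # low coverage: error or contaminant
    "contig3": ("TTAATTAA", 800.0),
}
kept_contigs = filter_contigs_by_coverage(toy_contigs, min_fraction=0.1)
print(sorted(kept_contigs))
```

A relative threshold is used here because absolute depth depends on how many reads were sequenced; the right cutoff for real data has to be chosen by looking at the coverage distribution.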
