Question: Linking contigs to each other
4.8 years ago by
Fman0
Belgium
Fman0 wrote:

Dear all,

I have some sequencing results, but after trying to assemble them I got 3 different contigs. I now want to link those contigs so that I get 1 assembly (1 large plasmid).

Is it OK to just design primers at the outer ends of the contigs, in the hope that the new reads will overlap between the contigs and give me 1 large contig rather than 3 small ones?

 

What I mean, schematically, is this:

Contig 1
XXXXXX..............XXXXXX
<---- primer 1 ...... primer 2 -->

Contig 2
YYYYY..................YYYYY
<---- primer 3 ...... primer 4 -->

Contig 3
ZZZZZZZZZZZZZ...........ZZZZZZZZZZZZ
<---- primer 5 ...... primer 6 -->

 

So I would just design primers that sequence further outward from the contig ends.
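To make the idea concrete, here is a minimal Python sketch of picking outward-facing primer sites near each contig end. The function names, the 20-base primer length and the 50-base setback from the contig end are illustrative assumptions, not values from this thread. The setback is there because the first few dozen bases of a Sanger read are usually poor quality, so a primer placed slightly inside the contig gives a read that still overlaps known sequence. Real primer design would also check melting temperature, GC content and secondary binding sites, which this sketch does not do.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def outward_primers(contig, primer_len=20, setback=50):
    """Pick one primer per contig end, each reading outward off that end.

    primer_len and setback are assumed defaults for illustration only.
    Returns (left_primer, right_primer), both written 5'->3'.
    """
    if len(contig) < 2 * (primer_len + setback):
        raise ValueError("contig too short for the requested primer placement")

    # Left end: the primer must extend leftward, so it is the reverse
    # complement of the chosen site on the contig's top strand.
    left_site = contig[setback:setback + primer_len]
    left_primer = reverse_complement(left_site)

    # Right end: the primer extends rightward, so it is the top-strand
    # sequence of the chosen site as written.
    right_end = len(contig) - setback
    right_primer = contig[right_end - primer_len:right_end]

    return left_primer, right_primer


if __name__ == "__main__":
    # Toy sequence, not real plasmid data.
    toy_contig = "ACGTTTGGCCAATTGCAC"
    print(outward_primers(toy_contig, primer_len=4, setback=2))
```

Each contig yields two primers, so three contigs give the six primers drawn in the diagram.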

 

Or is there another trick? 

 

thanks in advance

assembly • 2.6k views
ADD COMMENTlink modified 4.8 years ago by Adrian Pelin2.3k • written 4.8 years ago by Fman0

The gaps you want to close, are the gaps between contigs in a scaffold? Like Josh Herr said, we need more info.

ADD REPLYlink written 4.8 years ago by Adrian Pelin2.3k

Yes, I assume they are, since it's just a plasmid I am sequencing.

 

ADD REPLYlink written 4.8 years ago by Fman0
4.8 years ago by
Josh Herr5.6k
University of Nebraska
Josh Herr5.6k wrote:

You can certainly design primers to bridge gaps in your assembly if you are interested in closing gaps with Sanger sequencing, but you may be able to easily bridge gaps with your existing sequencing data.

You didn't provide us with any information on your sequencing data or how you assembled your reads, so we don't know what you have done so far or how your existing reads (hopefully paired-end reads) might show how these assembly gaps can be bridged. It would also help to know where your sequences come from (which organism), as existing knowledge about sequence diversity can be useful here.

There are numerous tools to do this, but I would recommend certain tools over others based on the type of sequencing data you have, the length of the gaps to be bridged (which you can estimate from paired-end reads), and the organism you are studying.

Can you provide us some more information by amending your original question?

ADD COMMENTlink written 4.8 years ago by Josh Herr5.6k

Well, actually it's just a plasmid that I am sequencing!

And for some reason I have problems getting a part of it sequenced!

I have some old data (it should be the plasmid, but it does not seem to be correct) that I used to create some primers. This worked for 80% of the plasmid. When I map my sequences to the plasmid, I have 1 part that stays "unknown".

When I just assemble the sequences (not to the plasmid map I have) then I get 3 contigs.

So my idea was to simply create primers for each end of the contigs and then hope the contigs would be bridged.

I use CLC for this, so it's pretty automated.

 

 

 

ADD REPLYlink written 4.8 years ago by Fman0

You mention 3 contigs that you get from a de novo assembly of your reads? Are you sure all 3 represent your plasmid? Do they BLAST against it? Do they have roughly the same nucleotide coverage? If yes, and if you want to go the PCR way, how are you thinking of linking them?

Another thing I can suggest is to try a different assembler which may be able to link these 3 contigs together by creating different contigs that overlap with your 3 contigs.

ADD REPLYlink written 4.8 years ago by Adrian Pelin2.3k

They are just from sequencing a plasmid with the primers I designed based on the sequence I have. I do not sequence it myself, a sequencing facility does it. I just send the template (the plasmid) and the primers I designed.

And yes: they are all from the same plasmid. And yes: all the reads I get do match with the plasmid I have. However they often match with other parts of the plasmid than expected (based on the plasmid sequence I have).

I am guessing the plasmid sequence I have is just not correct. For example, primer X binds at positions 258-275 bp, but when I get the read, that read maps to a completely different region of the plasmid. So the sequence I have is just messed up/incorrect.

 

What other assembler do you propose using then?

ADD REPLYlink written 4.8 years ago by Fman0

Of course the freshest data should give a more correct picture than anything previously obtained, but you have to make sure you are not making any mistakes. Initially I thought you had NGS data, which is why doing Sanger sequencing seemed a bit unnecessary.

In your current assembly, in your different contigs, do you observe any gene synteny between your genome and that of closely related sequences on NCBI? Do you perhaps have a gene that is present on 2 of your 3 contigs, indicating that those contigs are close to each other? Also, do you have any estimate of your genome size, and how does that compare to your 3 contigs added up? These are important questions to answer before proceeding.

ADD REPLYlink written 4.8 years ago by Adrian Pelin2.3k

I'll have to look into it in more detail to answer those questions. But I do find the genes that I expected. Whether they are split across 2 or more contigs, I cannot tell at the moment. I'll have to check this.

I'll do this and reply again with the information.

 

The size is about 5000 to max 6000 basepairs.
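With that size estimate, the comparison asked for above (expected plasmid size versus the 3 contigs added up) is simple arithmetic. A minimal sketch follows; the three contig lengths are made-up placeholders, and only the 5000 to 6000 bp range comes from this thread.

```python
def size_check(contig_lengths, expected_min=5000, expected_max=6000):
    """Compare summed contig length with the expected plasmid size range.

    Returns (total_length, verdict) where verdict is one of
    "short", "within" or "long".
    """
    total = sum(contig_lengths)
    if total < expected_min:
        verdict = "short"    # part of the plasmid is probably still missing
    elif total > expected_max:
        verdict = "long"     # contigs may overlap, or contain extra sequence
    else:
        verdict = "within"
    return total, verdict


if __name__ == "__main__":
    # Hypothetical contig lengths in bp, for illustration only.
    print(size_check([2100, 1800, 900]))
```

If the contigs overlap each other, the sum counts the shared bases twice, so a total above the expected range does not by itself mean that foreign sequence is present.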

 

ADD REPLYlink written 4.8 years ago by Fman0

I checked the contigs and it's pretty weird: some of them do seem to overlap, so I do not understand why they are not assembled together.

But it's even weirder: some parts of certain genes are completely in the wrong place... It makes no sense whatsoever. It seems that the plasmid sequence I was given is completely wrong, but I still do not understand why the reads are not assembled correctly (I mean: there is a clear overlap between 2 contigs and still they are kept as 2 separate contigs).

Any other (free) programs I can use to assemble the sequences?

ADD REPLYlink written 4.8 years ago by Fman0
4.8 years ago by
Adrian Pelin2.3k
Canada
Adrian Pelin2.3k wrote:

All you have is PCR products which are Sanger sequenced, correct? People rarely assemble Sanger reads anymore, but you can try software that should work well (Phrap, CAP3, etc.).

ADD COMMENTlink written 4.8 years ago by Adrian Pelin2.3k

No, it's not a PCR product. It's a plasmid!

It's just a bacterial plasmid that has to be sequenced completely.

I tried to use CAP3, but the website seems to be down or something; I always get an error.

 

ADD REPLYlink modified 4.8 years ago • written 4.8 years ago by Fman0

I am talking about your sequencing methodology, not target. You are sequencing PCR products via Sanger?

ADD REPLYlink written 4.8 years ago by Adrian Pelin2.3k

No, it's plasmid DNA I am having sequenced!

No PCR products, just pure plasmid DNA. I do not sequence it myself, a company does this and yes, I think they use Sanger sequencing.

 

ADD REPLYlink written 4.8 years ago by Fman0

OK, so they randomly pick out clones and sequence them. Have you tried aligning your reads back to your contigs with something like bwa and then visualizing the result in Tablet to see if the assembly makes sense?

ADD REPLYlink written 4.8 years ago by Adrian Pelin2.3k

I tried to align the sequencing results to the original sequence I got from someone, but this sequence seems to be wrong or not entirely correct. I did not use bwa; I'll give it a try if it's free.

The assembly does seem to make sense, but it makes no sense that 2 of the 3 contigs are not linked, because they do show overlap. This is the weird thing (besides the original sequence seeming to be wrong and/or there being some mistakes in the plasmid in general, like inverted regions).
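One way to check the overlap by hand, and to see whether an inverted region is involved, is to compare contig ends in both orientations. The sketch below is a rough illustration, not what CLC or any other assembler does internally: it only finds exact suffix-to-prefix matches, the 20-base minimum overlap is an assumed default, and the function names are made up for this example. Real assemblers tolerate mismatches, which matters because the ends of Sanger reads are error-prone.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def suffix_prefix_overlap(a, b, min_len=20):
    """Length of the longest exact match between the end of a and the start of b.

    Returns 0 when no overlap of at least min_len bases exists.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def end_overlaps(contigs, min_len=20):
    """Report overlaps between every ordered pair of contigs.

    contigs maps a name to its sequence. Each hit is a tuple
    (left_name, right_name, orientation, overlap_length), where
    orientation "-" means the right contig had to be reverse
    complemented before its start matched the left contig's end.
    """
    hits = []
    for left_name, left_seq in contigs.items():
        for right_name, right_seq in contigs.items():
            if left_name == right_name:
                continue
            candidates = (("+", right_seq), ("-", reverse_complement(right_seq)))
            for orientation, seq in candidates:
                k = suffix_prefix_overlap(left_seq, seq, min_len)
                if k:
                    hits.append((left_name, right_name, orientation, k))
    return hits


if __name__ == "__main__":
    # Toy sequences, not real plasmid data.
    toy = {"c1": "ATATCCGGACGT", "c2": "TTTAAAACGTCC"}
    for hit in end_overlaps(toy, min_len=5):
        print(hit)
```

A hit with orientation "-" would mean two contigs only join when one of them is flipped, which fits with a region being inverted relative to the reference sequence.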

ADD REPLYlink written 4.8 years ago by Fman0