Question: finishing a genome from assembly of contigs
3.2 years ago by
Germany
silvia.caprari8450 wrote:

Hi all, I am new to analysing NGS sequencing results. I sent some clinical bacterial isolates (which are likely to carry plasmids) to be sequenced with Illumina. I should say up front that I am completely new to the terminology, the methodology and everything else. I got files named "reads" and files named "contigs" from the company. So, if I understood correctly, the contig files are the assembled reads, right? And I shouldn't need to assemble them on my own if they have already been assembled by Illumina, right? Please correct me if I am wrong. What if I wanted the "finished version" of a genome (I mean the chromosome and the plasmids separated and ready to be deposited)? Should I assemble all the contigs together? And how do you do that?

Also, can several contigs contain the same sequence? Could that be a result of the overlapping methodology used in the sequencing?

I also noticed that if I run BLAST with the sequence of a known protein as the query against a file containing all the contigs, the known sequence matches several contigs. Most often the matching sequence differs between contigs by a few nucleotides, which results in a different percent identity to the known sequence. Why does this happen? If several contigs contain the same sequence, shouldn't it be exactly the same in each of them? Is this due to the sequencing methodology?

Sorry for all my questions. I am completely new to the terminology, methodology, etc., and I have no one to ask at the moment.

Thank you so much again.

Silvia

modified 3.1 years ago by sidrairshad290 • written 3.2 years ago by silvia.caprari8450
If you want help here, I think it is better to ask just one question rather than so many. Anyway, you can have a look at these: "Beginner's guide to comparative bacterial genome analysis using next-generation sequence data" and "Completing bacterial genome assemblies: strategy and performance comparisons".

modified 3.2 years ago • written 3.2 years ago by dago2.5k
3.2 years ago by
Germany
silvia.caprari8450 wrote:

Yes, really sorry about that. Then can I just ask your opinion about this: if I run BLAST with the sequence of a known protein as the query against a file containing all the contigs, the known sequence matches several contigs, and most often the matching sequence differs between contigs by a few nucleotides, which results in a different percent identity to the known sequence. Why does this happen? I would expect no differences in the nucleotides. Thanks.

written 3.2 years ago by silvia.caprari8450
Please use ADD REPLY to respond to earlier comments; that way the thread remains logically structured and easy to follow.

written 3.2 years ago by WouterDeCoster42k
3.2 years ago by
dago2.5k
Germany
dago2.5k wrote:

It may be that the gene you are using as the query is similar to multiple genes in the genome, either because the gene is repeated or because there are multiple versions of it. On the other hand, you should check how clean your genomes are. This is usually done by checking for genes that are expected to be present exactly once (single-copy marker genes) in the specific phylogenetic group your bacteria belong to. I personally use CheckM. The point is that you could simply have contamination, meaning sequences other than those of your genome of interest.
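To see which of these explanations is more likely, it can help to list the BLAST hits contig by contig. The following is only a minimal sketch: it assumes the search was run with tabular output (`-outfmt 6`, whose default columns start with query ID, subject ID, percent identity and alignment length), and the contig names, values and thresholds are invented for illustration.

```python
from collections import defaultdict

# Hypothetical BLAST tabular output (-outfmt 6); all names and numbers are invented.
BLAST_TSV = (
    "geneX\tcontig_12\t100.00\t900\t0\t0\t1\t900\t1500\t2399\t0.0\t1663\n"
    "geneX\tcontig_47\t98.67\t900\t12\t0\t1\t900\t310\t1209\t0.0\t1596\n"
    "geneX\tcontig_88\t91.20\t250\t22\t0\t1\t250\t40\t289\t1e-90\t330\n"
)

def hits_per_contig(tsv_text, min_identity=90.0, min_length=200):
    """Group hits by contig, keeping (percent identity, alignment length)."""
    hits = defaultdict(list)
    for line in tsv_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        contig = fields[1]          # column 2: subject (contig) ID
        pident = float(fields[2])   # column 3: percent identity
        length = int(fields[3])     # column 4: alignment length
        if pident >= min_identity and length >= min_length:
            hits[contig].append((pident, length))
    return dict(hits)

hits = hits_per_contig(BLAST_TSV)
for contig, contig_hits in sorted(hits.items()):
    for pident, length in contig_hits:
        print(f"{contig}\t{pident:.2f}% identity\talignment length {length}")
```

Full-length hits with slightly different identity on separate contigs would be consistent with several copies or versions of the gene, or with contamination; short partial hits often just mean the gene is split across contig ends. This listing alone cannot tell these cases apart, which is why a contamination check is still worthwhile.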

modified 3.2 years ago • written 3.2 years ago by dago2.5k
3.2 years ago by
Germany
silvia.caprari8450 wrote:

Thank you very much, dago.

written 3.2 years ago by silvia.caprari8450

Please use ADD REPLY to respond to earlier comments; that way the thread remains logically structured and easy to follow.

written 3.2 years ago by WouterDeCoster42k
3.1 years ago by
sidrairshad290 wrote:

Dear all, I have somewhat similar questions to Silvia's. I sequenced my bacterial strain on an Illumina HiSeq and was sent a file of reads, from which I generated contigs using Velvet. Now I have 149 unordered contigs. Could you please guide me on how I could get a complete genome out of them? Also, my draft genome is annotated. Is a complete genome needed for phylogenetic and comparative genome analysis?

written 3.1 years ago by sidrairshad290

You should open a new post rather than adding questions to old ones. You generally cannot get a complete genome from Illumina WGS data alone. You have contigs, which you may or may not be able to order using a reference genome. In any case, getting a complete genome requires some PCR work to close the gaps between contigs, but I would say that in most cases it is not necessary for comparative genomics studies.
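As a rough illustration of what ordering contigs against a reference means, here is a minimal sketch. It assumes each contig has already been aligned to a related reference genome and that one best placement per contig is available; the tuples below (contig name, reference start, reference end, strand) are invented, and `order_contigs` is a hypothetical helper, not part of any existing tool.

```python
# Hypothetical best placement of each contig on a reference genome:
# (contig name, reference start, reference end, strand). All values are invented.
ALIGNMENTS = [
    ("contig_3", 250_000, 410_000, "+"),
    ("contig_1", 1, 120_000, "-"),
    ("contig_7", None, None, None),   # no hit: novel region, plasmid, or contamination
    ("contig_2", 121_500, 248_000, "+"),
]

def order_contigs(alignments):
    """Sort placed contigs by reference start; report unplaced contigs separately."""
    placed = [a for a in alignments if a[1] is not None]
    unplaced = [a[0] for a in alignments if a[1] is None]
    placed.sort(key=lambda a: a[1])
    # Keep the strand so reverse-oriented contigs can be reverse-complemented later.
    ordered = [(name, strand) for name, _start, _end, strand in placed]
    return ordered, unplaced

ordered, unplaced = order_contigs(ALIGNMENTS)
print("order along reference:", ordered)
print("not placed:", unplaced)
```

Note that this only gives a likely order and orientation. The gaps between neighbouring contigs (here, for example, between positions 120,000 and 121,500) are still unknown sequence, and any rearrangement relative to the reference would be ordered wrongly.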

written 3.1 years ago by dago2.5k

So that means I can use my contigs as they are for phylogenetic and comparative genome analysis?

written 3.1 years ago by sidrairshad290