Question: Can 167 contigs in my genome be stitched into one FASTA sequence?
17 months ago by
Optimist40 wrote:

Hello all,

Is it fine if I stitch all 167 contigs from my genome into one single FASTA sequence?

Are there any tools that can be used for this purpose?

I have a strain of bacteria with 167 contigs and I want to stitch them up together under 1 single FASTA header.

Thanks a lot

Cheers Optimist

modified 17 months ago by Michael Dondrup45k • written 17 months ago by Optimist40

Did you assemble this genome? Can you give more details about the sequencing and assembly? My experience with bacterial genomes is that with good coverage (starting at 20-30x and up to 100x of Illumina MiSeq) you should get a good assembly, with not so many contigs. One common cause of fragmented assemblies is contamination: although we believed we were sequencing a single, isolated strain, there was in fact a second strain in the culture. You can use tools such as blobtools to investigate your assembly.
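
As a rough illustration of the coverage figures above, expected coverage is usually estimated as (number of reads × read length) / genome size. A minimal sketch, where the function name and the example numbers are made up for illustration and are not taken from any real run:

```python
def estimate_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Rough expected coverage: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


# Hypothetical example: 1,000,000 reads of 250 bp on a 5 Mb bacterial genome.
coverage = estimate_coverage(1_000_000, 250, 5_000_000)
print(f"{coverage:.0f}x")
```

This ignores trimming, duplicates and uneven coverage, so treat the result as an upper bound.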

written 17 months ago by h.mon22k

The genome I was referring to was downloaded from NCBI. I was asking about stitching because some tools, like PHASTER, require a complete genome (not contigs or scaffolds) to detect prophages and annotate their positions in a circular genome.

I'll surely look into the tool you have suggested.

Thanks a lot

written 17 months ago by Optimist40
17 months ago by
Bergen, Norway
Michael Dondrup45k wrote:

It's not illegal, but that was possibly not what you meant ;) For most other use-cases it is definitely not fine. I'll just give you some hints:

  • You cannot tell which sequence belongs to which contig anymore
  • You don't know the natural order of contigs
  • You are creating artificial chimeric sequences at the transitions
  • Sequences can come from different replicons
  • ....

  • You can possibly try to scaffold the contigs if there are additional long-range sequencing libraries (fosmid, PacBio, ...); SSPACE is a program for this

  • You can keep the contigs as they are in a single multi-fasta file
  • If you still think you need to do this, you need to re-think the reason. There is probably a problem with your approach.
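
To make the first few points concrete, here is a minimal Python sketch of what "stitching" amounts to if a tool really insists on a single sequence. It joins the contigs in file order with a run of N characters between them and keeps a coordinate table, so you can still tell which stretch came from which contig. The function names and the spacer length of 100 are my own assumptions, not a convention of any particular tool, and the order of contigs stays arbitrary: this is not scaffolding and the result is not a complete genome.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a multi-FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def join_with_spacers(
    records: Iterable[Tuple[str, str]], spacer_len: int = 100
) -> Tuple[str, List[Tuple[str, int, int]]]:
    """Join contigs with N spacers; return the sequence and a coordinate map.

    Coordinates are 0-based, half-open (start, end) positions in the joined
    sequence. The spacer marks each junction as unknown sequence instead of
    creating a chimeric transition between unrelated contigs.
    """
    spacer = "N" * spacer_len
    parts: List[str] = []
    coords: List[Tuple[str, int, int]] = []
    pos = 0
    for i, (name, seq) in enumerate(records):
        if i > 0:
            parts.append(spacer)
            pos += spacer_len
        coords.append((name, pos, pos + len(seq)))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), coords
```

Usage would be something like `joined, coords = join_with_spacers(read_fasta(open("contigs.fasta")))`, where `contigs.fasta` is a placeholder file name. Any feature a downstream tool reports across an N run should be treated as an artifact, and contigs from different replicons (e.g. plasmids) still end up in the same sequence.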
modified 17 months ago • written 17 months ago by Michael Dondrup45k