11 months ago
MSRS ▴ 520

Hi Biostars Community,

I have SPAdes assembly contigs of a complete E. coli genome. Using PlasmidFinder, we found that several plasmids are present in these contigs, with hits of around 150-650 bp.

Plasmid        Identity  Query / Template length  Contig      Position in contig  Note  Accession number
Col(BS512)     100       233 / 233                contigs113  1956..2188                NC010656
IncFIA         100       388 / 388                contigs98   13867..14254              AP001918
IncFIB(pB171)  99.22     643 / 643                contigs99   1765..2407                AB024946
IncFII         98.08     261 / 261                contigs96   3490..3750                AY458016
IncI(Gamma)    100       137 / 141                contigs97   24329..24465              AP011954

Is there any way to separate those plasmids from contigs/fastq files? Thank you.

Not easily, no. Removing named sequences is easy (check the forum for lots of answers); however, unambiguously identifying a plasmid is an unsolved problem.

You can use plasmid finder tools and then separate the FASTA records by name, but it's never going to be 100% effective. 150 bp isn't so much a contig as a read. They're all so short that they're likely just junk. You aren't going to get a useful plasmid assembly out of those, no matter what you do with them.
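For the "separate by name" part, here is a minimal sketch in plain Python. It assumes the FASTA headers start with the contig names shown in the PlasmidFinder table (contigs96, contigs97, and so on); your actual SPAdes headers may look different. The function names `read_fasta` and `split_by_name` are made up for illustration. A hit on a contig does not prove the whole contig is plasmid, so treat the output as candidates only.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_name(
    records: Iterable[Tuple[str, str]], flagged: Set[str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Partition records by whether the first word of the header is flagged."""
    hits: Dict[str, str] = {}
    rest: Dict[str, str] = {}
    for header, seq in records:
        parts = header.split()
        name = parts[0] if parts else ""
        if name in flagged:
            hits[name] = seq
        else:
            rest[name] = seq
    return hits, rest


# Contig names taken from the PlasmidFinder table in the question.
flagged_contigs = {"contigs96", "contigs97", "contigs98", "contigs99", "contigs113"}

# Toy input standing in for the real assembly file (assumption: made-up sequences).
toy_fasta = [
    ">contigs96 toy record",
    "ACGTACGTACGT",
    ">contigs1",
    "GGGGCCCC",
    "AAAATTTT",
]

candidates, others = split_by_name(read_fasta(toy_fasta), flagged_contigs)
print(sorted(candidates))  # names of contigs with a replicon hit
print(sorted(others))      # everything else
```

For a real assembly you would pass an open file handle to `read_fasta` instead of the toy list, then write the two dictionaries out as separate FASTA files.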


