I am confused about how to confirm whether a contig sequence is really a plasmid. I downloaded whole genome sequence assembly data (from either long reads or short reads) for a bacterial species, H. pylori. This bacterium is known to carry plasmids at a very low copy number, and the plasmids are not easy to detect, especially from whole genome sequence data. My method was simply to screen the assemblies with Abricate against plasmid databases (PLSDB, PlasmidFinder, and my own database, built by downloading all plasmid sequences from NCBI and other public databases). I set both the coverage and identity thresholds to 90%.
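For readers who want to reproduce the 90% coverage and 90% identity filter on an existing hit table, here is a minimal sketch. It assumes a tab-separated table with `SEQUENCE`, `%COVERAGE` and `%IDENTITY` columns, which is how Abricate labels its output as far as I know; the function name, the cut-down example table, and the contig names are made up for illustration.

```python
import csv
import io


def contigs_passing(tsv_text, min_cov=90.0, min_id=90.0):
    """Return contig names with at least one hit meeting both thresholds."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    passing = set()
    for row in reader:
        coverage = float(row["%COVERAGE"])
        identity = float(row["%IDENTITY"])
        if coverage >= min_cov and identity >= min_id:
            passing.add(row["SEQUENCE"])
    return sorted(passing)


# Hypothetical, reduced hit table (real output has more columns).
example = (
    "#FILE\tSEQUENCE\t%COVERAGE\t%IDENTITY\n"
    "asm.fasta\tcontig_3\t97.50\t99.10\n"
    "asm.fasta\tcontig_7\t62.00\t95.00\n"
    "asm.fasta\tcontig_9\t91.20\t88.40\n"
)

print(contigs_passing(example))
```

Only `contig_3` clears both thresholds here; `contig_7` fails on coverage and `contig_9` fails on identity.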
After detecting some plasmids, I tried to extract the plasmid contigs from the WGS data. I got several types of plasmids, but one type seems to have a GC content of around 38-39%, which is a little high for plasmids.
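The GC check can be done per contig with a few lines of Python. This is only a sketch: it assumes the extracted contigs are in plain FASTA text, and the helper names and toy sequences are hypothetical. Running it on every contig of the assembly lets you compare the candidate plasmid contigs with the rest.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of {contig name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the contig name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def gc_percent(seq):
    """GC content in percent, ignoring anything that is not A, C, G or T."""
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Toy sequences for illustration only.
fasta_text = ">contig_3 len=10\nATGCATGCAT\n>contig_7\nGGCC\nAATT\n"

for contig, seq in read_fasta(fasta_text).items():
    print(contig, round(gc_percent(seq), 1))
```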
My questions are:
- How do we know that a contig sequence (extracted from the assembled WGS) is really a plasmid?
- How do we check whether these contig sequences are extrachromosomal rather than part of the bacterial chromosome?
- Is there any tool to predict whether a contig sequence is a plasmid or not?
I think most of the available tools rely on sequence-based detection, which you have already done. Maybe you can try using multiple plasmid identification tools (PlasmidSeeker, plasmidSPAdes, etc.) and take a consensus of the findings, or use a complementary experimental technique to confirm your findings.
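One simple way to take such a consensus is to keep only the contigs that at least two tools agree on. This is a rough sketch, not a standard procedure: it assumes each tool's output has already been reduced to a list of contig names, and the tool labels, contig names, and the two-tool cutoff are placeholders.

```python
from collections import Counter


def consensus_contigs(calls_by_tool, min_tools=2):
    """Return contigs flagged as plasmid by at least `min_tools` tools."""
    votes = Counter()
    for contigs in calls_by_tool.values():
        # set() so that a tool reporting a contig twice still counts once.
        votes.update(set(contigs))
    return sorted(c for c, n in votes.items() if n >= min_tools)


# Hypothetical calls from three tools.
calls = {
    "tool_a": ["contig_3", "contig_7"],
    "tool_b": ["contig_3", "contig_7"],
    "tool_c": ["contig_3", "contig_9", "contig_9"],
}

print(consensus_contigs(calls))
print(consensus_contigs(calls, min_tools=3))
```

Contigs that only one tool reports (like `contig_9` here) are the ones worth checking manually or experimentally.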
Thank you. I used PlasmidSeeker as you suggested, and I also validated the hits manually, one by one. It is clear now. Thank you.