Question: [Solved] Differentiating between chromosomal and plasmid DNA
2.8 years ago by
Washington, DC
Harry wrote:

Hello all,


I am a recent graduate working for a Public Health Laboratory. I'm relatively new to bioinformatics, and most of what I know is based around NGS analysis. My lab director loves to challenge me. He wants to know different ways NGS (we have a MiSeq) could be implemented in our lab as an outbreak investigation tool.

I am aiming to do a study of Carbapenem-Resistant Enterobacteriaceae (CRE). The main goal is to be able to receive a CRE sample and use NGS to detect the genes (beta-lactamases) that are responsible. The idea is to be able to run a quick analysis while also compiling genetic information that could be used to connect the dots in an outbreak investigation (phylogeny).

What I Already Know

  • The genes that I am looking for can be found within the bacterial chromosome or within its plasmids.
  • Primers need to be designed for each gene.

The Actual Questions

First: If I wish to take the whole-genome-sequencing approach, how would I be able to tell which parts of my output (FASTQ) come from plasmids and which come from the chromosome?

Second: If I didn't want to do whole-genome, would it be possible to only sequence the genes that I'm looking for (if they are there)? And if so, how would I do it?

Open for Discussion

If anyone has any suggestions, solutions, or wishes to point me in a direction where I can learn more, please let me know. It would be a huge help, and is deeply appreciated.

Edit: Solution Found

Thanks to those who commented before, I now have a better understanding of how this all works. Also, it put me on a path to find an example of how this type of experiment is done in a clinical laboratory. You can find the study here.


One of my co-workers tried using machine learning to distinguish between plasmids and main genome based on the genes present after annotation, with some degree of success. This is much easier on assembled contigs than raw reads, which are usually too short for annotation. It should also be theoretically possible to analyze the graph structure during assembly to determine which contigs are co-located and the size of the chromosome they are located on. This can also be done after the fact using a graph file that some assemblers produce.
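The graph idea above can be illustrated with a minimal sketch, assuming the assembler writes a GFA 1 file (tab-separated `S` segment records and `L` link records). It groups contigs into connected components and sums the sequence length of each one. The function names `parse_gfa` and `components` are made up for this example, and the tiny graph is invented data.

```python
from collections import defaultdict

def parse_gfa(lines):
    """Read segment lengths and undirected links from GFA 1 lines."""
    lengths = {}
    neighbors = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            name, seq = fields[1], fields[2]
            length = len(seq)
            if seq == "*":
                # Sequence omitted: fall back to the optional LN:i: tag.
                length = 0
                for tag in fields[3:]:
                    if tag.startswith("LN:i:"):
                        length = int(tag[5:])
            lengths[name] = length
        elif fields[0] == "L":
            # L records: from, from-orientation, to, to-orientation, overlap.
            a, b = fields[1], fields[3]
            neighbors[a].add(b)
            neighbors[b].add(a)
    return lengths, neighbors

def components(lengths, neighbors):
    """Return (total_length, members) per connected component, largest first."""
    seen = set()
    result = []
    for start in lengths:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        members = []
        while stack:
            node = stack.pop()
            members.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        total = sum(lengths.get(m, 0) for m in members)
        result.append((total, sorted(members)))
    return sorted(result, reverse=True)

# Invented toy graph: segments a and b are linked; c links only to itself.
example = [
    "S\ta\tACGTACGTAC",
    "S\tb\tACGT",
    "S\tc\t*\tLN:i:6",
    "L\ta\t+\tb\t+\t0M",
    "L\tc\t+\tc\t+\t0M",
]
for total, members in components(*parse_gfa(example)):
    print(total, members)
```

On real data, a small component that is separate from the largest one is only a plasmid candidate. Plasmids that share repeats (for example insertion sequences) with the chromosome can end up in the same component, so this is a heuristic and not a classifier.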

You can certainly try selectively amplifying the genes in question with the correct primers, but I think WGS is probably simpler and more robust. You can assemble the reads and then compare the contigs to your genes in question, or simply map the raw reads to the genes in question; either works. The MiSeq has sufficient capacity to sequence 30+ bacteria per run with 40x coverage, depending on the run mode (that's in 24 hours at 2x150bp).
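The capacity estimate above comes down to simple arithmetic: total bases per run divided by the bases needed per genome at the target coverage. Here is a minimal sketch; the run yield and genome size are illustrative placeholders and not instrument specifications, so substitute the figures for your own kit and organisms.

```python
def mean_coverage(read_count, read_length_bp, genome_size_bp):
    """Expected mean coverage: total sequenced bases / genome size."""
    return read_count * read_length_bp / genome_size_bp

def samples_per_run(run_yield_bp, genome_size_bp, target_coverage):
    """How many genomes fit in one run at the target mean coverage."""
    bases_per_sample = genome_size_bp * target_coverage
    return int(run_yield_bp // bases_per_sample)

# Placeholder values chosen for illustration only.
run_yield_bp = 6_000_000_000   # assumed usable bases per run
genome_size_bp = 5_000_000     # assumed bacterial genome size
target_coverage = 40

print(samples_per_run(run_yield_bp, genome_size_bp, target_coverage))
print(mean_coverage(1_000_000, 150, genome_size_bp))
```

With these placeholder numbers the run holds 30 genomes, and one million 150 bp reads give 30x coverage of a 5 Mb genome. Real runs lose some yield to low-quality reads and uneven pooling, so leave some headroom.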

Reply written 2.8 years ago by Brian Bushnell

I'm starting my master's program soon, and machine learning is something I am really interested in learning. Any resources you could recommend on the subject? Also, based on your response (as well as Harold's), it seems like using WGS is what will make the most sense. Selectively amplifying genes is something I might try later down the line, but I know I'm just not there yet.

Thanks for the response, it has put me on an avenue of progressive learning.

Reply written 2.8 years ago by Harry

@Brian do you have a link to the tool? I'm interested in attempting a similar classification problem so would like to see the approach.

Reply written 2.8 years ago by Joe

No, the tool was never finished or made public, sorry. Though I will ask my co-worker about the status and results and report back if there's anything interesting to note.

Reply written 2.8 years ago by Brian Bushnell
2.8 years ago by
United States
harold.smith.tarheel wrote:

1) From WGS data, chromosome vs episome can be distinguished by copy number (reflected in differences in read depth). With appropriate data, you can also assemble the genomes and distinguish the two by contigs.

2) Search for 'amplicon sequencing'. Note that the MiSeq, at 10M+ reads/run, is overkill unless you're barcoding 1000s of samples.
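The read-depth idea in point 1 can be sketched as follows, assuming you already have a length and mean depth for each assembled contig (for example from a read mapper's coverage report). The baseline is a length-weighted median depth, which the chromosome dominates because it holds most of the assembled bases. The function names, the 1.5x threshold, and the contig values are all invented for illustration.

```python
def weighted_median_depth(contigs):
    """Length-weighted median of mean depth; contigs maps name -> (length, depth)."""
    items = sorted(contigs.values(), key=lambda c: c[1])  # sort by depth
    total = sum(length for length, _ in items)
    running = 0
    for length, depth in items:
        running += length
        if running >= total / 2:
            return depth
    raise ValueError("no contigs given")

def relative_copy_number(contigs):
    """Depth of each contig divided by the chromosome-dominated baseline."""
    baseline = weighted_median_depth(contigs)
    return {name: depth / baseline for name, (_, depth) in contigs.items()}

def flag_candidates(contigs, min_ratio=1.5):
    """Names of contigs whose depth is at least min_ratio times the baseline."""
    ratios = relative_copy_number(contigs)
    return sorted(name for name, ratio in ratios.items() if ratio >= min_ratio)

# Invented example: name -> (length in bp, mean read depth).
contigs = {
    "contig_1": (3_000_000, 42.0),
    "contig_2": (1_900_000, 40.0),
    "contig_3": (95_000, 44.0),
    "contig_4": (6_500, 410.0),
}
print(flag_candidates(contigs))
```

Here only `contig_4` is flagged, at roughly ten times the baseline depth. Depth alone has limits: a low-copy plasmid sits near chromosomal depth and will be missed, and collapsed chromosomal repeats can show elevated depth, so treat flagged contigs as candidates to check against annotation or the assembly graph.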


Thank you for your response!

It definitely seems that doing WGS would make the most sense.

Reply written 2.8 years ago by Harry