Question

Bacterial plasmid analysis

14 days ago

nicole.kavanagh • 0

Hi there,

Apologies if this is a silly question, but my bioinformatics experience is quite limited and, unfortunately, entirely self-taught (I am a PhD student). I was hoping for some advice, please.

I have fully closed a total of 193 bacterial plasmids (from 39 isolates of the same bacterial species) using hybrid assembly (Illumina and MinION technologies). Some isolates contain multiple plasmids, ranging from 3 to 8 in total. The plasmids vary in size from 2,947 to 289,861 bp and belong to different plasmid families (Rep types). I have exported each plasmid as its own FASTA file.

Is there a way to assess the genetic relatedness between these plasmids and display it visually? My main aim is to show that specific plasmids are unique to specific sequence types (STs) and will therefore cluster on this basis. For example, for one ST I have 52 plasmids from 10 different isolates, and I want to show that these plasmids are similar to each other. I have already Rep-typed them, but I feel that this is not discriminatory enough.

I have seen papers run a core-gene analysis on plasmids using Roary and build a phylogenetic tree from it (the plasmids were all from the same family and similar in length), however I'm not sure this would be appropriate in my case as the plasmid sequences are obviously much more diverse and vary in size.

Hopefully that makes sense. I really appreciate any help or input.

Thanks, Nicole

bacteria plasmid wgs hybridassembly sequencing • 740 views

Written 14 days ago by nicole.kavanagh • 0; updated 8 days ago by GenoMax 142k

"however I'm not sure this would be appropriate in my case as the plasmid sequences are obviously much more diverse and vary in size"

Can you roughly subdivide them into groups based on their size? I assume these bacterial strains are a single organism (or closely related)? Then you could run Roary on each group of plasmids to generate the trees.

Reply posted 14 days ago by GenoMax 142k

Thank you so much for your quick response. I have been searching scientific papers for about a week but have failed to come up with a reasonable workflow, so I thought I had better ask for advice. Yes, all of these bacterial strains are a single organism (Enterococcus faecium). I have been thinking of trying this but have several questions:

1) If plasmids are similar in size but from different plasmid families, will this affect the core-gene output? I.e. what if they are quite diverse even though they are similar in size?
2) What would you consider a reasonable division based on size? Would ranges of 2,000-20,000 bp, 30,000-90,000 bp and 100,000-200,000 bp be too broad?

Thank you again, I really appreciate the advice.

Reply posted 14 days ago by nicole.kavanagh • 0
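
A side note on the size-based grouping discussed above: the ranges floated in the reply leave gaps (20,000-30,000 bp and 90,000-100,000 bp) and stop at 200,000 bp, while the largest plasmid in the question is 289,861 bp. Below is a minimal Python sketch of binning one-plasmid-per-file FASTA files by length. The bin names and boundaries, the `*.fasta` pattern and all function names are illustrative assumptions, not recommendations from the thread.

```python
from pathlib import Path

# Hypothetical size bins in bp (lower bound inclusive, upper bound exclusive).
# They are contiguous and open-ended on purpose, so that no plasmid is left out.
SIZE_BINS = [
    ("small", 0, 30_000),
    ("medium", 30_000, 100_000),
    ("large", 100_000, float("inf")),
]


def fasta_length(path):
    """Return the total number of sequence characters in a FASTA file."""
    total = 0
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):  # skip header lines
                total += len(line.strip())
    return total


def assign_bin(length, bins=SIZE_BINS):
    """Return the name of the bin whose [low, high) range contains length."""
    for name, low, high in bins:
        if low <= length < high:
            return name
    raise ValueError(f"no bin covers length {length}")


def bin_plasmids(fasta_dir, pattern="*.fasta"):
    """Map each bin name to the sorted list of FASTA file names that fall in it."""
    groups = {name: [] for name, _, _ in SIZE_BINS}
    for path in sorted(Path(fasta_dir).glob(pattern)):
        groups[assign_bin(fasta_length(path))].append(path.name)
    return groups
```

Calling `bin_plasmids("/path/to/plasmid_fastas")` would return a dictionary of file names per bin, which could then be fed to whichever per-group analysis is chosen. As the later replies point out, size alone may not be the most biologically meaningful grouping.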

Can you classify the plasmids based on function (the resistance genes they carry, or some other criterion)? The sizes above span a wide range, so the classification criteria may need to be chosen in a way that makes biological sense.

Reply posted 14 days ago by GenoMax 142k

I think I will use the predominant PlasmidFinder type to group plasmids together and build a separate core phylogeny for each group (e.g., one core phylogeny for rep11a types), as the size range within a type tends not to be as large. Hopefully this makes sense. Thank you for your help! My final year PhD brain was ready to burst :)

Reply posted 14 days ago by nicole.kavanagh • 0

GenoMax · Answer 1 · 2024-05-02

13 days ago

shenwei356 8.5k

My colleague has an unpublished tool for plasmid analysis (clustering), and it gives really good results. You might give it a try: https://github.com/iqbal-lab-org/pling

Answer posted 13 days ago by shenwei356 8.5k

I am the colleague! My tool groups plasmids together based on two genetic distances: containment (how much of the smaller plasmid's sequence is contained in the larger) and DCJ-Indel (which counts the number of rearrangements and large indels distinguishing two plasmids). It uses these distances to build a relationship network, and various visualisations of this network are part of the output!

First, an initial containment network is built in which each node is a plasmid, and an edge is added between two plasmids/nodes if at least 50% of the smaller one is contained in the larger. Then we induce a subnetwork on this containment network by removing any edge that has a DCJ-Indel distance greater than 4. You can think of it like this: plasmids will share an edge if they have enough sequence in common and don't have a lot of structural changes between them.

We use the initial containment network to assign a broad community based on containment distances, and then the DCJ-Indel subnetwork to assign a tighter subcommunity based on both containment and DCJ-Indel distances. The distances we use mean that the tool copes very well with differences in size. The documentation on GitHub has some more details on the approach.

We've found that the subcommunities the tool produces generally have a reasonable number of core genes, so you can still do a classic core-gene phylogeny on top of it.

It is still a work in progress, so installation might be a little clunky and documentation isn't quite finished, but we've had a couple of people use it and they've found it okay, and I am more than happy to answer any questions! Since you have already isolated each plasmid into an individual fasta file, all you'd need to do is install the tool, and then you'd be good to go.

Reply posted 13 days ago by Daria ▴ 30
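
To make the two-step network construction in the reply above more concrete, here is a toy Python sketch. It is not pling's code: the pairwise containment and DCJ-Indel values are made up, and connected components are used as a simple stand-in for community assignment, since the thread does not say how pling actually assigns communities. Only the two thresholds (50% containment, DCJ-Indel distance of 4) come from the reply.

```python
from itertools import combinations

# Thresholds quoted in the reply above.
CONTAINMENT_MIN = 0.5  # at least 50% of the smaller plasmid is contained in the larger
DCJ_INDEL_MAX = 4      # edges with a larger DCJ-Indel distance are removed


def lookup(table, a, b):
    """Fetch a symmetric pairwise value regardless of argument order."""
    return table[(a, b)] if (a, b) in table else table[(b, a)]


def build_edges(nodes, keep):
    """Return the set of unordered node pairs for which keep(a, b) is true."""
    return {frozenset(pair) for pair in combinations(nodes, 2) if keep(*pair)}


def connected_components(nodes, edges):
    """Group nodes into connected components (assumed stand-in for communities)."""
    neighbours = {n: set() for n in nodes}
    for edge in edges:
        a, b = tuple(edge)
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Made-up example data for four hypothetical plasmids.
plasmids = ["pA", "pB", "pC", "pD"]
containment = {("pA", "pB"): 0.9, ("pA", "pC"): 0.7, ("pB", "pC"): 0.6,
               ("pA", "pD"): 0.1, ("pB", "pD"): 0.0, ("pC", "pD"): 0.2}
dcj_indel = {("pA", "pB"): 1, ("pA", "pC"): 9, ("pB", "pC"): 7}

# Step 1: containment network.
containment_edges = build_edges(
    plasmids, lambda a, b: lookup(containment, a, b) >= CONTAINMENT_MIN)
# Step 2: subnetwork that keeps only edges with a small DCJ-Indel distance.
subnetwork_edges = {e for e in containment_edges
                    if lookup(dcj_indel, *sorted(e)) <= DCJ_INDEL_MAX}

communities = connected_components(plasmids, containment_edges)
subcommunities = connected_components(plasmids, subnetwork_edges)
print(communities)     # [['pA', 'pB', 'pC'], ['pD']]
print(subcommunities)  # [['pA', 'pB'], ['pC'], ['pD']]
```

In this toy data pA, pB and pC share enough sequence to land in one broad group, but only pA and pB are structurally close enough to stay together in the tighter grouping.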

I vote for this, it's the best! (but also, I am Daria's PhD supervisor so I am biased). Preprint out soon(ish) [no pressure Daria!]

Reply posted 13 days ago by Zamin Iqbal ▴ 20

Thank you so much for the recommendation and the very clear explanation; it's much appreciated. I have already made core-gene phylogenies and FastANI matrices for each of my plasmid Rep types, but I will definitely try out pling, as it would be nice to represent them visually in the same network rather than on individual neighbour-joining trees (NJTs). I notice your GitHub page states that genomes must be circular. Apologies if this is a silly question, but I have a number of linear plasmid types that I would also like to include. Does pling allow linear plasmids to be clustered alongside circular ones?

Reply posted 13 days ago by nicole.kavanagh • 0

Hi Nicole.

The clustering will be quite different between pling and the Rep types (because plasmids with one Rep type can still be very dissimilar). I think it is quite possible you will get bigger core genomes in your pling clusters, and so more information for your trees.

Do you mean these are plasmids that have not been circularised, or plasmids which you know are linear and are biologically never circles? Either way, I would tell you to go ahead and use it. Daria would tell you that if a plasmid has not been circularised, you may artificially inflate the structural distance between it and other plasmids which it is in fact identical or very similar to. We are both right (IMO). Basically, if you are willing to accept that you are inflating errors a bit, go ahead, but be aware of it. It may, for example, prevent two plasmids from clustering which in truth should be clustered.

Reply posted 13 days ago by Zamin Iqbal ▴ 20

Hi Daria,

I have downloaded pling and its dependencies as recommended on your GitHub page. I have tried to run the tool a few times, but I always get the following error:

"MinIONs-iMac:pling minion$ python run_pling.py /Users/minion/Desktop/All_plasmids /Users/minion/Desktop/pling_out align
Batching...

ModuleNotFoundError in file /Users/minion/pling/pling/batching/Snakefile, line 1:
No module named 'pling'
  File "/Users/minion/pling/pling/batching/Snakefile", line 1, in <module>

Command 'snakemake --snakefile /Users/minion/pling/pling/batching/Snakefile --configfile /Users/minion/Desktop/pling_out/tmp_files/config.yaml --cores 1 --use-conda --rerun-incomplete --nolock  ' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/Users/minion/pling/pling/run_pling.py", line 182, in <module>
    main()
  File "/Users/minion/pling/pling/run_pling.py", line 179, in main
    pling(args)
  File "/Users/minion/pling/pling/run_pling.py", line 130, in pling
    raise e
  File "/Users/minion/pling/pling/run_pling.py", line 125, in pling
    subprocess.run(f"snakemake --snakefile {get_pling_path()}/batching/Snakefile {snakemake_args}", shell=True, check=True, capture_output=True)
  File "/Users/minion/miniconda3/envs/bactopia/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'snakemake --snakefile /Users/minion/pling/pling/batching/Snakefile --configfile /Users/minion/Desktop/pling_out/tmp_files/config.yaml --cores 1 --use-conda --rerun-incomplete --nolock  ' returned non-zero exit status 1."

Could you please offer some advice? Apologies, my command line knowledge is all self-taught and it is quite possibly something obvious on my end.

Thanks! Look forward to trying your tool :-)

Reply written 8 days ago by nicole.kavanagh • 0; updated 8 days ago by GenoMax 142k

Hi, you need to run it with PYTHONPATH=<pling_path> in front of the command (a bit clunky, I know -- will likely be fixed in the next version). Currently, without specifying the PYTHONPATH first the tool doesn't find all the scripts it needs. If anything else comes up, email me at daria@ebi.ac.uk, so we're not clogging the thread here with tool issues!

Reply posted 8 days ago by Daria ▴ 30
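
For anyone hitting the same `No module named 'pling'` error, here is a hedged Python sketch of what setting PYTHONPATH amounts to. It assumes that `<pling_path>` means the cloned repository root, i.e. the directory that contains the `pling` package (`/Users/minion/pling` in the traceback above); that reading is inferred from the paths in the error message and is not confirmed in the thread. The helper names `pling_env` and `run_pling` are made up for illustration.

```python
import os
import subprocess
import sys


def pling_env(pling_repo, base_env=None):
    """Return a copy of the environment with pling_repo prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    # Keep any existing PYTHONPATH entries after the pling repository.
    env["PYTHONPATH"] = pling_repo if not existing else pling_repo + os.pathsep + existing
    return env


def run_pling(pling_repo, input_dir, output_dir, mode="align"):
    """Call run_pling.py with the same positional arguments as in the post above."""
    # Assumption: the script lives at <repo>/pling/run_pling.py, as in the traceback.
    script = os.path.join(pling_repo, "pling", "run_pling.py")
    cmd = [sys.executable, script, input_dir, output_dir, mode]
    # The modified environment is inherited by the snakemake subprocess as well.
    return subprocess.run(cmd, env=pling_env(pling_repo), check=True)
```

Under the same assumption, the shell form would simply be the original command with `PYTHONPATH=/Users/minion/pling` written in front of `python run_pling.py ...`. Because the variable is exported to child processes, the Snakefile that failed on line 1 should then be able to import `pling`.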

Once you have some of these basic issues sorted out you can create a tools post describing pling.

Reply posted 8 days ago by GenoMax 142k

nicole.kavanagh : This may be best posted as a new question since you are now running into a problem with tool usage.

Reply posted 8 days ago by GenoMax 142k

score 1 · Answer 2 · 2024-05-02

If you put all your sequences in separate files but in the same directory, this program will build a cladogram based on average nucleotide identity:

https://github.com/MrOlm/drep

Similar sequences will be leaves of the same branch. You may have to fiddle with minimum contig size as this program is normally not meant for really small sequences, but it should work with them.
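
To illustrate the general idea of comparing every plasmid against every other one, here is a small Python sketch that scores pairs by the Jaccard similarity of their k-mer sets. This is a crude stand-in chosen for brevity: it is not average nucleotide identity and not what dRep computes, and the function names and the default `k=15` are assumptions for illustration only.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k=15):
    """Return the set of canonical k-mers (the smaller of a k-mer and its reverse complement)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
            rc = kmer.translate(_COMPLEMENT)[::-1]
            kmers.add(min(kmer, rc))
    return kmers


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(sequences, k=15):
    """Return {(name1, name2): Jaccard similarity} for every pair of named sequences."""
    sketches = {name: canonical_kmers(seq, k) for name, seq in sequences.items()}
    return {(x, y): jaccard(sketches[x], sketches[y])
            for x, y in combinations(sorted(sketches), 2)}
```

Two caveats with this simplification: Jaccard similarity drops when two plasmids differ a lot in size even if the smaller is fully contained in the larger (which is the motivation for the containment distance described in the other answer), and k-mers spanning the start/end junction of a circular sequence are missed. The resulting pairwise values could be passed to any standard hierarchical clustering routine to draw a tree.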