Question

Recommendation regarding antibiotic resistance genes in oceanic metagenomic samples

9 months ago

diegoestradamunoz • 0

Hello everyone, I am currently working on my final master's project, which focuses on antibiotic resistance genes in oceanic metagenomic samples. My challenge is that I am working on Amazon WorkSpaces, which has limited computational capacity. So far, using Megahit for contig assembly, I have been able to assemble fasta files of approximately 10 million reads without crashing the WorkSpace. (I am still testing the limits of Megahit on this machine, which has 2 CPU cores and 8 GB of RAM, so I might be able to assemble a few million more reads.) However, the dataset I am working with is much larger: even a small sample consists of around 70 million reads.

The issue is that I haven't identified any resistance genes yet. I am aware that in this example I am only analyzing 10 million reads out of a total of 70 million (and other samples have as many as 300 million reads), so I understand this to some extent, but it still surprises me that I haven't found anything at all in 10 million reads. Currently, I am using AMRFinder to detect resistance genes. Could you recommend an alternative program for analyzing the fasta file obtained from Megahit and identifying resistance genes in metagenomic samples? I have been considering deepARG, which is the program used in the initial project, and for plasmids I use PlasmidFinder. What are your thoughts on this? Thank you.

Programs. • 559 views

ADD COMMENT • link updated 9 months ago by seidel 11k • written 9 months ago by diegoestradamunoz • 0

score 0 · Answer 1 · 2023-07-05

9 months ago

Mensur Dlakic ★ 27k

I don't think you will be able to assemble with anything less than 32 GB of RAM, and 64 GB would be even better.

Not sure why you are expecting resistance genes in oceanic samples. Of all the habitats on Earth, oceans strike me as least likely to have antibiotics in them. Why would any organism release antibiotics into a gigantic mass of water, where they will be instantly diluted into sub-lethal concentration? And if there aren't any antibiotics around, why would target organisms carry resistance genes?

Assuming there is a rationale for antibiotic resistance genes in this sample, I wouldn't expect many of them. So it doesn't surprise me that there weren't any hits after assembling 1/7 of your sample.

ADD COMMENT • link 9 months ago by Mensur Dlakic ★ 27k

Understood. The data I'm using comes from a study of samples from various areas of the oceans, in which they found antibiotic resistance genes (ARGs), which is why I'm trying to find them. The idea is to find bacteria with antibiotic resistance genes in samples near the coast due to human presence, as many of our waste products end up in the sea. That is the hypothesis. The study from which I extracted the data found ARGs in many locations, but it's true that they had a vast amount of data and more resources than I do. I suppose I will reconsider the whole matter. Thank you very much.

ADD REPLY • link 9 months ago by diegoestradamunoz • 0

If they already found ARGs, do you have the sequences of these? This may be a dumb question, but do you expect any of the reads in your data set to map to these genes (or to any known ARG)? I'm thinking of some kind of positive control. If you have the data set they used to find ARGs, there should be reads in that data set that map to ARGs at some rate. If you have a data set with 10 M reads, you could estimate what the rate might be for your subset, given their data set. Simple read mapping doesn't take many resources.
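
A rough sketch of that estimate in Python, in case it helps. It assumes the reads have already been mapped against a set of ARG sequences with some aligner and written out as SAM; the function names are made up for illustration, and the Poisson step assumes ARG reads are spread evenly and independently through the data, which real samples may not satisfy.

```python
import math


def count_mapped(sam_lines):
    """Return (mapped, total) counts of primary reads from SAM-format lines."""
    mapped = total = 0
    for line in sam_lines:
        if line.startswith("@"):        # header line, not a read
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:            # SAM records have 11 mandatory fields
            continue
        flag = int(fields[1])
        if flag & 0x900:                # skip secondary (0x100) and supplementary (0x800)
            continue
        total += 1
        if not flag & 0x4:              # bit 0x4 set means the read is unmapped
            mapped += 1
    return mapped, total


def expected_hits(rate, n_reads):
    """Expected number of ARG-mapping reads in a subset of n_reads."""
    return rate * n_reads


def prob_zero_hits(rate, n_reads):
    """Poisson approximation of the chance that no read in the subset maps."""
    return math.exp(-rate * n_reads)


if __name__ == "__main__":
    # Hypothetical rate for illustration only: 1 ARG read per million reads.
    rate = 1e-6
    subset = 10_000_000
    print(expected_hits(rate, subset), prob_zero_hits(rate, subset))
```

With the rate measured on the full published data set (mapped / total from `count_mapped`), `prob_zero_hits` gives a feel for whether zero mapped reads in a 10 M subset would be surprising. Note that this is about reads mapping to ARGs, not about complete genes surviving assembly into contigs, so it is an optimistic upper bound on what AMRFinder could find in the Megahit output.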

ADD REPLY • link 9 months ago by seidel 11k