3.7 years ago by
University of Nebraska
1) Does this sound reasonable, too strict, not strict enough?
I'm a little thrown off by your question. Of course genomes from mixed samples are real genomes. You're trying to establish standards with which to describe new species of viruses from metagenomic data. There are already a lot of groups doing this, so I think standards are worked out to a certain extent. People have been defining organisms on the basis of their DNA for four (or more) decades now. The only real difference with metagenomic data is that you have more of it, since we have been sequencing mixed samples through cloning, etc., for the last 30 years. Why should more data change things?
Here's a commentary I co-authored earlier this year on the systematics and taxonomy of environmentally derived sequence data (focused on plants as a host, but human-host-associated papillomaviruses are no different in an ecological sense). I hope it helps.
2) Obviously, assembly is a time-consuming thing and isn't trivial. Would you guys like to share some thoughts on preferred assemblers, pipelines, etc.?
This is a quickly evolving and constantly changing field right now and I don't think the community has come up with a preferred pipeline or system. I feel like I could write a book on what to do and what not to do here, but I think you have to dive into the literature.
The main assembly program I use now for metagenomics (MEGAHIT) didn't exist a year ago, so it's hard to gauge standards at the moment. There are tons of great new tools out there, and twice as many poor ones.
For what you are working on with papillomaviruses, I would highly recommend the pipeline from this really good paper by Itai Sharon in Jill Banfield's lab (though it's less than 2 years old, it is probably already dated): Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization.
The bottom line (Istvan Albert mentions this in his answer) is the risk of assembly chimeras or otherwise misassembled metagenomic genomes. How do you know that what you have assembled is actually the correct genome in your sample? Long reads will change this field, but for now you have to be extra careful about claiming you have a new organism on the basis of a metagenome assembly. I would look at any metagenome assembly or short-read annotation with skepticism.
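As one illustration of that skepticism, here is a rough sketch of a cheap sanity check: map the reads back to a contig and look for abrupt jumps in read depth, which can hint at a junction between sequences from organisms of different abundance. This is my own simple heuristic, not the method from the paper above, and the function name, window size, and fold threshold are all assumptions for illustration. It assumes you already have per-base depth for the contig as a list of integers.

```python
from statistics import mean


def coverage_breakpoints(depth, window=100, fold=3.0):
    """Flag window boundaries where mean read depth jumps sharply.

    depth  : per-base read depth along one contig (list of ints)
    window : size of the non-overlapping windows (hypothetical default)
    fold   : fold-change between adjacent windows that counts as a jump
             (hypothetical default)
    Returns the start positions of windows that differ from the previous
    window by more than `fold`-fold.
    """
    # Mean depth of consecutive, non-overlapping windows.
    # A trailing partial window is ignored.
    means = [
        mean(depth[i:i + window])
        for i in range(0, len(depth) - window + 1, window)
    ]

    flagged = []
    for k in range(1, len(means)):
        low, high = sorted((means[k - 1], means[k]))
        # Pseudocount of 1 so zero-coverage windows do not divide by zero.
        if (high + 1) / (low + 1) > fold:
            flagged.append(k * window)
    return flagged


# Toy contig: 500 bases at ~30x joined to 500 bases at ~200x.
toy_depth = [30] * 500 + [200] * 500
print(coverage_breakpoints(toy_depth))  # [500]
```

A flagged position is only a reason to look more closely (for example, at read pairs spanning that region), not proof of a chimera. Uneven depth also has ordinary causes, such as repeats or amplification bias, and a chimera between two equally abundant genomes would not show up this way at all.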
modified 3.7 years ago
3.7 years ago by
Josh Herr ♦ 5.6k