Question: Need Advice On The Proper Use Of Glimmer3 For Finding Genes In Microbial DNA
rosarylimyt wrote, 7.0 years ago:

I'm an undergrad taking on a beginner's project on metagenomic tools for annotating and visualizing metabolic pathways, and I'm confused by the workings of GLIMMER3. To check that I understand how to operate it, I tried reproducing the given results with the sample files that came with the download.

I'm unsure whether the provided scripts (namely g3-from-scratch, g3-from-training, g3-iterated) are meant to be run consecutively.

I'd assumed these scripts had to be run in order, starting with g3-from-scratch, which generated a number of files. Now I'm stuck at g3-from-training. I downloaded and read through the only tutorial I could find, and it says to type the command in the form:

g3-from-training.csh [yourgenom.seq] train.coords run2

I can't find a .coords file generated by the previous g3-from-scratch run, so I can't proceed. Please advise me on what I should do. I wanna progress from here! =[

Tags: metagenomics, clustering
Josh Herr (University of Nebraska) wrote, 7.0 years ago:

I've never used GLIMMER but am just now downloading it and will give it a test run including reading over the documentation, so I will try to help out. Please forgive me if I'm not helpful in this post, but no one else has posted anything yet so I thought I would try to contribute.

With metagenomic read clustering or amplicon clustering, you can typically cluster de novo based on sequence similarity, and I assume the g3-from-scratch script is the analogous "no prior knowledge" mode. These methods use no a priori designation, just a measure of sequence similarity. One advantage is computational speed. One disadvantage is that the reads are not named or identified; you can identify the clustered reads afterwards with BLAST or phylogenetic methods, but these are time-consuming and, particularly when using BLAST against very large databases, may introduce errors.

Other types of clustering algorithms get around using BLAST by using "training sets", which are basically curated databases that can be specific to your research area (for example, the human microbiome or deep-sea sediments) and help to identify the reads based on where they were sampled or on their presumed taxonomic composition. The train.coords file is this training set. It may be computed from a prior analysis, in which case I'm not sure where it might be located.

Other clustering methods, such as those built on RDP or Greengenes as implemented in QIIME, use a matrix or FASTA-type file that you create yourself in a text editor as the training set. In other words, you might have to create your own train.coords file, but I would think one would come with the tutorial. Perhaps that's part of the tutorial?


"and will give it a test run including reading over the documentation, so I will try to help out."

That's a hardcore attitude there, Josh! :-)

written 6.9 years ago by Istvan Albert

I've been meaning to give it a try, so here's a good excuse.

written 6.9 years ago by Josh Herr
Josh Herr (University of Nebraska) wrote, 6.9 years ago:

So I spent some time going through the tutorial scripts with some of my own data. The g3-from-scratch script is for finding genes with no prior knowledge about what genes the sequence may contain. The g3-from-training script requires that you provide your own training set, which is why you are having issues: you need to supply that text file of coordinates yourself. Lastly, the g3-iterated script combines the two: first it runs the "from scratch" prediction, then generates a .coords file from those predictions as the "training set", then uses that training set to predict genes again. And to answer your question, you don't have to run them in sequence.
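
As a rough illustration of the step g3-iterated performs between its two passes, here is a minimal Python sketch that turns a first-pass prediction file into a coordinates file usable as a training set. It assumes the .predict file consists of FASTA-style header lines starting with ">" followed by whitespace-separated rows of ID, start, end, frame and score. The function names `predict_to_coords` and `convert_file` and the sample data are made up for this example, and the real script does this with shell tools rather than Python.

```python
from pathlib import Path


def predict_to_coords(predict_text: str) -> str:
    """Drop header lines and keep the gene rows of a Glimmer3-style .predict file.

    Assumed row layout (an assumption, not checked against every version):
    <orf_id> <start> <end> <frame> <score>
    """
    rows = []
    for line in predict_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and FASTA-style sequence headers.
        if not stripped or stripped.startswith(">"):
            continue
        fields = stripped.split()
        # A coordinates row needs at least an ID, a start and an end.
        if len(fields) < 3:
            continue
        # Keep every column, normalizing whitespace to single spaces.
        rows.append(" ".join(fields))
    return "\n".join(rows) + ("\n" if rows else "")


def convert_file(predict_path: str, coords_path: str) -> int:
    """Write coords_path from predict_path and return the number of rows kept."""
    coords = predict_to_coords(Path(predict_path).read_text())
    Path(coords_path).write_text(coords)
    return len(coords.splitlines())


if __name__ == "__main__":
    # Hypothetical example data, not real Glimmer output.
    example = """>my_sequence
orf00001      101      400  +2     5.31
orf00002     1200      610  -1     8.02
"""
    print(predict_to_coords(example), end="")
```

A file produced this way could then be tried as the train.coords argument to g3-from-training, though running g3-iterated does the whole round trip for you.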

I was actually surprised to see that the last update to these scripts was in early 2006, which makes me a little wary of investing time in something that hasn't been maintained in a while. I think other tools developed in recent years offer similar and more robust functions. That said, I am going to give GLIMMER a few more runs before making up my mind about it.


Wow, thank you for everything! I really appreciate you spending time to try this software just to 'decode' it for me =] So I guess I just have to run the g3-iterated script with my query file and the results will be generated? Am I right to assume that?

written 6.9 years ago by rosarylimyt

That's how it worked for me; it may take a while depending on the size of your dataset. Good luck!
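
In case it helps, here is a small Python sketch of how the three scripts are called, based on the command form quoted from the tutorial in the question. The helper `g3_command` is hypothetical: it only assembles the argument list (genome file, optional coordinates file, output tag) and leaves running it to you. It assumes that g3-from-scratch and g3-iterated take just a genome file and a tag, and that the .csh scripts are on your PATH. The file names and tags are placeholders.

```python
import shlex
from typing import List, Optional

# Scripts named in this thread; only g3-from-training takes a coords file.
SCRIPTS_NEEDING_COORDS = {"g3-from-training.csh"}
SCRIPTS_WITHOUT_COORDS = {"g3-from-scratch.csh", "g3-iterated.csh"}


def g3_command(script: str, genome: str, tag: str,
               coords: Optional[str] = None) -> List[str]:
    """Assemble the argument list for one of the Glimmer3 wrapper scripts."""
    if script in SCRIPTS_NEEDING_COORDS:
        if coords is None:
            raise ValueError(f"{script} needs a training coordinates file")
        return [script, genome, coords, tag]
    if script in SCRIPTS_WITHOUT_COORDS:
        if coords is not None:
            raise ValueError(f"{script} does not take a coordinates file")
        return [script, genome, tag]
    raise ValueError(f"unknown script: {script}")


if __name__ == "__main__":
    # File names and tags below are placeholders.
    for cmd in (
        g3_command("g3-from-scratch.csh", "yourgenome.seq", "run1"),
        g3_command("g3-from-training.csh", "yourgenome.seq", "run2",
                   coords="train.coords"),
        g3_command("g3-iterated.csh", "yourgenome.seq", "run3"),
    ):
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(shlex.join(cmd))
```

To actually launch one of them from Python, the returned list can be handed to `subprocess.run(cmd, check=True)`.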

written 6.9 years ago by Josh Herr