Question: Experiences With Pangea
3
gravatar for Daniel
9.1 years ago by
Daniel3.8k
Cardiff University
Daniel3.8k wrote:

Ref: http://www.nature.com/ismej/journal/v4/n7/full/ismej201016a.html

I'm new to the 454 pipeline racket and have been looking around for the best pipeline to analyze the 16S sequences we'll be getting in the next couple of months. RDP and mothur have cropped up on occasion, but the PANGEA paper gives several reasons why it is better than RDP (last couple of paragraphs of the paper above).

The highlights are that the pipeline is installed and run locally, so sequences don't need to be uploaded, and that the whole pipeline runs from one command.

I was just wondering if anyone out there has had any successes or pitfalls with it. I'm currently setting it up to try with some Sanger data we have knocking around, but some real-life experiences would be helpful!


The documentation for Pangea seems severely lacking, which is quite worrisome.

written 9.1 years ago by Istvan Albert

I agree. Also, the TaxCollector database used for taxonomic descriptions is proving difficult to set up, with dead-end web links and some faltering Python (I'm only Perl-native), which is hampering my ability to report back. Shall do when I crack it!

written 9.1 years ago by Daniel

I wrote TaxCollector and someone in my lab wrote Pangea. What dead-end web-links?

written 9.1 years ago by Science_Robot

In the setup file readme.md the command:

curl ftp://ftp.ncbi.nih.gov/pub/taxdump.tar.gz | gunzip | tar -xvf names.dmp nodes.dmp

points to the wrong FTP address. The file is actually at:

ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz

In remdup.py I had to add lines declaring seq and name as global variables.

In remove_uncultured.py I also had to add import sys.

I'm new to Python, so I don't know whether these are the best fixes, but they allowed me to run the scripts and generate the TaxCollector database.

(All run on Bio-Linux.)

modified 5 months ago by RamRS • written 9.1 years ago by Daniel

I chose an answer because this keeps getting bumped by the community and it's annoying me. I'd very much like to commend TaxCollector on its usefulness, though.

written 8.5 years ago by Daniel
Daniel (Cardiff University) wrote, 9.1 years ago:

(Answered instead of commented, for formatting reasons.) Re: audyyy

In the setup file readme.md the command:

curl ftp://ftp.ncbi.nih.gov/pub/taxdump.tar.gz | gunzip | tar -xvf names.dmp nodes.dmp

points to the wrong FTP address. The file is actually at:

ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
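
For anyone who would rather not rely on the shell pipeline, the same download can be sketched in Python. This is only an illustration and not part of TaxCollector: the function names extract_taxdump and fetch_taxdump are made up here, and it assumes the corrected URL above is still served over FTP and that the archive stores the two files as names.dmp and nodes.dmp.

```python
import io
import os
import shutil
import tarfile
import urllib.request

# Corrected location quoted in the post above.
TAXDUMP_URL = "ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz"
WANTED = ("names.dmp", "nodes.dmp")


def extract_taxdump(fileobj, dest=".", wanted=WANTED):
    """Copy only the wanted files out of a gzipped tar archive.

    Returns the list of file names written into dest.
    """
    written = []
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive.getmembers():
            base = os.path.basename(member.name)
            if member.isfile() and base in wanted:
                # Write by base name so archive paths cannot escape dest.
                source = archive.extractfile(member)
                with open(os.path.join(dest, base), "wb") as out:
                    shutil.copyfileobj(source, out)
                written.append(base)
    return written


def fetch_taxdump(url=TAXDUMP_URL, dest="."):
    """Download the whole archive into memory, then extract the two files."""
    with urllib.request.urlopen(url) as response:
        payload = io.BytesIO(response.read())
    return extract_taxdump(payload, dest)
```

Calling fetch_taxdump() should leave names.dmp and nodes.dmp in the current directory. Nothing is downloaded until that function is called.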

In remdup.py I had to add lines declaring seq and name as global variables.

In remove_uncultured.py I had to add import sys.

I'm new to Python, so I don't know whether these are the best fixes, but they allowed me to run the scripts and generate the TaxCollector database.
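
For anyone hitting the same errors, here is a minimal sketch of what the two fixes amount to. It is not the actual TaxCollector code: the functions start_record, add_sequence_line and current_record are invented for illustration, and only the variable names seq and name come from the scripts.

```python
import sys  # if this is missing, the first use of sys.stderr or sys.argv raises NameError

# Module-level state shared between functions (hypothetical layout;
# the real remdup.py may be organised differently).
name = None
seq = ""


def start_record(header):
    """Start a new FASTA record from a '>' header line."""
    global name, seq  # without this, the assignments would create new locals
    name = header.lstrip(">").strip()
    seq = ""


def add_sequence_line(line):
    """Append one line of sequence to the current record."""
    global seq  # without this, 'seq += ...' raises UnboundLocalError
    if name is None:
        sys.stderr.write("sequence line seen before any header\n")
        return
    seq += line.strip()


def current_record():
    """Return the current (name, seq) pair."""
    return name, seq
```

Passing the values around as function arguments and return values would avoid the globals altogether, but adding the global declarations is the smaller change to an existing script.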

Hope this helps.

(All run on Bio-Linux.)

modified 5 months ago by RamRS • written 9.1 years ago by Daniel

Thanks for pointing out the typo. I just finished writing a Rakefile to fetch the required databases, filter them and create the TaxCollected version of RDP: https://github.com/audy/taxcollector

There is a difference between this version of TaxCollector and the one described in the paper. This one considers species and subspecies/strain separately. Previously, different strains were treated as different species, which resulted in fewer sequences being classified to the species level.

I hope this helps.

written 9.1 years ago by Science_Robot

Thanks, Daniel.

written 9.1 years ago by Science_Robot

Where did you get this copy of TaxCollector? The one on SourceForge works.

Also, I've been maintaining TaxCollector on GitHub (I don't touch the SourceForge version). Try https://github.com/audy/taxcollector/tree/1.0.0

Version 2.0.0 has a Rakefile that creates the TaxCollector database if you just type rake.

written 9.1 years ago by Science_Robot

Also, there's a ready-made database and instructions here: http://www.microgator.org/taxcollector/

written 9.1 years ago by Science_Robot

I apologise for the time it's taken me to get back to this. Busy busy busy. TaxCollector v2 works perfectly and I am a big fan. The Rakefile is immensely useful. I can't remember where I got the original, sorry.

Re: PANGEA, I've found a few case-sensitivity issues which you may want to pass on to whoever it concerns (being a total Mac noob, I guess this isn't a problem on there?). The filename referenced in a script differs in case from the actual filename. Look at clustertable.pl and 1.4_Barcode.
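
In case it helps whoever picks this up, a rough way to spot this kind of mismatch is sketched below. It is a hypothetical helper, not something shipped with PANGEA: find_case_mismatches is a name invented here, and the list of referenced filenames has to be collected from the scripts by hand. Mac filesystems are usually case-insensitive by default, which may be why such mismatches go unnoticed there.

```python
import os


def find_case_mismatches(referenced, directory="."):
    """Map each referenced filename to on-disk names that differ only by case.

    Names that exist exactly as written, or that have no match at all,
    are left out of the result.
    """
    on_disk = os.listdir(directory)
    by_lower = {}
    for filename in on_disk:
        by_lower.setdefault(filename.lower(), []).append(filename)

    mismatches = {}
    for ref in referenced:
        if ref in on_disk:
            continue  # exact match, nothing to report
        candidates = by_lower.get(ref.lower(), [])
        if candidates:
            mismatches[ref] = sorted(candidates)
    return mismatches
```

Each key in the returned dictionary is a name as written in a script, and each value lists the files on disk that match it only when case is ignored.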

Hope this helps.

written 9.0 years ago by Daniel