Program to plot a 60,000-sequence 16S rRNA dataset for phylogenetic assessment?
3.5 years ago

Hello, I'm looking to do a large-scale phylogenetic analysis. I plan to build a PCA plot with 60,000+ DNA sequences. I'd be doing a beta-diversity analysis with one sample consisting of 60,000 sequences, while the other three have fewer than 200 sequences each. I want every individual sequence to appear in the plot, rather than data points representing whole samples.

I've been looking at Parallel-Meta and QIIME. Does anyone have other suggestions? I'd be running it on a machine with 16 GB of RAM and 8 threads.

Thanks, Peter

gene • 596 views

What is the length of the individual sequences? If the sequences are redundant, there is no point in using all of them as is.

The length would be ~1,000 bp. I was planning to remove redundant sequences, which would reduce the sample size, but my guess is the dataset would still be 10,000-20,000 sequences.
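As a rough sketch of the redundancy-removal step, the Python below collapses exact-duplicate sequences and keeps a count plus a representative header for each unique sequence. It assumes the reads are in plain FASTA; `read_fasta` and `dereplicate` are illustrative helpers, not functions from any package. Exact matching only removes identical reads; with ~1,000 bp reads, sequencing errors may leave few exact duplicates, so clustering at an identity threshold (for example with VSEARCH or CD-HIT) would likely shrink the dataset further.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Multi-line sequences are joined and upper-cased.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(records):
    """Collapse exact-duplicate sequences.

    Returns (representative_header, sequence, count) tuples, most abundant
    first; the representative is the first record seen with that sequence.
    """
    first_header = {}
    counts = Counter()
    for header, seq in records:
        first_header.setdefault(seq, header)
        counts[seq] += 1
    return [(first_header[s], s, n) for s, n in counts.most_common()]


if __name__ == "__main__":
    # Hypothetical toy input; a real run would pass an open FASTA file handle.
    fasta = [">r1", "ACGTACGT", ">r2", "acgt", "acgt", ">r3", "ACGTTCGT", ">r4", "ACGTACGT"]
    for header, seq, n in dereplicate(read_fasta(fasta)):
        print(header, seq, n)
```

Reverse complements and reads that are substrings of longer reads are not merged by this sketch.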

3.5 years ago
h.mon 35k

Beta-diversity is a ratio between regional and local species diversity (abundance), and a PCA plot would depict the distances, in some estimate of beta-diversity, between the sampling units. Therefore, you can't include individual sequences in a PCA plot depicting beta-diversity, because individual sequences are neither diversity measures nor abundances.
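To make the distinction concrete, here is a minimal Python sketch of two sample-level quantities: Whittaker's multiplicative beta diversity (regional richness gamma divided by mean local richness alpha, i.e. the regional/local ratio) and Bray-Curtis dissimilarity, one common pairwise distance that an ordination of samples could be built on. The OTU tables are made up for illustration. Every input is a per-sample abundance table, so an individual sequence has no value of its own to contribute as a point.

```python
def whittaker_beta(samples):
    """Whittaker's multiplicative beta diversity: gamma / mean(alpha).

    `samples` maps a sample name to a {taxon: abundance} dict; a taxon
    counts toward richness if its abundance is > 0.
    """
    present = [{t for t, n in s.items() if n > 0} for s in samples.values()]
    gamma = len(set().union(*present))                   # regional richness
    alpha = sum(len(p) for p in present) / len(present)  # mean local richness
    return gamma / alpha


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: abundance} dicts.

    0 means identical composition, 1 means no shared taxa.
    """
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))


if __name__ == "__main__":
    # Hypothetical OTU tables for two samples.
    samples = {
        "A": {"otu1": 10, "otu2": 5},
        "B": {"otu2": 5, "otu3": 10},
    }
    print("Whittaker beta:", whittaker_beta(samples))
    print("Bray-Curtis A vs B:", bray_curtis(samples["A"], samples["B"]))
```

For the toy data, gamma is 3 and mean alpha is 2, so Whittaker's beta is 1.5; the samples share 5 of 30 total counts, giving a Bray-Curtis dissimilarity of 1 - 10/30, about 0.667.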

Maybe you want a PCA biplot depicting the beta-diversity relationships among samples, and also how each species (or whatever taxonomic unit you are using) relates to the beta-diversity differences among samples?

Yeah, I realized that I do not want to do a beta-diversity analysis. I'm basically looking to analyze how closely the sequences in a small subset are related to one another, compared with a global population.

I essentially want to run a standard phylogenetic analysis (most likely maximum likelihood). But rather than depict the results as a tree, I want to depict them in a PCA-like plot. I don't need the tree topology. Does that make sense?
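One way to get a "PCA-like" plot of individual sequences without drawing a tree is principal coordinates analysis (PCoA, also called classical multidimensional scaling) on a pairwise distance matrix. The sketch below rests on several assumptions: the sequences are already aligned to equal length, and it uses Jukes-Cantor (JC69) corrected distances as a simple stand-in; it does not fit a maximum-likelihood model. Patristic distances taken from an ML tree, or model-corrected distances from a phylogenetics package, could be fed to the same `pcoa` function instead. The eigen-decomposition is a plain Jacobi routine so the example has no dependencies; it costs O(n^3) per sweep and suits only toy inputs, so thousands of sequences would need a compiled linear-algebra library.

```python
import math


def jc69_distance(s1, s2):
    """Jukes-Cantor (JC69) distance between two aligned sequences.

    Columns with a gap or ambiguous base in either sequence are skipped.
    """
    sites = [(x, y) for x, y in zip(s1.upper(), s2.upper())
             if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in sites) / len(sites)
    if p >= 0.75:
        raise ValueError("p-distance >= 0.75: JC69 correction undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


def jacobi_eigen(a, rel_tol=1e-24, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix,
    using cyclic Jacobi rotations. Small n only: O(n^3) work per sweep."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pcoa(d, k=2):
    """Principal coordinates (classical MDS) from a symmetric distance matrix.

    Returns one k-dimensional coordinate list per input row.
    """
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    mean = [sum(row) / n for row in sq]  # row means equal column means
    grand = sum(mean) / n
    # Double-centred matrix B = -1/2 * J D^2 J
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
         for i in range(n)]
    vals, vecs = jacobi_eigen(b)
    top = sorted(range(n), key=lambda j: vals[j], reverse=True)[:k]
    # Scale each eigenvector by sqrt(eigenvalue); negative values are clipped.
    return [[vecs[i][j] * math.sqrt(max(vals[j], 0.0)) for j in top]
            for i in range(n)]


if __name__ == "__main__":
    # Hypothetical toy alignment; real input would be the aligned 16S reads.
    aln = {
        "s1": "ACGTACGTAC",
        "s2": "ACGTACGTAA",
        "s3": "ACGAACGTTA",
        "s4": "TCGAACCTTA",
    }
    names = list(aln)
    d = [[jc69_distance(aln[x], aln[y]) for y in names] for x in names]
    for name, (x, y) in zip(names, pcoa(d)):
        print(f"{name}\t{x:8.4f}\t{y:8.4f}")
```

The coordinates are eigenvectors of the double-centred matrix scaled by the square root of their eigenvalues. Each axis is only defined up to sign, so a mirrored plot between runs or tools is expected and not an error.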

I'm also a bit concerned about the size of the dataset.
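The size concern can be put in numbers for any method that builds a full pairwise distance matrix (as PCoA does). A dense n × n matrix of 8-byte floats needs 8n^2 bytes: 60,000^2 × 8 = 28.8 GB, which exceeds 16 GB of RAM, whereas after dereplication to 20,000 sequences it is 3.2 GB. Storing only the upper triangle, n(n-1)/2 entries, roughly halves this (about 14.4 GB for 60,000). These figures use 1 GB = 10^9 bytes and ignore the alignment itself and any working copies an eigen-solver makes, so real peak usage would be higher. `distance_matrix_bytes` is a hypothetical helper for the arithmetic.

```python
def distance_matrix_bytes(n, bytes_per_value=8, condensed=False):
    """Memory needed to hold an n x n pairwise distance matrix.

    condensed=True stores only the n*(n-1)/2 upper-triangle entries.
    bytes_per_value: 8 for float64, 4 for float32.
    """
    entries = n * (n - 1) // 2 if condensed else n * n
    return entries * bytes_per_value


if __name__ == "__main__":
    for n in (10_000, 20_000, 60_000):
        full = distance_matrix_bytes(n) / 1e9
        half = distance_matrix_bytes(n, condensed=True) / 1e9
        print(f"n={n:>6}: full {full:6.2f} GB, condensed {half:6.2f} GB (float64)")
```

Switching to 4-byte floats (`bytes_per_value=4`) halves every figure again, at the cost of precision in the distances.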
