PAN and CORE genome analysis
7.8 years ago

Hi,

I have an OTU BIOM file (obtained from closed-reference OTU picking in QIIME 1.8.0) containing 65 samples, and I am trying to do a pan/core genome analysis.

I have filtered the taxonomy out of the abundance file (using a particular threshold, let's say 60%), so I now have a file containing only the taxonomy column for all 65 samples (at the 60% threshold). Is there a way to do functional annotation on it?

Is there any server or software that can do that, or that can do pan (complete) / core (shared) analysis?

Any suggestions ?

Best!
Shashank

genome sequencing next-gen Assembly • 4.6k views
0

By pan- and core-genome analysis, do you mean that you want to find the complete set of genes/proteins and the shared genes/proteins among your OTUs? I'm not familiar with the BIOM format, but does the file contain the genome sequence(s) of the organisms that you're analyzing? You will need the gene sequences of the whole genomes (or the protein sequences of the whole proteomes) to get the pan-genome and core-genome.

0

Yes, complete genes and shared genes.

The BIOM file doesn't contain the genomic sequences. It looks like this:

339039 Bacteria;Proteobacteria;Alphaproteobacteria;Rhodospirillales;unclassified_Rhodospirillales
199390 Bacteria;Chloroflexi;Anaerolineae;Caldilineae;Caldilineales;Caldilineacea;unclassified_Caldilineacea
370251 Bacteria;Proteobacteria;Gammaproteobacteria;unclassified_Gammaproteobacteria


Here the number is the OTU ID, followed by the taxonomy. Each OTU ID represents the particular sequence associated with that taxonomy.

If I retrieve the gene sequence for each OTU ID corresponding to the taxonomy, I would then have a gene sequence file. How can I use it for further analysis?
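As a rough illustration of the layout described above, here is a minimal Python sketch that splits such lines into an OTU ID and a list of taxonomic ranks. The function name `parse_otu_taxonomy` is made up for this example, and it assumes the ID and the lineage are separated by whitespace, as in the excerpt; a real BIOM file would normally be read with a dedicated parser instead.

```python
def parse_otu_taxonomy(lines):
    """Map each OTU ID to its list of taxonomic ranks.

    Assumes every non-empty line is '<OTU ID> <semicolon-separated lineage>'.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        otu_id, lineage = line.split(None, 1)  # split on the first whitespace run
        table[otu_id] = lineage.split(";")
    return table


# Two of the lines from the excerpt above
example = [
    "339039 Bacteria;Proteobacteria;Alphaproteobacteria;Rhodospirillales;unclassified_Rhodospirillales",
    "370251 Bacteria;Proteobacteria;Gammaproteobacteria;unclassified_Gammaproteobacteria",
]
otu_taxonomy = parse_otu_taxonomy(example)
```

Note that the result carries only taxonomy labels per OTU, with no gene content.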

Cheers!

0

"OTU ID represents the particular sequence associated with the particular taxonomy." Which particular sequence is it? Is it from one gene only? You can't do pan- and core-genome analysis using only one gene from each species/OTU. You need the genome (or better, the proteome) of each OTU. Then you find the orthologous genes/proteins among the OTUs (I used OrthoMCL for my bacteria), and the proteins shared by all of them are your core-proteome. The pan-proteome would be the core plus any other proteins of each OTU.
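To make the core/pan distinction concrete, here is a minimal Python sketch of the set logic only. It assumes ortholog clustering has already been done and that the result has been loaded into a dictionary mapping each ortholog group to the set of genomes it occurs in, with proteins that clustered with nothing included as single-genome groups. The function name, group IDs and genome names are all hypothetical, and this is not OrthoMCL's own output format.

```python
def core_and_pan(ortholog_groups, genomes):
    """Return (core, pan) sets of ortholog group IDs.

    ortholog_groups: dict mapping group ID -> iterable of genome names
    genomes: all genome names included in the analysis
    """
    genomes = set(genomes)
    # Pan: every group seen in at least one genome
    pan = set(ortholog_groups)
    # Core: groups that have a member in every genome
    core = {
        group
        for group, members in ortholog_groups.items()
        if genomes <= set(members)
    }
    return core, pan


# Hypothetical clustering result for three genomes
groups = {
    "OG1": {"genomeA", "genomeB", "genomeC"},  # present everywhere
    "OG2": {"genomeA", "genomeB"},             # shared by two genomes
    "OG3": {"genomeC"},                        # unique to one genome
}
core, pan = core_and_pan(groups, ["genomeA", "genomeB", "genomeC"])
```

In this toy input only OG1 ends up in the core, while all three groups belong to the pan set.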

0
6.8 years ago

You can also try the DAVID functional annotation database. For more details, please see this link: https://david.ncifcrf.gov/tools.jsp

0
6.8 years ago
5heikki 11k

If you used Greengenes 13_5 as the reference database, you can associate the OTUs with the protein content of the nearest sequenced reference genomes using PICRUSt. However, in my opinion this approach is pretty much worthless. The 16S sequence of your OTU representative being relatively similar to the 16S sequence of some reference genome (60% is not even remotely similar; the threshold should be more like 99.9% for this kind of thing) does not mean that the protein contents of the two genomes are even remotely similar.