Question: PAN and CORE genome analysis
bioinformaticssrm2011 (India) wrote, 5.8 years ago:

Hi,
I have an OTU biom file (obtained from closed-reference OTU picking in QIIME 1.8.0) containing 65 samples, and I am trying to do a pan/core genome analysis.
I have filtered the taxonomy from the abundance file using a particular threshold (let's say 60%), so I now have a file containing only the taxonomy column for all 65 samples (at the 60% threshold). Is there a way to do functional annotation on it?
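The post does not say exactly what the 60% threshold applies to. Assuming it means prevalence (keep an OTU only if it is observed in at least 60% of the samples), a minimal Python sketch of that filtering step could look like this. The function name and the counts are hypothetical toy values, not data from the post:

```python
from typing import Dict, List

def filter_by_prevalence(table: Dict[str, List[int]], min_fraction: float) -> List[str]:
    """Return the IDs of OTUs with a nonzero count in at least min_fraction of samples."""
    kept = []
    for otu_id, counts in table.items():
        if not counts:
            continue  # no samples, nothing to evaluate
        present = sum(1 for c in counts if c > 0)
        if present / len(counts) >= min_fraction:
            kept.append(otu_id)
    return kept

# Hypothetical toy table: OTU ID -> counts in 5 samples (not real data).
toy_table = {
    "339039": [12, 0, 5, 7, 3],  # present in 4 of 5 samples (80%)
    "199390": [0, 0, 1, 0, 0],   # present in 1 of 5 samples (20%)
    "370251": [4, 9, 0, 2, 0],   # present in 3 of 5 samples (60%)
}

print(filter_by_prevalence(toy_table, 0.6))  # ['339039', '370251']
```

If the threshold instead refers to something else (for example a relative-abundance cutoff), only the condition inside the loop would change.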

Is there any server or software that can do this, or that does pan (complete) / core (shared) analysis?

Any suggestions?

Best!
Shashank


What you mean by PAN and CORE genome analysis is that you want to find the complete set of genes/proteins and the genes/proteins shared among your OTUs, right? I'm not familiar with the biom format, but does the file contain the genome sequence(s) of the organisms you're analyzing? You will need the gene sequences of the whole genomes (or the protein sequences of the whole proteomes) to get the pan-genome and core-genome.

Comment by sentausa, 5.8 years ago

Yes, complete genes and shared genes.

The biom file doesn't contain the genomic sequences. It looks like this:

339039 Bacteria;Proteobacteria;Alphaproteobacteria;Rhodospirillales;unclassified_Rhodospirillales

199390 Bacteria;Chloroflexi;Anaerolineae;Caldilineae;Caldilineales;Caldilineacea;unclassified_Caldilineacea

370251 Bacteria;Proteobacteria;Gammaproteobacteria;unclassified_Gammaproteobacteria

The number is the OTU ID, followed by the taxonomy. Each OTU ID represents the particular sequence associated with that taxonomy.
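A minimal Python sketch for reading lines in the format shown above into a dictionary of OTU ID to taxonomic ranks. It assumes the ID is separated from the lineage by whitespace and the ranks by semicolons; the function name is hypothetical:

```python
from typing import Dict, Iterable, List

def parse_taxonomy_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each OTU ID to its lineage, split into ranks on ';'."""
    taxonomy: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # First whitespace-separated field is the OTU ID, the rest is the lineage.
        otu_id, lineage = line.split(None, 1)
        taxonomy[otu_id] = [rank.strip() for rank in lineage.split(";")]
    return taxonomy

example = """
339039 Bacteria;Proteobacteria;Alphaproteobacteria;Rhodospirillales;unclassified_Rhodospirillales

370251 Bacteria;Proteobacteria;Gammaproteobacteria;unclassified_Gammaproteobacteria
"""

tax = parse_taxonomy_lines(example.splitlines())
for otu_id, ranks in tax.items():
    # Print the phylum and the deepest assigned rank for each OTU.
    print(otu_id, ranks[1], ranks[-1])
```

A line that has an ID but no lineage would raise `ValueError` at the unpacking step, so a real file may need extra handling.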

If I add the gene sequences by looking up the OTU ID corresponding to each taxonomy, I will have a gene sequence file. How can I then use it for further analysis?
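A minimal Python sketch of pulling the sequences for a set of OTU IDs out of a reference FASTA file. It assumes the first word of each FASTA header is the OTU ID; the function names and toy records are hypothetical. Note that this yields one representative sequence per OTU (typically a 16S rRNA gene fragment in closed-reference QIIME workflows), not a whole genome:

```python
import io
from typing import Dict, Iterable, List, Optional, TextIO

def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse FASTA records into {first word of header: sequence}."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # store the previous record
            name = line[1:].split()[0]  # assumed: first word is the OTU ID
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)  # store the last record
    return records

def subset_by_ids(records: Dict[str, str], wanted: Iterable[str]) -> Dict[str, str]:
    """Keep only records whose ID is in `wanted`; IDs missing from the file are skipped."""
    return {otu_id: records[otu_id] for otu_id in sorted(set(wanted)) if otu_id in records}

def to_fasta(records: Dict[str, str]) -> str:
    """Serialize records back to FASTA text, one sequence line per record."""
    return "".join(f">{otu_id}\n{seq}\n" for otu_id, seq in records.items())

# Hypothetical toy reference; in practice this would be the reference
# database's representative-sequence FASTA file.
reference = io.StringIO(
    ">339039 toy record\nACGTACGT\nACGG\n"
    ">199390\nTTGACCAA\n"
    ">370251\nGGCCTTAA\n"
)
records = read_fasta(reference)
kept = subset_by_ids(records, ["370251", "339039", "999999"])
print(to_fasta(kept), end="")
```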

Cheers!

Comment by bioinformaticssrm2011, 5.8 years ago

"OTU ID represents the particular sequence associated with the particular taxonomy." What particular sequence is it? Is it from one gene only? You can't do pan- and core-genome analysis using only one gene from each species/OTU. You need the genome (or, better, the proteome) of each OTU. Then find the orthologous genes/proteins among the OTUs (I used OrthoMCL for my bacteria); the orthologs shared by all OTUs form the core proteome. The pan-proteome is the core plus all the other proteins found in any of the OTUs.

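Once each genome's proteins have been assigned to orthologous groups (for example with OrthoMCL), the core and pan sets reduce to set intersection and union. A minimal Python sketch, assuming a hypothetical mapping from genome name to the set of orthologous-group IDs it contains (this is toy data, not OrthoMCL's output format):

```python
from typing import Dict, Set, Tuple

def core_and_pan(groups_by_genome: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Core = groups present in every genome; pan = groups present in any genome."""
    sets = list(groups_by_genome.values())
    if not sets:
        return set(), set()
    return set.intersection(*sets), set.union(*sets)

def unique_groups(groups_by_genome: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Groups found in exactly one genome, keyed by that genome."""
    result = {}
    for name, groups in groups_by_genome.items():
        others = set().union(*(g for n, g in groups_by_genome.items() if n != name))
        result[name] = groups - others
    return result

# Hypothetical orthologous-group memberships for three genomes (toy data).
genomes = {
    "genome_A": {"OG1", "OG2", "OG3", "OG5"},
    "genome_B": {"OG1", "OG2", "OG4"},
    "genome_C": {"OG1", "OG2", "OG3", "OG6"},
}

core, pan = core_and_pan(genomes)
print("core:", sorted(core))              # ['OG1', 'OG2']
print("pan:", sorted(pan))                # ['OG1', 'OG2', 'OG3', 'OG4', 'OG5', 'OG6']
print("accessory:", sorted(pan - core))   # ['OG3', 'OG4', 'OG5', 'OG6']
print("unique:", {n: sorted(g) for n, g in unique_groups(genomes).items()})
```

The pan set minus the core set gives the accessory groups, i.e. the "any other proteins of each OTU" part of the pan-proteome.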
Comment by sentausa, 5.8 years ago
archana.bioinfo87 wrote, 4.8 years ago:

You can also try the DAVID functional annotation database. For more details, please see https://david.ncifcrf.gov/tools.jsp

Hope this helps.

5heikki (Finland) wrote, 4.8 years ago:

If you used Greengenes 13_5 as the reference database, you can associate the OTUs with the protein content of the nearest sequenced reference genomes using PICRUSt. However, in my opinion this approach is pretty much worthless. The 16S sequence of your OTU representative being relatively similar (60% is not even remotely similar; the threshold should be something like 99.9% for this kind of thing) to the 16S sequence of some reference genome does not mean that the protein contents of the two genomes are even remotely similar.
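To make the similarity threshold concrete: percent identity is often computed over the gap-free columns of a pairwise alignment, though conventions differ between tools. A minimal Python sketch under that convention, using toy aligned fragments; the function name is hypothetical:

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared

# Toy aligned fragments (not real 16S sequences): one mismatch in 10 columns.
seq1 = "ACGTACGTAC"
seq2 = "ACGTACGTTC"
pid = percent_identity(seq1, seq2)
print(pid, pid >= 99.9)  # 90.0 False
```

For scale, over a roughly 1,500-column full-length 16S alignment, a 99.9% cutoff tolerates only one mismatch (1,499/1,500 is about 99.93%, while 1,498/1,500 is about 99.87%).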
