Question: How To Find The Pan Genome Of 30 Bacterial Strains
Nari (United States) wrote, 8.0 years ago:

I have determined the core ortholog set (core genome) of 30 bacterial strains using the NCBI BLAST package. But finding the pan genome (unique genes + accessory genes + core genes) of the same 30 organisms is proving difficult, because it is not feasible to align each genome against every other one, which would mean around 900 comparisons overall. I can't work out any other logical approach that would let me determine the accessory genes without counting the same gene more than once. Please help. Thanks in advance.
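For scale, the count mentioned in the question can be checked directly: an all-against-all comparison of n genomes involves n(n-1)/2 unordered pairs, or n(n-1) directed comparisons if every pair is run in both directions (as when looking for reciprocal best hits). A quick check for 30 strains:

```python
from itertools import combinations, permutations

n_strains = 30
strains = range(n_strains)

# Each unordered pair of genomes compared once: n(n-1)/2.
unordered = sum(1 for _ in combinations(strains, 2))
# Both directions counted (A vs B and B vs A), e.g. for reciprocal BLAST: n(n-1).
directed = sum(1 for _ in permutations(strains, 2))

print(unordered, directed)  # 435 870
```

So "around 900" corresponds to the 870 directed comparisons (or 900 if each genome is also compared against itself).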

Istvan Albert (University Park, USA) wrote, 8.0 years ago:

You probably would not need to align every gene against every other gene (if that's what you meant above). A simpler technique would be to keep adding genes to a database if they are sufficiently different from the genes that are already there. As you perform the alignments, you will need a simple program to tabulate which genes were hit only once over the entire process (these are the unique genes) and which were hit in every strain (the core genes); the rest are the accessory genes.
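A minimal sketch of the incremental idea described above (essentially greedy incremental clustering, the same basic strategy that tools such as CD-HIT use), assuming each strain's genes are available as plain sequence strings. `is_similar`, `build_pan_genome`, and the 0.8 threshold are hypothetical; `difflib.SequenceMatcher.ratio()` is used only so the example runs without external tools, and in practice it would be replaced by a real BLAST identity/coverage check.

```python
from difflib import SequenceMatcher


def is_similar(a: str, b: str, threshold: float = 0.8) -> bool:
    # Hypothetical stand-in for a BLAST identity/coverage cutoff.
    return SequenceMatcher(None, a, b).ratio() >= threshold


def build_pan_genome(strains: dict) -> dict:
    """Greedy incremental clustering, then core/accessory/unique tabulation.

    strains maps a strain name to a list of gene sequences.
    Returns cluster indices grouped by category.
    """
    representatives = []  # the growing "database" of sufficiently distinct genes
    members = []          # members[i] = set of strains that hit representatives[i]

    for strain, genes in strains.items():
        for gene in genes:
            for i, rep in enumerate(representatives):
                if is_similar(gene, rep):
                    members[i].add(strain)  # hit an existing gene family
                    break
            else:
                # No hit: the gene is new, so add it to the database.
                representatives.append(gene)
                members.append({strain})

    n = len(strains)
    result = {"core": [], "accessory": [], "unique": []}
    for i, hit_by in enumerate(members):
        if len(hit_by) == n:
            result["core"].append(i)      # hit in every strain
        elif len(hit_by) == 1:
            result["unique"].append(i)    # seen in a single strain only
        else:
            result["accessory"].append(i)
    return result


example = {
    "A": ["AAAAAAAAAA", "CCCCCCCCCC"],
    "B": ["AAAAAAAAAT", "GGGGGGGGGG"],
    "C": ["AAAAAAAAAA", "CCCCCCCCCG", "TTTTTTTTTT"],
}
pan = build_pan_genome(example)
print(pan)  # {'core': [0], 'accessory': [1], 'unique': [2, 3]}
```

Because each gene is compared only against the current representatives rather than against every other genome, the number of comparisons grows with the number of distinct gene families. The result is order-dependent, though: which gene becomes a family's representative depends on the order in which strains and genes are processed.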

You might also want to consult what the literature says:

The Pangenome Structure of Escherichia coli: Comparative Genomic Analysis of E. coli Commensal and Pathogenic Isolates

Asaf (Israel) wrote, 8.0 years ago:

Maybe you can use the precomputed Protein Clusters database at NCBI (http://www.ncbi.nlm.nih.gov/proteinclusters). It contains clusters of orthologous proteins.
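If every protein has already been assigned to a cluster, for example by looking it up in a precomputed resource like the one above, the pan-genome categories fall out of a presence/absence table without any further alignment. A minimal sketch, assuming the assignments have already been parsed into hypothetical (strain, protein_id, cluster_id) tuples; this is not the database's actual download format, and proteins with no cluster assignment would need separate handling (for example, treating each as its own singleton cluster).

```python
from collections import defaultdict


def presence_absence(assignments):
    """Build a cluster -> 0/1 row (one column per strain) presence/absence table.

    assignments: iterable of (strain, protein_id, cluster_id) tuples.
    """
    members = defaultdict(set)  # cluster_id -> strains with at least one member
    for strain, _protein, cluster in assignments:
        members[cluster].add(strain)
    strains = sorted({s for group in members.values() for s in group})
    matrix = {
        cluster: [int(s in group) for s in strains]
        for cluster, group in sorted(members.items())
    }
    return strains, matrix


def categorize(matrix):
    """Core: present in every strain; unique: in exactly one; accessory: the rest."""
    categories = {"core": [], "accessory": [], "unique": []}
    for cluster, row in matrix.items():
        present = sum(row)
        if present == len(row):
            categories["core"].append(cluster)
        elif present == 1:
            categories["unique"].append(cluster)
        else:
            categories["accessory"].append(cluster)
    return categories


# Hypothetical assignments; real cluster IDs and file formats will differ.
assignments = [
    ("strain1", "p1", "CL001"), ("strain2", "p7", "CL001"), ("strain3", "p9", "CL001"),
    ("strain1", "p2", "CL002"), ("strain3", "p10", "CL002"),
    ("strain2", "p8", "CL003"),
]
strains, matrix = presence_absence(assignments)
print(matrix)              # {'CL001': [1, 1, 1], 'CL002': [1, 0, 1], 'CL003': [0, 1, 0]}
print(categorize(matrix))  # {'core': ['CL001'], 'accessory': ['CL002'], 'unique': ['CL003']}
```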

Nari (United States) wrote, 5.1 years ago:

I have been working on this since I asked the question, and after a long effort I built a tool myself, which I named BPGA (Bacterial Pan Genome Analysis pipeline).

Along with identifying core, accessory, and unique genes, it offers many other features, such as functional and pathway assignments, statistics, and more.

It is available on my SourceForge page.


First, I would like to commend you on pursuing this over the years; that is very admirable.

Now, as far as scientific software goes, there are quite a few more essential steps.

Where is the source code? Where is the documentation, and a description of what the tool actually does? Where are the example inputs and outputs? All of that is required for proper scientific software.

Many others and I would object to running an executable, especially a Windows-based one.

Put your code on GitHub instead of the awful SourceForge, open the sources, and show what you can do. Most good companies hire directly off GitHub; I have received many offers based on my GitHub account alone.

Reply written 5.1 years ago by Istvan Albert.

Thanks for the appreciation. I am planning to compile Linux and Mac executables too. Of course, your suggestion about GitHub is better.

Reply written 5.1 years ago by Nari.

I am referring to the source code: let people compile your code themselves so that there is less danger of running a compromised binary.

I would strongly urge everyone to NEVER download and run binaries.

I have found that one way to tell a novice programmer is whether they are willing to show their source code. Those who are starting out often seem to think their code is somehow precious and that everyone is out to steal it and sell it as their own. Nothing could be further from the truth.

Let people use the software and understand what it does. Put whatever license you want on it; if you want to retain commercial rights, so be it. Do everything you can to demonstrate that it is worth other people's time.

Reply written 5.1 years ago by Istvan Albert.
Powered by Biostar version 2.3.0