Identify common genes and their respective DNA sequences
9.0 years ago
sam • 0

Hello,

I'm trying to use these files to find the genes common to multiple bacterial genomes and extract their genomic sequences. Does anyone have experience doing that with the data provided by PATRIC? I'm having difficulty identifying the common genes across this set of bacterial genomes, and their corresponding DNA sequences, because I cannot find an ID that is shared between them. Any help would be greatly appreciated.

genome-sequencing • 2.1k views

Maybe the hyperlink is missing?


Hi Sam. What format is the data in to begin with? What info do you need in the output?

8.8 years ago
cyril-cros ▴ 950

http://www.pantherdb.org/ is a useful tool for finding homologous genes, but PATRIC's IDs for drug resistance genes are not ideal... I advise you to start with ~10 bacterial genomes (those most commonly encountered in clinical infections), because I don't know how often resistance genes are shared between bacteria. For others looking for a link, https://www.patricbrc.org/portal/portal/patric/AntibioticResistance leads you to an interesting page. Downloads are limited to 20000 entries at a time, and there are 800000 entries for bacterial resistance.
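To give a feel for what the download limit means in practice, here is a minimal Python sketch that splits the 800000 entries into chunks of 20000. The function name `batch_ranges` is hypothetical, and the sketch only does the arithmetic; how each batch is actually requested from PATRIC is not shown, because that depends on the site's interface.

```python
def batch_ranges(total, batch_size):
    """Yield (offset, count) pairs that cover `total` records in chunks."""
    for offset in range(0, total, batch_size):
        # The last chunk may be smaller than batch_size.
        yield offset, min(batch_size, total - offset)


# Figures quoted above: 800000 entries, 20000 per download.
batches = list(batch_ranges(800_000, 20_000))
print(len(batches))  # 40
```

So fetching everything would take 40 separate downloads, which is one more reason to start with a small set of genomes.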

Note that you have a MAP IDs to ... button on the PATRIC specialty genes browser!!!

You can use it to get useful IDs that work with PantherDb or HOGENOM, a database of bacterial homologous genes. I would also try to cluster the genes in some clever way (homologous genes, genes with the same effect / GO process).
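Once every gene has been mapped to some shared identifier (a homolog family, for instance), finding the common genes is a set-intersection problem. Below is a rough Python sketch of that idea, using only the standard library. Everything in it is an assumption for illustration: the input layout, the function names `common_families` and `read_fasta`, and the toy gene and family IDs are made up and are not PATRIC's or PANTHER's actual formats.

```python
from collections import defaultdict


def common_families(gene_to_family_by_genome):
    """Return {family_id: {genome: [gene_ids]}} for families present in every genome.

    Input layout (hypothetical): {genome_name: {gene_id: family_id}},
    e.g. built from the table you get after mapping IDs.
    """
    members = defaultdict(lambda: defaultdict(list))
    for genome, gene_to_family in gene_to_family_by_genome.items():
        for gene_id, family_id in gene_to_family.items():
            members[family_id][genome].append(gene_id)
    n_genomes = len(gene_to_family_by_genome)
    return {
        fam: {g: sorted(ids) for g, ids in per_genome.items()}
        for fam, per_genome in members.items()
        if len(per_genome) == n_genomes  # keep families seen in all genomes
    }


def read_fasta(text):
    """Parse FASTA text into {first word of header: sequence}."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


# Toy data (made up): three genomes, only FAM1 occurs in all of them.
mapping = {
    "genomeA": {"A_001": "FAM1", "A_002": "FAM2"},
    "genomeB": {"B_010": "FAM1", "B_011": "FAM3"},
    "genomeC": {"C_100": "FAM1", "C_101": "FAM2"},
}
shared = common_families(mapping)

# Pull the DNA sequences of the shared genes out of one genome's FASTA.
fasta_a = ">A_001 hypothetical protein\nATGAAA\nTTTTAA\n>A_002 other\nATGCCCTAA\n"
seqs_a = read_fasta(fasta_a)
shared_seqs_a = {
    gene: seqs_a[gene] for fam in shared.values() for gene in fam["genomeA"]
}
print(shared_seqs_a)  # {'A_001': 'ATGAAATTTTAA'}
```

This only works if the FASTA headers start with the same gene ID used in the mapping table; if they don't, you need one more lookup table in between.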


For example, I search for Tuberculosis resistance genes: https://www.patricbrc.org/portal/portal/patric/IDMapping?cType=taxon&cId=131567&dm=result&pk=6260159049994524205

I use Map IDs to 'RefSeq > GeneID'. I look up Rv3065 (chosen randomly). PantherDb leads me to: http://www.pantherdb.org/genes/gene.do?acc=MYCTU|Gene=Rv3065|UniProtKB=P9WGF1#orthologs

Each protein-coding gene in PANTHER is also represented by a "representative" protein sequence. This is the "canonical" sequence whenever one is available, and substantial effort was given to select the best representative. The protein sequences are used to estimate phylogenetic trees.

Use that representative sequence if you can find a way to access it.
