Question: How to relate protein IDs (protein sequences) to GenBank file sequences
kws15 wrote, 3.4 years ago:

Hi everyone,

I am completely new to bioinformatics and I'm working on a project on tomato. I used a software package to identify the S. pennellii orthologs of S. lycopersicum transcription factors. To do that, I aligned the S. lycopersicum transcription factor protein sequences against all S. pennellii protein sequences (the protein FASTA file from NCBI).

Now I have results that look like this:

Solyc07g053610.2.1 100%,Sopen07g027560.100%

What I want to do is relate these protein IDs to the GenBank files (nucleotide sequences). Does anyone have an idea how I can do this? Maybe these protein IDs are not compatible with the GenBank files because they use a different naming system? Thank you very much.

Tags: genbank, protein id (question last modified 10 months ago by RamRS)
piet (planet earth) wrote, 3.4 years ago:

It seems that these protein identifiers have only been used internally by ITAG (the International Tomato Annotation Group) and were never submitted to GenBank.

There is currently only one full tomato genome in GenBank. It has been upgraded several times in recent years, and every upgrade shifted the chromosomal coordinates. The latest ITAG assembly is available as an NCBI RefSeq sequence. NCBI has automatically re-annotated this RefSeq sequence, but the original ITAG annotation can be downloaded from www.solgenomics.net:

wget ftp://ftp.solgenomics.net/tomato_genome/annotation/ITAG2.4_release/ITAG2.4_gene_models.gff3

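The GFF file can be searched for the position of the gene Solyc07g053610 on the chromosome. The sed step renames the ITAG chromosome name (SL2.50ch07) to the corresponding RefSeq accession (NC_015444.2):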

awk '$3~/gene/ && $9~/Solyc07g053610/' ITAG2.4_gene_models.gff3 | sed 's/SL2.50ch07/NC_015444.2/'

NC_015444.2     ITAG_eugene     gene    62033451        62049779        .       +       .       ID=gene:Solyc07g053610.2;Name=Solyc07g053610.2;Alias=Solyc07g053610;from_BOGAS=1;length=16329
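To run the same lookup for a whole list of ortholog hits, a minimal bash/awk sketch along these lines could work. It assumes the ITAG2.4 GFF3 is tab-separated with the standard nine columns and that every gene line carries an Alias=<locus> attribute, as in the output above; the function names itag_locus and gene_coords are hypothetical. Because gene lines are keyed on the locus, a protein ID such as Solyc07g053610.2.1 is first trimmed to Solyc07g053610. Only the SL2.50ch07 to NC_015444.2 rename from this answer is built in; the other chromosomes would have to be added from the NCBI mapping table.

```shell
#!/usr/bin/env bash
# Sketch only: batch lookup of gene coordinates in the ITAG2.4 GFF3.

# Trim the version/transcript suffix from an ITAG ID,
# e.g. Solyc07g053610.2.1 -> Solyc07g053610 (the locus used in Alias=).
itag_locus() {
  printf '%s\n' "${1%%.*}"
}

# gene_coords GFF ID [ID ...]
# Prints seqid, start, end, strand and locus (tab-separated) for each
# gene line whose Alias= value matches one of the requested loci.
gene_coords() {
  local gff=$1 id loci=""
  shift
  for id in "$@"; do
    loci+="$(itag_locus "$id") "
  done
  awk -F'\t' -v OFS='\t' -v loci="$loci" '
    BEGIN {
      n = split(loci, wanted_list, " ")
      for (i = 1; i <= n; i++) want[wanted_list[i]] = 1
      # Only the rename shown in the answer; add the other chromosomes
      # from the NCBI chromosome-to-RefSeq table.
      refseq["SL2.50ch07"] = "NC_015444.2"
    }
    $3 == "gene" && match($9, /Alias=[^;]+/) {
      locus = substr($9, RSTART + 6, RLENGTH - 6)   # drop the "Alias=" prefix
      if (locus in want) {
        seqid = ($1 in refseq) ? refseq[$1] : $1
        print seqid, $4, $5, $7, locus
      }
    }' "$gff"
}
```

For example, after sourcing these functions, gene_coords ITAG2.4_gene_models.gff3 $(cut -d' ' -f1 orthologs.txt) would print one line per matched gene, where orthologs.txt stands for your own result file with the ITAG ID in the first column.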

A table mapping the chromosome numbers to NCBI RefSeq accessions is available here.
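Since the question is ultimately about nucleotide sequences, coordinates like the ones above can then be used to cut the region out of a chromosome FASTA file. Below is a minimal awk sketch; the function name subseq and its argument order are made up for illustration, and it assumes the FASTA header's first word matches the GFF seqid (SL2.50ch07 for the ITAG genome files, NC_015444.2 for the RefSeq chromosome). Coordinates are 1-based and inclusive, as in GFF3, and a "-" strand argument returns the reverse complement.

```shell
#!/usr/bin/env bash
# Sketch only: extract a 1-based, inclusive region from a FASTA file.

# subseq FASTA SEQID START END [STRAND]
# Prints bases START..END of the record whose header's first word is
# SEQID; if STRAND is "-", prints the reverse complement instead.
subseq() {
  local fasta=$1 seqid=$2 start=$3 end=$4 strand=${5:-+}
  awk -v id="$seqid" -v s="$start" -v e="$end" -v strand="$strand" '
    /^>/ {
      if (found) exit                      # past the wanted record
      found = (substr($1, 2) == id)        # header first word minus ">"
      pos = 0
      next
    }
    found {
      line_start = pos + 1                 # position of first base on line
      if (line_start > e) exit             # region already complete
      pos += length($0)                    # position of last base on line
      if (pos < s) next                    # region not reached yet
      from = (s > line_start) ? s - line_start + 1 : 1
      to = (e < pos) ? e - line_start + 1 : length($0)
      out = out substr($0, from, to - from + 1)
    }
    END {
      if (strand == "-") {
        fwd = "ACGTNacgtn"; cmp = "TGCANtgcan"
        rc = ""
        for (i = length(out); i >= 1; i--) {
          b = substr(out, i, 1)
          k = index(fwd, b)
          rc = rc (k ? substr(cmp, k, 1) : b)   # other symbols kept as-is
        }
        out = rc
      }
      print out
    }' "$fasta"
}
```

For the gene above this would be subseq tomato_chromosomes.fa SL2.50ch07 62033451 62049779 + (the FASTA file name is a placeholder). For many regions, indexing the FASTA once and running samtools faidx tomato_chromosomes.fa SL2.50ch07:62033451-62049779 is much faster, because the awk version rereads the file on every call.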

(answer last modified 10 months ago by RamRS)