Question: protein domain start-stop codons database
cocchi.e89 wrote (7 months ago):

I'm working on some human exome variants, and I got their protein domain features with VEP.

Example result: Pfam_domain:PF01762&hmmpanther:PTHR11214&hmmpanther:PTHR11214:SF28&Low_complexity_(Seg):seg

I need to get from each domain back to its codon coordinates, e.g. the start and stop codons of PANTHER PTHR11214 (or genomic positions, if possible). Is there a database from which I can retrieve this information?

I found a similar post, but its approach retrieves each domain manually, and I need to automate the process.

Thanks a lot in advance for any help!

Tags: database, vep, codon, domain

Getting to the actual domain position is not trivial. You can get gene names and genomic coordinates from a Pfam ID using Entrez Direct:

$ esearch -db cdd -query "PF01762" | elink -target gene | esummary | xtract -pattern DocumentSummary -if ScientificName -equals "Homo sapiens" -element Id,Name,ScientificName,ChrAccVer,ChrStart,ChrStop

You will get something like (truncated for space):

56913   C1GALT1 Homo sapiens    NC_000007.14    NC_000007.14    NC_000007.14    NC_000007.14    NC_018918.2     NC_000007.14    NC_018918.2     NC_000007.13    AC_000068.1     AC_000139.1     NC_018918.2   7182546 7182546 7182546 7182546 7182546 7222243 7182546 7222243 7222177 7269405 7070738 7222243 7248650 7248650 7248650 7248650 7288281 7248650 7288281 7288281 7335443 7136783 7288281
10317   B3GALT5 Homo sapiens    NC_000021.9     NC_000021.9     NC_000021.9     NC_000021.9     NC_018932.2     NC_000021.9     NC_018932.2     NC_000021.8     AC_000153.1     NC_018932.2     39612939      39612939        39612939        39612939        39612939        40545617        39612939        40545617        40928368        26454203        40545617        39673136        39673136     39662888 39662888        40595560        39662888
(comment written 7 months ago by genomax)
Emily_Ensembl (EMBL-EBI) wrote (7 months ago):

The domains that VEP reports are the ones in the Ensembl database, so the way to retrieve them is with BioMart.
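To automate the BioMart route, one option is to build the XML query programmatically and send it to the Ensembl mart service. The following is only a minimal sketch: the dataset name `hsapiens_gene_ensembl`, the filter `ensembl_transcript_id`, and the attribute names `pfam`, `pfam_start` and `pfam_end` are assumptions that should be checked against the attribute list shown in the BioMart web interface, and the transcript ID is a hypothetical placeholder. The start and end values returned this way would be amino acid positions on the protein, not codon or genomic coordinates.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Ensembl BioMart REST-style endpoint that accepts an XML query.
MART_URL = "https://www.ensembl.org/biomart/martservice"

# Assumed attribute names; verify them in the BioMart interface.
DEFAULT_ATTRIBUTES = (
    "ensembl_transcript_id",
    "pfam",
    "pfam_start",
    "pfam_end",
)


def build_domain_query(transcript_ids, attributes=DEFAULT_ATTRIBUTES):
    """Return a BioMart XML query string for the given transcript IDs."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    dataset = ET.SubElement(query, "Dataset", {
        "name": "hsapiens_gene_ensembl",  # assumed dataset name
        "interface": "default",
    })
    # One filter restricting the results to the transcripts of interest.
    ET.SubElement(dataset, "Filter", {
        "name": "ensembl_transcript_id",  # assumed filter name
        "value": ",".join(transcript_ids),
    })
    # One Attribute element per requested output column.
    for attr in attributes:
        ET.SubElement(dataset, "Attribute", {"name": attr})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def build_request_url(xml_query):
    """Return the URL that submits the XML query to the mart service."""
    return MART_URL + "?" + urlencode({"query": xml_query})


# Hypothetical placeholder ID, for illustration only.
xml_query = build_domain_query(["ENST00000000000"])
print(build_request_url(xml_query))
```

The printed URL can then be fetched with any HTTP client (for example `urllib.request.urlopen`) once the names have been verified. Building the query without sending it keeps the sketch usable offline.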

genomax (United States) wrote (7 months ago):

You can use Entrez Direct to get this information.

$ esearch -db cdd -query "PF01762" | elink -target protein | efetch -format ft

The output should look something like this (truncated). The Region entry denotes the domain, and its two numbers are the start and end amino acid positions:

>Feature ref|XP_021018358.1|
1       350     Protein
                        product N-acetyllactosaminide beta-1,3-N-acetylglucosaminyltransferase 4
106     295     Region
                        region  Galactosyl_T
                        note    Galactosyltransferase
                        db_xref CDD:328824
1       350     CDS
                        product N-acetyllactosaminide beta-1,3-N-acetylglucosaminyltransferase 4
                        protein_id      ref|XP_021018358.1|
                        db_xref GeneID:110294452

>Feature ref|XP_021015436.1|
1       325     Protein
                        product beta-1,3-galactosyltransferase 6
65      256     Region
                        region  Galactosyl_T
                        note    Galactosyltransferase
                        db_xref CDD:328824
1       325     CDS
                        product beta-1,3-galactosyltransferase 6
                        protein_id      ref|XP_021015436.1|
                        db_xref GeneID:110292468
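Since the goal is to automate this, here is a minimal Python sketch that pulls the Region entries out of feature table text like the above and converts the amino acid range to nucleotide offsets within the coding sequence. The function and class names are illustrative and not part of Entrez Direct. The conversion assumes the CDS starts at the first codon of the protein, and it gives 1-based offsets within the CDS only. Mapping those to genomic positions would additionally require the exon structure of the transcript.

```python
import re
from typing import Iterator, NamedTuple


class DomainRegion(NamedTuple):
    accession: str
    aa_start: int
    aa_end: int
    name: str


# Feature lines start at column 0: "<start> <end> <key>".
_FEATURE = re.compile(r"^[<>]?(\d+)\s+[<>]?(\d+)\s+(\S+)\s*$")
# Qualifier lines are indented: "<qualifier> <value>".
_QUALIFIER = re.compile(r"^\s+(\S+)\s+(.+?)\s*$")


def parse_regions(text: str) -> Iterator[DomainRegion]:
    """Yield one DomainRegion per Region feature with a 'region' qualifier."""
    accession = None
    current = None  # (start, end) of a Region awaiting its name
    for line in text.splitlines():
        if line.startswith(">Feature"):
            # Header looks like ">Feature ref|XP_021018358.1|".
            ident = line.split()[1]
            parts = ident.split("|")
            accession = parts[1] if len(parts) > 1 else ident
            current = None
            continue
        feature = _FEATURE.match(line)
        if feature:
            start, end = int(feature.group(1)), int(feature.group(2))
            current = (start, end) if feature.group(3) == "Region" else None
            continue
        qualifier = _QUALIFIER.match(line)
        if qualifier and current and qualifier.group(1) == "region":
            yield DomainRegion(accession, current[0], current[1],
                               qualifier.group(2))
            current = None


def aa_range_to_cds(aa_start: int, aa_end: int) -> tuple:
    """Convert a 1-based amino acid range to 1-based CDS nucleotide offsets."""
    return (aa_start - 1) * 3 + 1, aa_end * 3


# First record from the output above, reduced to the relevant lines.
SAMPLE = """\
>Feature ref|XP_021018358.1|
1       350     Protein
                        product N-acetyllactosaminide beta-1,3-N-acetylglucosaminyltransferase 4
106     295     Region
                        region  Galactosyl_T
                        note    Galactosyltransferase
"""

for region in parse_regions(SAMPLE):
    cds_start, cds_end = aa_range_to_cds(region.aa_start, region.aa_end)
    print(region.accession, region.name, region.aa_start, region.aa_end,
          cds_start, cds_end, sep="\t")
```

For the first record, amino acids 106 to 295 correspond to CDS nucleotides (106 - 1) * 3 + 1 = 316 through 295 * 3 = 885. To process real output, the command's output could be saved to a file and its contents passed to `parse_regions`.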