Question: protein domain start-stop codons database
cocchi.e89 wrote, 4 months ago:

I'm working on some human exome variants, and I annotated them with protein domain features using VEP.

Example result: Pfam_domain:PF01762&hmmpanther:PTHR11214&hmmpanther:PTHR11214:SF28&Low_complexity_(Seg):seg

I need to get from a domain back to its codon coordinates (e.g. the start and stop codons of PANTHER PTHR11214, or the genomic positions if possible). Is there a database from which I can retrieve this information?

I found a similar post, but it covers retrieving each one manually; I need to automate the process.

Thanks a lot in advance for any help!

Tags: database, vep, codon, domain

Getting to the actual domain position is not trivial. You can get gene names and coordinates using Entrez Direct (EDirect) and Pfam IDs:

$ esearch -db cdd -query "PF01762" | elink -target gene | esummary | xtract -pattern DocumentSummary -if ScientificName -equals "Homo sapiens" -element Id,Name,ScientificName,ChrAccVer,ChrStart,ChrStop

You will get something like (truncated for space):

56913   C1GALT1 Homo sapiens    NC_000007.14    NC_000007.14    NC_000007.14    NC_000007.14    NC_018918.2     NC_000007.14    NC_018918.2     NC_000007.13    AC_000068.1     AC_000139.1     NC_018918.2   7182546 7182546 7182546 7182546 7182546 7222243 7182546 7222243 7222177 7269405 7070738 7222243 7248650 7248650 7248650 7248650 7288281 7248650 7288281 7288281 7335443 7136783 7288281
10317   B3GALT5 Homo sapiens    NC_000021.9     NC_000021.9     NC_000021.9     NC_000021.9     NC_018932.2     NC_000021.9     NC_018932.2     NC_000021.8     AC_000153.1     NC_018932.2     39612939      39612939        39612939        39612939        39612939        40545617        39612939        40545617        40928368        26454203        40545617        39673136        39673136     39662888 39662888        40595560        39662888
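
Since the goal is to automate this for many variants, the Pfam accessions first have to be pulled out of the VEP domain strings. The following is a minimal sketch, assuming the domain annotations look like the example in the question (entries joined by `&`, with a `Pfam_domain:` prefix); other VEP versions may use a different prefix. The function name `pfam_ids_from_vep` and the file `domains.txt` are hypothetical.

```shell
# Read VEP domain strings on stdin (one per line) and print the
# unique Pfam accessions they contain, one per line.
# Assumes entries are joined by "&" and Pfam entries look like
# "Pfam_domain:PF01762", as in the example from the question.
pfam_ids_from_vep() {
  tr '&' '\n' | awk -F: '$1 == "Pfam_domain" { print $2 }' | sort -u
}

# Possible usage (not run here; needs EDirect and network access).
# domains.txt is a hypothetical file with one VEP domain string per line.
#
# for id in $(pfam_ids_from_vep < domains.txt); do
#   esearch -db cdd -query "$id" | elink -target gene | esummary
# done
```

A `for` loop over the captured IDs is used in the commented usage, rather than `while read`, so that the EDirect commands cannot consume the list of IDs from standard input.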
Comment written 4 months ago by genomax
Emily_Ensembl (EMBL-EBI) wrote, 4 months ago:

The domains that VEP reports are the ones stored in the Ensembl database, so the way to retrieve them is with BioMart.
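
BioMart can also be queried non-interactively by sending it an XML query, which may help with automation. The following is a rough sketch, not a verified recipe: the dataset name `hsapiens_gene_ensembl` and the attribute names `hmmpanther`, `hmmpanther_start` and `hmmpanther_end` are assumptions that should be checked against the attribute list in the BioMart web interface (its XML export shows the exact names). The function name `biomart_domain_query` is hypothetical.

```shell
# Print a BioMart XML query asking for PANTHER domain hits and their
# protein (amino acid) start/end positions for one Ensembl transcript.
# $1: Ensembl transcript ID.
# Dataset and attribute names below are assumptions; verify them in
# the BioMart web interface before relying on this.
biomart_domain_query() {
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1" datasetConfigVersion="0.6">
  <Dataset name="hsapiens_gene_ensembl" interface="default">
    <Filter name="ensembl_transcript_id" value="$1"/>
    <Attribute name="ensembl_transcript_id"/>
    <Attribute name="hmmpanther"/>
    <Attribute name="hmmpanther_start"/>
    <Attribute name="hmmpanther_end"/>
  </Dataset>
</Query>
EOF
}

# Possible usage (not run here; needs network access):
#
# curl -s --data-urlencode "query=$(biomart_domain_query "$transcript_id")" \
#   "https://www.ensembl.org/biomart/martservice"
```

If the attribute names are right, the response should be a tab-separated table with one row per domain hit. The positions would be amino acid coordinates on the translated protein, not genomic coordinates.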

genomax (United States) wrote, 4 months ago:

You can use Entrez Direct (EDirect) to get this information.

$ esearch -db cdd -query "PF01762" | elink -target protein | esummary -format ft

The output should look something like this (truncated). The Region entry denotes the domain, and its coordinates are amino acid positions:

>Feature ref|XP_021018358.1|
1       350     Protein
                        product N-acetyllactosaminide beta-1,3-N-acetylglucosaminyltransferase 4
106     295     Region
                        region  Galactosyl_T
                        note    Galactosyltransferase
                        db_xref CDD:328824
1       350     CDS
                        product N-acetyllactosaminide beta-1,3-N-acetylglucosaminyltransferase 4
                        protein_id      ref|XP_021018358.1|
                        db_xref GeneID:110294452

>Feature ref|XP_021015436.1|
1       325     Protein
                        product beta-1,3-galactosyltransferase 6
65      256     Region
                        region  Galactosyl_T
                        note    Galactosyltransferase
                        db_xref CDD:328824
1       325     CDS
                        product beta-1,3-galactosyltransferase 6
                        protein_id      ref|XP_021015436.1|
                        db_xref GeneID:110292468
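
To get from these amino acid positions to codon positions, the Region entries can be pulled out of the feature table and converted. The following is a minimal sketch with a hypothetical function name, `regions_to_cds`. It assumes a complete, gap-free CDS, so that amino acid n corresponds to CDS nucleotides 3(n-1)+1 through 3n, counted from the first base of the start codon. It does not produce genomic coordinates; those would still require mapping through the exon structure of each transcript.

```shell
# Read feature-table text (as shown above) on stdin and print one
# tab-separated line per Region feature:
#   accession, region name, aa start, aa end, CDS nt start, CDS nt end
# CDS positions count from the first base of the start codon and
# assume a complete CDS with no frameshifts (an assumption).
regions_to_cds() {
  awk '
    BEGIN { OFS = "\t" }
    # Header line, e.g. ">Feature ref|XP_021018358.1|"
    $1 == ">Feature" {
      n = split($2, parts, "|")
      acc = (n > 1) ? parts[2] : $2
      in_region = 0
      next
    }
    # Feature line: start, end, feature key (partial markers < > allowed)
    /^[<>]?[0-9]+[ \t]+[<>]?[0-9]+[ \t]+[A-Za-z_]/ {
      in_region = ($3 == "Region")
      if (in_region) {
        aa_start = $1; aa_end = $2
        gsub(/[<>]/, "", aa_start); gsub(/[<>]/, "", aa_end)
      }
      next
    }
    # Qualifier line naming the region, e.g. "region  Galactosyl_T"
    in_region && $1 == "region" {
      name = $0
      sub(/^[ \t]*region[ \t]+/, "", name)
      print acc, name, aa_start, aa_end, (aa_start - 1) * 3 + 1, aa_end * 3
      in_region = 0
    }
  '
}

# Possible usage (not run here; needs EDirect and network access):
#
# esearch -db cdd -query "PF01762" | elink -target protein |
#   esummary -format ft | regions_to_cds
```

For the first record above, the region at amino acids 106 to 295 corresponds to CDS nucleotides 316 to 885 under these assumptions.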