Question: What is this file format called?
Golf wrote, 11 months ago:

Hello everyone,

I have a basic question about file formats.

I would like to know what this kind of file format is called. Examples are the files from TRANSFAC, the ENZYME nomenclature database, and AAindex:

ID any_old_name_for_motif_1
BF species_name_for_motif_1
P0      A      C      G      T
01      1      2      2      0      S
02      2      1      2      0      R
03      3      0      1      1      A
04      0      5      0      0      C
05      5      0      0      0      A
06      0      0      4      1      G
07      0      1      4      0      G
08      0      0      0      5      T
09      0      0      5      0      G
10      0      1      2      2      K
11      0      2      0      3      Y
12      1      0      3      1      G
XX
//

or

ID   6.6.1.2
DE   Cobaltochelatase.
AN   Hydrogenobyrinic acid a,c-diamide cobaltochelatase.
CA   ATP + hydrogenobyrinic acid a,c-diamide + Co(2+) + H(2)O = ADP +
CA   phosphate + cob(II)yrinic acid a,c-diamide + H(2).
CC   -!- This enzyme, which forms part of the aerobic cobalamin biosynthesis
CC       pathway, is a type I chelatase, being heterotrimeric and ATP-
CC       dependent.
CC   -!- It comprises two components, one of which corresponds to CobN and the
CC       other is composed of two polypeptides, specified by cobS and cobT in
CC       Pseudomonas denitrificans, and named CobST.
CC   -!- Hydrogenobyrinic acid is a very poor substrate.
CC   -!- ATP can be replaced by dATP or CTP but the reaction proceeds more
CC       slowly.
CC   -!- CobN exhibits a high affinity for hydrogenobyrinic acid a,c-diamide.
CC   -!- The oligomeric protein CobST possesses at least one sulfhydryl group
CC       that is essential for ATP-binding.
CC   -!- Once the Co(2+) is inserted, the next step in the pathway ensures
CC       that the cobalt is ligated securely by reducing Co(II) to Co(I); this
CC       step is carried out by EC 1.16.8.1.
DR   Q9HZQ3, COBN_PSEAE ;  P29929, COBN_PSEDE ;  P29933, COBS_PSEDE ;
DR   P29934, COBT_PSEDE ;
//

or

H ANDN920101
D alpha-CH chemical shifts (Andersen et al., 1992)
R PMID:1575719
A Andersen, N.H., Cao, B. and Chen, C.
T Peptide/protein structure analysis using the chemical shift index method: 
  upfield alpha-CH values reveal dynamic helices and aL sites
J Biochem. and Biophys. Res. Comm. 184, 1008-1014 (1992)
C BUNA790102    0.949
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
    4.35    4.38    4.75    4.76    4.65    4.37    4.29    3.97    4.63    3.95
    4.17    4.36    4.52    4.66    4.44    4.50    4.35    4.70    4.60    3.95
//
H ARGP820101
D Hydrophobicity index (Argos et al., 1982)
R PMID:7151796
A Argos, P., Rao, J.K.M. and Hargrave, P.A.
T Structural prediction of membrane-bound proteins
J Eur. J. Biochem. 128, 565-575 (1982)
C JOND750101    1.000  SIMZ760101    0.967  GOLD730101    0.936
  TAKK010101    0.906  MEEJ810101    0.891  ROSM880104    0.872
  CIDH920105    0.867  LEVM760106    0.865  CIDH920102    0.862
  MEEJ800102    0.855  MEEJ810102    0.853  ZHOH040101    0.841
  CIDH920103    0.827  PLIV810101    0.820  CIDH920104    0.819
  LEVM760107    0.806  NOZY710101    0.800  GUYH850103   -0.808
  PARJ860101   -0.835  WOLS870101   -0.838  BULH740101   -0.854
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
    0.61    0.60    0.06    0.46    1.07      0.    0.47    0.07    0.61    2.22
    1.53    1.15    1.18    2.02    1.95    0.05    0.05    2.65    1.88    1.32
//

Thank you for your kind response!

Golf


Strictly speaking, the file format of both of these is "transfac database format" - that is, it is not a standard format, but rather one that is only used by TRANSFAC (as far as I know).

However, the first format is basically a Position Count Matrix, which is an un-normalised version of the standard position weight matrix (PWM) used by many motif-finding and scanning tools. What makes it different from the standard formats is the field names down the left-hand side.
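To illustrate the relationship between counts and a normalised matrix, here is a minimal Python sketch that reads a matrix block like the first example in the question and divides each row by its total. The function names `parse_count_matrix` and `to_frequencies` are made up for this illustration, and the parsing rules (a `P0` or `PO` header row, then numbered position rows) are assumptions based only on the example shown, not on an official specification. Note that many tools go one step further and convert frequencies to log-odds scores, which this sketch does not do.

```python
def parse_count_matrix(text):
    """Parse a TRANSFAC-style matrix block into (bases, rows of counts)."""
    bases, counts = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("P0", "PO"):
            # Header row: the remaining fields name the count columns.
            bases = fields[1:]
        elif fields[0].isdigit() and bases:
            # Position row: keep one count per base, ignore the trailing
            # consensus letter.
            counts.append([float(x) for x in fields[1:1 + len(bases)]])
    return bases, counts


def to_frequencies(counts):
    """Normalise each row of counts so that it sums to 1."""
    freqs = []
    for row in counts:
        total = sum(row)
        freqs.append([c / total if total else 0.0 for c in row])
    return freqs


# A shortened copy of the first example from the question.
record = """ID any_old_name_for_motif_1
BF species_name_for_motif_1
P0      A      C      G      T
01      1      2      2      0      S
04      0      5      0      0      C
XX
//"""

bases, counts = parse_count_matrix(record)
freqs = to_frequencies(counts)
for row in freqs:
    print(dict(zip(bases, row)))
```

For real work, an existing parser (for example the motif module in Biopython) is probably a safer choice than hand-rolled code.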

Reply written 11 months ago by i.sudbery

@i.sudbery Thank you for your answer!

Reply written 11 months ago by Golf
Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote, 11 months ago:

The first one is the TRANSFAC matrix file format, while the second one looks like the UniProt file format.
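Whatever name is used, all three examples in the question share the same layout: a short code at the start of each line, and `//` on its own line to end a record. A rough Python sketch of a reader for that layout is below. The function name `parse_records` is hypothetical, and treating indented lines as continuations of the previous code is an assumption taken from the AAindex example, not from any format specification.

```python
from collections import defaultdict


def parse_records(text):
    """Split line-code flat-file text into records terminated by '//'."""
    records = []
    current = defaultdict(list)
    last_code = None
    for line in text.splitlines():
        if line.startswith("//"):
            # End of record: store it and start a fresh one.
            if current:
                records.append(dict(current))
            current, last_code = defaultdict(list), None
        elif line[:1].isspace():
            # Indented line: assumed to continue the previous line code.
            if last_code is not None:
                current[last_code].append(line.strip())
        elif line.strip():
            # Normal line: first token is the code, the rest is the value.
            code, _, value = line.partition(" ")
            current[code].append(value.strip())
            last_code = code
    if current:
        # Keep a final record even if the closing '//' is missing.
        records.append(dict(current))
    return records


# Shortened excerpts of the second and third examples from the question.
sample = """ID   6.6.1.2
DE   Cobaltochelatase.
CA   ATP + hydrogenobyrinic acid a,c-diamide + Co(2+) + H(2)O = ADP +
CA   phosphate + cob(II)yrinic acid a,c-diamide + H(2).
//
H ANDN920101
D alpha-CH chemical shifts (Andersen et al., 1992)
T Peptide/protein structure analysis using the chemical shift index method:
  upfield alpha-CH values reveal dynamic helices and aL sites
//"""

records = parse_records(sample)
for rec in records:
    print(sorted(rec))
```

Each record comes back as a dictionary mapping a line code to the list of values seen for it, so repeated codes such as `CA` or `CC` keep all of their lines in order.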


Thank you for your answer!

Reply written 11 months ago by Golf