Question

What is this file format called?

Asked 5.2 years ago by Golf

Hello everyone,

I have a basic question about file formats.

I would like to know what this file format is called, as used for example in the files from TRANSFAC, the ENZYME nomenclature database, and AAindex:

ID any_old_name_for_motif_1
BF species_name_for_motif_1
P0      A      C      G      T
01      1      2      2      0      S
02      2      1      2      0      R
03      3      0      1      1      A
04      0      5      0      0      C
05      5      0      0      0      A
06      0      0      4      1      G
07      0      1      4      0      G
08      0      0      0      5      T
09      0      0      5      0      G
10      0      1      2      2      K
11      0      2      0      3      Y
12      1      0      3      1      G
XX
//

or

ID   6.6.1.2
DE   Cobaltochelatase.
AN   Hydrogenobyrinic acid a,c-diamide cobaltochelatase.
CA   ATP + hydrogenobyrinic acid a,c-diamide + Co(2+) + H(2)O = ADP +
CA   phosphate + cob(II)yrinic acid a,c-diamide + H(2).
CC   -!- This enzyme, which forms part of the aerobic cobalamin biosynthesis
CC       pathway, is a type I chelatase, being heterotrimeric and ATP-
CC       dependent.
CC   -!- It comprises two components, one of which corresponds to CobN and the
CC       other is composed of two polypeptides, specified by cobS and cobT in
CC       Pseudomonas denitrificans, and named CobST.
CC   -!- Hydrogenobyrinic acid is a very poor substrate.
CC   -!- ATP can be replaced by dATP or CTP but the reaction proceeds more
CC       slowly.
CC   -!- CobN exhibits a high affinity for hydrogenobyrinic acid a,c-diamide.
CC   -!- The oligomeric protein CobST possesses at least one sulfhydryl group
CC       that is essential for ATP-binding.
CC   -!- Once the Co(2+) is inserted, the next step in the pathway ensures
CC       that the cobalt is ligated securely by reducing Co(II) to Co(I); this
CC       step is carried out by EC 1.16.8.1.
DR   Q9HZQ3, COBN_PSEAE ;  P29929, COBN_PSEDE ;  P29933, COBS_PSEDE ;
DR   P29934, COBT_PSEDE ;
//

or

H ANDN920101
D alpha-CH chemical shifts (Andersen et al., 1992)
R PMID:1575719
A Andersen, N.H., Cao, B. and Chen, C.
T Peptide/protein structure analysis using the chemical shift index method: 
  upfield alpha-CH values reveal dynamic helices and aL sites
J Biochem. and Biophys. Res. Comm. 184, 1008-1014 (1992)
C BUNA790102    0.949
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
    4.35    4.38    4.75    4.76    4.65    4.37    4.29    3.97    4.63    3.95
    4.17    4.36    4.52    4.66    4.44    4.50    4.35    4.70    4.60    3.95
//
H ARGP820101
D Hydrophobicity index (Argos et al., 1982)
R PMID:7151796
A Argos, P., Rao, J.K.M. and Hargrave, P.A.
T Structural prediction of membrane-bound proteins
J Eur. J. Biochem. 128, 565-575 (1982)
C JOND750101    1.000  SIMZ760101    0.967  GOLD730101    0.936
  TAKK010101    0.906  MEEJ810101    0.891  ROSM880104    0.872
  CIDH920105    0.867  LEVM760106    0.865  CIDH920102    0.862
  MEEJ800102    0.855  MEEJ810102    0.853  ZHOH040101    0.841
  CIDH920103    0.827  PLIV810101    0.820  CIDH920104    0.819
  LEVM760107    0.806  NOZY710101    0.800  GUYH850103   -0.808
  PARJ860101   -0.835  WOLS870101   -0.838  BULH740101   -0.854
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
    0.61    0.60    0.06    0.46    1.07      0.    0.47    0.07    0.61    2.22
    1.53    1.15    1.18    2.02    1.95    0.05    0.05    2.65    1.88    1.32
//

Thank you for your kind response!

Golf

Tags: ENZYME, AAindex, TRANSFAC

Last updated 13 months ago by Ram.


Strictly speaking, the file format of both of these is the "TRANSFAC database format": that is, it is not a standard format but one that is only used by TRANSFAC (as far as I know).

However, the first format is basically a Position Count Matrix (PCM), which is an unnormalised version of the standard Position Weight Matrix (PWM) used by many motif-finding and scanning tools. What makes it different from those standard formats is the field names down the left-hand side.
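To make the count-versus-weight distinction concrete, here is a minimal Python sketch (not TRANSFAC's own tooling) that reads the P0 block from the question and turns the raw counts into frequencies and log2-odds scores. The pseudocount of 0.25 per cell and the uniform 0.25 background are assumptions chosen for illustration; real motif tools differ in both.

```python
import math

# A few rows copied from the TRANSFAC-style matrix in the question:
# position, counts for A, C, G, T, then a consensus letter.
MATRIX_TEXT = """\
P0      A      C      G      T
01      1      2      2      0      S
02      2      1      2      0      R
03      3      0      1      1      A
04      0      5      0      0      C
"""


def parse_counts(text):
    """Return (alphabet, per-position count lists) from a P0 block."""
    lines = text.strip().splitlines()
    alphabet = lines[0].split()[1:]  # header after 'P0': ['A', 'C', 'G', 'T']
    counts = []
    for line in lines[1:]:
        fields = line.split()
        # fields[0] is the position number; the next len(alphabet) fields
        # are counts; any trailing field (consensus letter) is ignored.
        counts.append([float(x) for x in fields[1:1 + len(alphabet)]])
    return alphabet, counts


def counts_to_pwm(counts, pseudocount=0.25, background=0.25):
    """Convert counts to frequencies (PFM) and log2-odds scores (PWM).

    Assumption: the same pseudocount is added to every cell and the
    background is uniform; both are illustrative defaults only.
    """
    pfm, pwm = [], []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        freqs = [(c + pseudocount) / total for c in row]
        pfm.append(freqs)
        pwm.append([math.log2(f / background) for f in freqs])
    return pfm, pwm
```

For example, position 04 (five C's out of five sequences) gets a C frequency of 5.25/6 = 0.875 with the default pseudocount, i.e. a log2-odds score of log2(3.5) against the uniform background.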

Reply by i.sudbery, 5.2 years ago


@i.sudbery Thank you for your answer!

Reply by Golf, 5.2 years ago

score 3 · Accepted Answer · 2019-01-29


Answered 5.2 years ago by Jean-Karim Heriche

The first one is the TRANSFAC matrix file format, while the second one looks like the UniProt file format.
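All three examples share the same line-oriented layout: a short code at the start of each line and `//` closing each record. A minimal, hypothetical Python reader for that layout might look like the sketch below; it is not an official parser, and `parse_records` is an illustrative name. Treating lines that start with whitespace as continuations of the previous code is an assumption based on the AAindex example. For production use it is worth checking the dedicated readers in Biopython (e.g. `Bio.motifs` with the "transfac" format and `Bio.ExPASy.Enzyme`).

```python
from collections import defaultdict

# An excerpt of the ENZYME record from the question.
ENZYME_TEXT = """\
ID   6.6.1.2
DE   Cobaltochelatase.
CA   ATP + hydrogenobyrinic acid a,c-diamide + Co(2+) + H(2)O = ADP +
CA   phosphate + cob(II)yrinic acid a,c-diamide + H(2).
DR   Q9HZQ3, COBN_PSEAE ;  P29929, COBN_PSEDE ;  P29933, COBS_PSEDE ;
DR   P29934, COBT_PSEDE ;
//
"""


def parse_records(text):
    """Yield one dict per record, mapping line code -> list of values.

    Assumed conventions: the code is the first space-separated token,
    '//' ends a record, and lines starting with whitespace continue
    the previous code (as in AAindex's C and I fields).
    """
    record = defaultdict(list)
    last_code = None
    for line in text.splitlines():
        if line.startswith("//"):
            if record:
                yield dict(record)
            record = defaultdict(list)
            last_code = None
            continue
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace() and last_code is not None:
            record[last_code].append(line.strip())
            continue
        code, _, value = line.partition(" ")
        last_code = code
        record[code].append(value.strip())
    if record:  # tolerate a missing final '//'
        yield dict(record)
```

With the excerpt above, the single record's `"ID"` entry is `["6.6.1.2"]`, and `" ".join(rec["CA"])` reassembles the reaction that was wrapped over two `CA` lines.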