5.2 years ago
Golf
Hello everyone,
I have a basic question about file formats.
I would like to know what this file format is called. Examples are the files from Transfac, the ENZYME nomenclature database, and AAindex:
ID any_old_name_for_motif_1
BF species_name_for_motif_1
P0 A C G T
01 1 2 2 0 S
02 2 1 2 0 R
03 3 0 1 1 A
04 0 5 0 0 C
05 5 0 0 0 A
06 0 0 4 1 G
07 0 1 4 0 G
08 0 0 0 5 T
09 0 0 5 0 G
10 0 1 2 2 K
11 0 2 0 3 Y
12 1 0 3 1 G
XX
//
or
ID 6.6.1.2
DE Cobaltochelatase.
AN Hydrogenobyrinic acid a,c-diamide cobaltochelatase.
CA ATP + hydrogenobyrinic acid a,c-diamide + Co(2+) + H(2)O = ADP +
CA phosphate + cob(II)yrinic acid a,c-diamide + H(2).
CC -!- This enzyme, which forms part of the aerobic cobalamin biosynthesis
CC pathway, is a type I chelatase, being heterotrimeric and ATP-
CC dependent.
CC -!- It comprises two components, one of which corresponds to CobN and the
CC other is composed of two polypeptides, specified by cobS and cobT in
CC Pseudomonas denitrificans, and named CobST.
CC -!- Hydrogenobyrinic acid is a very poor substrate.
CC -!- ATP can be replaced by dATP or CTP but the reaction proceeds more
CC slowly.
CC -!- CobN exhibits a high affinity for hydrogenobyrinic acid a,c-diamide.
CC -!- The oligomeric protein CobST possesses at least one sulfhydryl group
CC that is essential for ATP-binding.
CC -!- Once the Co(2+) is inserted, the next step in the pathway ensures
CC that the cobalt is ligated securely by reducing Co(II) to Co(I); this
CC step is carried out by EC 1.16.8.1.
DR Q9HZQ3, COBN_PSEAE ; P29929, COBN_PSEDE ; P29933, COBS_PSEDE ;
DR P29934, COBT_PSEDE ;
//
or
H ANDN920101
D alpha-CH chemical shifts (Andersen et al., 1992)
R PMID:1575719
A Andersen, N.H., Cao, B. and Chen, C.
T Peptide/protein structure analysis using the chemical shift index method:
upfield alpha-CH values reveal dynamic helices and aL sites
J Biochem. and Biophys. Res. Comm. 184, 1008-1014 (1992)
C BUNA790102 0.949
I A/L R/K N/M D/F C/P Q/S E/T G/W H/Y I/V
4.35 4.38 4.75 4.76 4.65 4.37 4.29 3.97 4.63 3.95
4.17 4.36 4.52 4.66 4.44 4.50 4.35 4.70 4.60 3.95
//
H ARGP820101
D Hydrophobicity index (Argos et al., 1982)
R PMID:7151796
A Argos, P., Rao, J.K.M. and Hargrave, P.A.
T Structural prediction of membrane-bound proteins
J Eur. J. Biochem. 128, 565-575 (1982)
C JOND750101 1.000 SIMZ760101 0.967 GOLD730101 0.936
TAKK010101 0.906 MEEJ810101 0.891 ROSM880104 0.872
CIDH920105 0.867 LEVM760106 0.865 CIDH920102 0.862
MEEJ800102 0.855 MEEJ810102 0.853 ZHOH040101 0.841
CIDH920103 0.827 PLIV810101 0.820 CIDH920104 0.819
LEVM760107 0.806 NOZY710101 0.800 GUYH850103 -0.808
PARJ860101 -0.835 WOLS870101 -0.838 BULH740101 -0.854
I A/L R/K N/M D/F C/P Q/S E/T G/W H/Y I/V
0.61 0.60 0.06 0.46 1.07 0. 0.47 0.07 0.61 2.22
1.53 1.15 1.18 2.02 1.95 0.05 0.05 2.65 1.88 1.32
//
Thank you for your kind response!
Strictly speaking, the format of both of these is the "transfac database format"; that is, it is not a standard format but one used only by Transfac (as far as I know).
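Whatever name you give it, the three examples share one layout: each line starts with a short field code (ID, DE, CC, H, T, ...) and a line containing `//` ends a record. Below is a minimal sketch of a generic reader for that layout. It is not the official parser of Transfac, ENZYME or AAindex. The function name `parse_flatfile` is hypothetical, and the rules are assumptions: the first token on a line is the field code, a line beginning with whitespace continues the previous field (as in the AAindex distribution files, whose indentation was lost in the copy above), and repeated codes such as several `CC` lines are kept as separate values in order.

```python
from collections import defaultdict


def parse_flatfile(text):
    """Split a line-code flat file into a list of {code: [values]} dicts.

    Assumed rules (a sketch, not an official specification):
      * a line starting with "//" terminates the current record;
      * the first whitespace-delimited token of a line is its field code;
      * a line starting with whitespace continues the previous field.
    """
    records = []
    current = defaultdict(list)
    last_code = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("//"):
            if current:
                records.append(dict(current))
            current = defaultdict(list)
            last_code = None
            continue
        if line[0].isspace() and last_code is not None:
            # Continuation line: join it onto the last value of the previous field.
            current[last_code][-1] += " " + line.strip()
            continue
        code, _, value = line.strip().partition(" ")
        current[code].append(value.strip())
        last_code = code
    if current:  # tolerate a missing final "//"
        records.append(dict(current))
    return records


sample = (
    "ID   6.6.1.2\n"
    "DE   Cobaltochelatase.\n"
    "CC   -!- Hydrogenobyrinic acid is a very poor substrate.\n"
    "//\n"
)
for rec in parse_flatfile(sample):
    print(rec["ID"], rec["DE"], rec["CC"])
```

With these rules, the numbered rows of a Transfac matrix (`01`, `02`, ...) simply come out as fields named by their row number, so a motif-specific reader would still need to interpret the `P0` block itself.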
However, the first format is essentially a position count matrix (PCM), which is an unnormalised version of the standard position weight matrix (PWM) used by many motif-finding and motif-scanning tools. What distinguishes it from those standard formats is the field codes (ID, BF, P0, XX) down the left-hand side.
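To make the count-vs-weight distinction concrete: each row of the `P0` block holds raw A/C/G/T counts (every row of the example sums to 5, i.e. five aligned sites). Normalising a row gives base frequencies, and taking log2 odds against a background gives a PWM. The sketch below assumes a pseudocount of 0.25 per base and a uniform background; these are illustrative choices, and real tools differ in both. The function names are hypothetical.

```python
import math

BASES = "ACGT"

# A, C, G, T counts copied from the P0 block of the first example (rows 01-12).
COUNTS = [
    (1, 2, 2, 0), (2, 1, 2, 0), (3, 0, 1, 1), (0, 5, 0, 0),
    (5, 0, 0, 0), (0, 0, 4, 1), (0, 1, 4, 0), (0, 0, 0, 5),
    (0, 0, 5, 0), (0, 1, 2, 2), (0, 2, 0, 3), (1, 0, 3, 1),
]


def counts_to_frequencies(counts, pseudocount=0.25):
    """Position count matrix -> position frequency matrix (rows sum to 1).

    The pseudocount (an assumption) avoids zero frequencies, and hence
    log(0), for bases never observed at a position.
    """
    freqs = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        freqs.append([(c + pseudocount) / total for c in row])
    return freqs


def frequencies_to_log_odds(freqs, background=(0.25, 0.25, 0.25, 0.25)):
    """Frequency matrix -> PWM of log2(frequency / background) scores."""
    return [[math.log2(f / b) for f, b in zip(row, background)] for row in freqs]


def score(pwm, seq):
    """Sum the PWM scores of a sequence with the same length as the motif."""
    return sum(pwm[i][BASES.index(base)] for i, base in enumerate(seq.upper()))


pwm = frequencies_to_log_odds(counts_to_frequencies(COUNTS))
print(round(score(pwm, "GGACAGGTGTTG"), 2), round(score(pwm, "TTTTTTTTTTTT"), 2))
```

A sequence resembling the motif scores well above zero, while one built from bases that were rarely or never seen at those positions scores strongly negative; that log-odds scoring is what scanning tools do with a PWM, and it is why they need the counts normalised first.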
@i.sudbery Thank you for your answer!