Question: Refining Data Stored In Sqlite - How To Join Several Contacts?
gravatar for Krab
9.2 years ago by
Krab40 wrote:

I'm storing contacts between different elements. I want to eliminate elements of certain type and store new contacts of elements which were interconnected by the eliminated element.

This is exact duplicate of my question at stackoverflow. I post it here as I hope here could be someone who encountered this problem before me. If you find it inappropriate to ask the same question at different places, I will delete it.

Problem background

Imagine this problem. You have a water molecule which is in contact with other molecules (if the contact is a hydrogen bond, there can be 4 other molecules around my water). Like in the following picture (A, B, C, D are some other atoms and dots mean the contact).

 A   B
 |   |
 H   H
  . .
  / \
 H   H
 .   .
 C   D

I have the information about all the dots and I need to eliminate the water in the center and create records describing contacts of A-C, A-D, A-B, B-C, B-D, and C-D.

Database structure

Currently, I have the following structure in the database:

Table atoms:

  • "id" integer PRIMARY KEY,
  • "amino" char(3) NOT NULL, (HOH for water or other value)
  • other columns identifying the atom

Table contacts:

  • "acceptor_id" integer NOT NULL, (the atom near to my hydrogen, here C or D)
  • "donor_id" integer NOT NULL, (here A or B)
  • "directness" char(1) NOT NULL, (this should be D for direct and W for water-mediated)
  • other columns about the contact, such as the distance

EDIT: How would look the data in the case depicted earlier.


1 | O  | HOH
2 | N  | ARG  <- atom A from image
3 | S  | CYS  <- B 
4 | O  | SER  <- C
5 | N  | ARG  <- D


1        4           D
1        5           D
2        1           D
3        1           D

From which I need to make


3        4           W            <- B-C
3        5           W            <- B-D
2        4           W            <- A-C
2        5           W            <- A-D
2        3           X            <- A-B    (These last two rows are escaping me,
4        5           X            <- C-D     there could be also row D-C, but not
                                             both C-D and D-C. A char 'X' could 
                                             be used to mark "no donor/acceptor")

Current solution (insufficient)

Now, I'm going through all the contacts which have donor.amino = "HOH". In this sample case, this would select contacts from C and D. For each of these selected contacts, I look up contacts having the same acceptor_id as is the donor_id in the currently selected contact. From this information, I create the new contact. At the end, I delete all contacts to or from HOH.

This way, I am obviously unable to create C-D and A-B contacts (the other 4 are OK).

If I try a similar approach - trying to find two contacts having the same donor_id, I end up with duplicate contacts (C-D and D-C).

Is there a simple way to retrieve all six contacts without duplicates?

I'm dreaming about some one page long SQL query which retrieves just these six wanted rows. :-)

However, any other ideas are welcome.

It is preferable to conserve information about who is donor (where possible), but not strictly necessary.

Big thanks to all of you who read this question to this point.

chemoinformatics python • 2.2k views
ADD COMMENTlink modified 10 months ago by RamRS22k • written 9.2 years ago by Krab40

Sorry if I dont understand correctly, but is it the case that there will only be a relationship between A and B (or C and D) only if they are interconnected in at least two different ways? I am just trying to understand this more clearly.

ADD REPLYlink written 9.2 years ago by Pauln60

No, in this sample case, A and B are interconnected only once (through the only water here). But there are two different ways how to express this connection in the db: A is donor or B is donor. Since there is no real donor-acceptor relationship, both are equally valid, but only one should be kept in the database.

ADD REPLYlink written 9.2 years ago by Krab40
gravatar for Pierre Lindenbaum
9.2 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum121k wrote:

Not sure I really understand what you want, but using mysql (I'm not sure you can do this with tinysql) the same table can be queried multiple time in the same request. for example:

  atoms as T1,
  contacts as C12,
  atoms as T2,
  contacts as C23,
  atoms as T3 
where and and and and!

does that help?

ADD COMMENTlink modified 10 months ago by RamRS22k • written 9.2 years ago by Pierre Lindenbaum121k

Yes, it helps, thank you. What I didn't think of was that you can do C12.acceptor_id = C23.donor_id (you're doing it through T2). My query is now longer, but the join of a table to self was the most important idea I was missing. Thanks.

ADD REPLYlink written 9.2 years ago by Krab40
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1494 users visited in the last hour