Question: How To Extract Just The Coordinate Values From A Pdb File Converted To A Text File, In Java Only?
Jeremiahloh (Singapore) wrote, 8.6 years ago:

ATOM 1 N ASN A 2 18.668 27.299 52.379 1.00 41.19 N

ATOM 2 CA ASN A 2 19.400 26.674 53.492 1.00 40.18 C

ATOM 3 C ASN A 2 19.710 27.737 54.550 1.00 37.56 C

ATOM 4 O ASN A 2 19.123 27.737 55.640 1.00 38.90 O

ATOM 5 N LEU A 3 20.637 28.606 54.184 1.00 34.40 N

The coordinates I need to extract are the three values after the residue number on each line (18.668 27.299 52.379 on the first line), and I need them in the form (x,y,z) down the list.

Would greatly appreciate your help.

From my research, it seems that I can't extract columns directly but have to parse each line and split it into tokens. Could someone confirm or explain this?

(modified 5 months ago by RamRS)

Khader Shameer (Manhattan, NY) wrote, 8.6 years ago:

Have you looked at BioJava for reading / parsing PDB files?

(modified 5 months ago by RamRS)

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 8.6 years ago:

I'm not a PDB guru, but if your file is just the set of lines you showed, then I would use the following trivial program:

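(...)
Pattern delim = Pattern.compile("\\s+"); // one or more tabs or spaces
String line;
while ((line = bufferedReader.readLine()) != null)
  {
  String[] tokens = delim.split(line.trim());
  if (tokens.length < 9) continue; // skip lines without x, y and z fields
  // with the lines shown in the question, x, y and z are tokens 6, 7 and 8
  double x = Double.parseDouble(tokens[6]);
  double y = Double.parseDouble(tokens[7]);
  double z = Double.parseDouble(tokens[8]);
  (...)
  }
(....)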

If your PDB file is more complex than your snippet then, as Khader said, have a look at BioJava or at JavaCC.

(modified 5 months ago by RamRS)

Pierre, PDB files are generally parsed using column numbers. Please check ATOM records for a detailed description.

(reply written 8.6 years ago by Khader Shameer; modified 5 months ago by RamRS)
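
A minimal sketch of the column-based approach mentioned in this reply, assuming the file keeps the fixed-width PDB layout in which an ATOM record stores x, y and z in columns 31-38, 39-46 and 47-54 (1-based). The class name `PdbColumns` and its methods are hypothetical names chosen for illustration, and the demo line is rebuilt with `String.format` only so that the column positions are exact.

```java
import java.util.Locale;

class PdbColumns {

    // Returns {x, y, z} for an ATOM line, or null for any other line.
    // Assumes fixed-width columns: x = 31-38, y = 39-46, z = 47-54 (1-based).
    static double[] parseCoordinates(String line) {
        if (!line.startsWith("ATOM") || line.length() < 54) {
            return null; // not an ATOM record, or too short to hold z
        }
        double x = Double.parseDouble(line.substring(30, 38).trim());
        double y = Double.parseDouble(line.substring(38, 46).trim());
        double z = Double.parseDouble(line.substring(46, 54).trim());
        return new double[] {x, y, z};
    }

    // Rebuilds the first ATOM line of the question in fixed-width form
    // (illustration only; a real file already has this layout).
    static String demoLine() {
        return String.format(Locale.ROOT,
            "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f          %2s",
            "ATOM", 1, " N", "", "ASN", "A", 2, "",
            18.668, 27.299, 52.379, 1.00, 41.19, "N");
    }

    public static void main(String[] args) {
        double[] xyz = parseCoordinates(demoLine());
        // Print in the (x,y,z) form requested in the question.
        System.out.println(String.format(Locale.ROOT,
            "(%.3f,%.3f,%.3f)", xyz[0], xyz[1], xyz[2]));
    }
}
```

Reading by column position still works when two fields touch each other without a separating space, which is a case where splitting on whitespace gives the wrong tokens.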

I would like to add that PDB files might look simple, so it is tempting to write your own little parser like the one above. In reality there are many subtle parsing issues that are best left to mature libraries. I'd therefore recommend BioJava too; see the tutorial.

(reply written 4.0 years ago by Jose Manuel Duarte)

Egon Willighagen (Maastricht) wrote, 8.1 years ago:

Jmol and the CDK have PDB readers that allow you to do this too. A Groovy script (using Java classes) for the CDK could look like:

import org.openscience.cdk.interfaces.*;
import org.openscience.cdk.io.*;
import org.openscience.cdk.tools.manipulator.*;
import org.openscience.cdk.io.IChemObjectReader.Mode;
import org.openscience.cdk.*;
import java.io.File;
import java.util.zip.GZIPInputStream;

reader = new PDBReader(
  new GZIPInputStream(
    new URL("http://www.pdb.org/pdb/files/1CRN.pdb.gz").openStream()
  )
);
crambin = reader.read(new ChemFile());
for (container in ChemFileManipulator.getAllAtomContainers(crambin)) {
  for (atom in container.atoms()) {
    println atom.point3d;
  }
}

(modified 5 months ago by RamRS)

Abirami wrote, 7.9 years ago:

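How to extract the coordinates of an atom from a PDB file in C:

#include <stdio.h>
#include <string.h>

/* Copy src[start..stop) into dst (at most size - 1 characters) and return dst. */
char *substring(size_t start, size_t stop, const char *src, char *dst, size_t size)
{
    size_t count = (stop > start) ? stop - start : 0;
    if ( count >= size )
    {
        count = size - 1;
    }

    snprintf(dst, size, "%.*s", (int)count, src + start);
    return dst;
}

int main(void)
{
    const char filename[] = "cys_coord.txt";
    char x[10], y[10], z[10];
    char buffer[500];
    FILE *file = fopen(filename, "r");

    if ( file )
    {
        while ( fgets(buffer, sizeof buffer, file) )
        {
            /* Skip lines too short to hold the z field (columns 47-54). */
            if ( strlen(buffer) < 54 )
            {
                continue;
            }
            printf("%s", buffer);
            /* x, y and z occupy columns 31-38, 39-46 and 47-54 (1-based). */
            printf("x = %s\n", substring(30, 38, buffer, x, sizeof x));
            printf("y = %s\n", substring(38, 46, buffer, y, sizeof y));
            printf("z = %s\n", substring(46, 54, buffer, z, sizeof z));
        }
        fclose(file); /* only close a file that was actually opened */
    }
    return 0;
}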

(modified 5 months ago by RamRS)

Thanks for trying to help, but he was asking for a Java-only solution. (Although I wasn't the one who gave you the downvote.)

(reply written 7.9 years ago by Tim)

Jeremiahloh (Singapore) wrote, 8.6 years ago:

Hey there,

I came up with this, but I think I am making a mess of the methods and classes. Could anybody help me straighten out my thoughts? Please, and thank you!

import java.util.*;  
import java.io.*;
import java.util.regex.Pattern;
import java.io.StreamTokenizer;

public class CoorToks {

    public StringTokenizer(String token); //invalid method declaration
    public static void main(String[] args) throws IOException {
        BufferedReader inputStream = null; // scan input line by line
        PrintWriter outputStream = null;// output aligned the same way
        Pattern delim=Pattern.compile("/s");

        String token;
        StringTokenizer tokenizer = new StringTokenizer(token);

        try {
            inputStream = new BufferedReader(new FileReader("1APB.pdb.txt"));
            outputStream = new PrintWriter(new FileWriter("characteroutput.txt"));
            while(tokenizer.hasMoreTokens())
            {
                if (token.trim().startsWith("ATOM") && !token.trim().endsWith("H")) // I need to scan for the word "ATOM" before i start tokenizing. ends at H.
                {
                    // and i only need the 7th to 9th tokens of each line.
                    // should i use a pattern delimiter instead?
                    String tokens[]=delim.split(token);
                    double x= Double.parseDouble(tokens[7]);
                    double y= Double.parseDouble(tokens[8]);
                    double z= Double.parseDouble(tokens[9]);
                    outputStream.println(token);

                    //the compiler says it can't find variable tokens. which means i have to do a declaration of variables?
                    // how do i do that when there are so many tokens coming from the text file.
                }
            }
        }//end of try

        finally {
            while ((token = inputStream.readLine()) != null)
            {
                outputStream.println(token);
            }
            if (inputStream != null) {
                inputStream.close();
            }
            if (outputStream != null) {
                outputStream.close();
            }
        }
    }
}

(modified 5 months ago by RamRS)

This code will not extract 3D coordinates for hetero atoms (HETATM records), but maybe that's intentional?

(reply written 8.1 years ago by Egon Willighagen)
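
If hetero atoms should be included as well, one option is to accept both record types. The sketch below assumes that the record name (ATOM or HETATM) is the first thing on the line; the class and method names are hypothetical, and the HETATM and REMARK lines are made-up examples rather than data from the question.

```java
class RecordFilter {

    // True for lines that carry atom coordinates: ATOM or HETATM records.
    static boolean isCoordinateRecord(String line) {
        return line.startsWith("ATOM") || line.startsWith("HETATM");
    }

    public static void main(String[] args) {
        String[] lines = {
            "ATOM 1 N ASN A 2 18.668 27.299 52.379 1.00 41.19 N",
            "HETATM 2000 O HOH A 401 10.000 20.000 30.000 1.00 30.00 O", // made up
            "REMARK 2 RESOLUTION." // made up
        };
        for (String line : lines) {
            System.out.println(isCoordinateRecord(line) + "  " + line);
        }
    }
}
```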
String[] tokens = delim.split(token);

However, I would recommend either a) using a third-party library, as PDB files are tricky or b) splitting (as you say) on columns. Do this with substring + a copy of the PDB specification :)

(reply written 8.1 years ago by Gilleain; modified 5 months ago by RamRS)
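
For reference, here is one way the program in this post could be straightened out. It is only a sketch under the assumptions of the question: fields are separated by whitespace exactly as in the sample lines, so x, y and z are tokens 6, 7 and 8 (0-based), and lines ending in the element symbol H are skipped. The class name `CoordinateExtractor` and the method `extract` are hypothetical. The coordinates are passed through as text; call `Double.parseDouble` on the tokens if numbers are needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;

class CoordinateExtractor {

    // Reads lines from 'in' and writes one "(x,y,z)" line per ATOM record.
    static void extract(BufferedReader in, PrintWriter out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            String trimmed = line.trim();
            // Keep ATOM records only and skip hydrogens, as in the question.
            if (!trimmed.startsWith("ATOM") || trimmed.endsWith(" H")) {
                continue;
            }
            String[] tokens = trimmed.split("\\s+");
            if (tokens.length < 9) {
                continue; // not enough fields to hold x, y and z
            }
            out.println("(" + tokens[6] + "," + tokens[7] + "," + tokens[8] + ")");
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory demo using two of the lines from the question.
        String sample =
            "ATOM 1 N ASN A 2 18.668 27.299 52.379 1.00 41.19 N\n"
          + "ATOM 2 CA ASN A 2 19.400 26.674 53.492 1.00 40.18 C\n";
        try (BufferedReader in = new BufferedReader(new StringReader(sample));
             PrintWriter out = new PrintWriter(System.out, true)) {
            extract(in, out);
        }
    }
}
```

To work on files instead, the same method can be called with `new BufferedReader(new FileReader("1APB.pdb.txt"))` and `new PrintWriter(new FileWriter("characteroutput.txt"))` inside the try-with-resources statement, which closes both streams even if an exception is thrown. For real PDB files, where fields can run together or be blank, reading by column position or using a library is the safer route, as the replies suggest.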

Jordeu (Barcelona) wrote, 7.9 years ago:

Do it object oriented!

You can use BioJava; check these two links:

If you are using Java, this is the best option.

(modified 5 months ago by RamRS)