Question: Extracting Multiple Sequences Files From Large Fasta Text Using Java
J.Ashley wrote (6.0 years ago):

Hi, I really need some help with my second problem. I have a large FASTA file that contains over 300 sequences. I need to find each sequence in the file that contains the zinc finger consensus sequence C-x2-C-x15-C-x2-C, in other words: C, then 2 letters of any type, then C, then 15 letters of any type, then C, then 2 letters of any type, then C.

In the output file I need to print the title line, followed by the zinc finger and then the sequence itself.

Here is what I have so far:

import java.io.*;
import java.util.*;
public class test {
public static void main(String[] args) throws IOException {

    String fileName = "";
    Scanner input = new Scanner(System.in);

    System.out.print ("Enter the name of the sequence file: ");
    fileName = input.nextLine();
    int count = 0;
    BufferedReader bf = null;        
    try {            
        bf = new BufferedReader(new FileReader(fileName));
        String line;
        while ((line = bf.readLine()) != null){
            // if is the title line, count as a record
            if (line.matches("^>.*"))count++;
        }                
    } catch (FileNotFoundException e) {
        System.out.println("File: " + fileName + " does not exist!");
    } finally {
        if (bf != null) {
            bf.close();
        }
    }
}
}

After this I get completely confused. I know how to print out the sequences within the file, but I have no idea how to print out only the sequences of the type described above. Any help is greatly appreciated.
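The consensus C-x2-C-x15-C-x2-C described above maps fairly directly onto a regular expression: each "xN" becomes `.{N}`, meaning any N characters. The sketch below is only an illustration of that translation; the class name `ZincFingerRegex`, the method `firstMotif`, and the test sequence are made up for the example, and the case-insensitive flag is an assumption about how the input is written.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZincFingerRegex {
    // C-x2-C-x15-C-x2-C: every "xN" is written as ".{N}" (any N characters).
    // CASE_INSENSITIVE is an assumption so that lower-case input also matches.
    static final Pattern ZINC_FINGER =
        Pattern.compile("C.{2}C.{15}C.{2}C", Pattern.CASE_INSENSITIVE);

    /** Returns the first motif found in the sequence, or null if there is none. */
    static String firstMotif(String sequence) {
        Matcher m = ZINC_FINGER.matcher(sequence);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical sequence: 2 flanking letters, the 23-letter motif, 2 more letters.
        String fifteen = "GGGGG" + "GGGGG" + "GGGGG";
        String sequence = "MK" + "CAAC" + fifteen + "CAAC" + "KL";
        System.out.println(firstMotif(sequence));
        System.out.println(firstMotif("MKLV")); // no motif here
    }
}
```

`find()` searches for the motif anywhere in the sequence, and `group()` returns the matched text, which is the "zinc finger" part that has to be printed.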

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (6.0 years ago):

Here is my Saturday-Night-Fever solution.

public class Biostar68459
    {
    public static void main(String args[]) throws java.io.IOException
        {
        java.util.regex.Pattern pattern = java.util.regex.Pattern.compile("[Cc].{2}[Cc].{15}[Cc].{2}[Cc]");
        StringBuilder name=new StringBuilder();
        StringBuilder sequence=new StringBuilder();

        for(;;)
            {
            int c=System.in.read();
            switch(c)
                {
                case -1:
                case '>':
                    {
                    if(pattern.matcher(sequence).find())
                        {
                        System.out.print(">"+name);
                        for(int i=0;i< sequence.length();++i)
                            {
                            if(i%60==0) System.out.println();
                            System.out.print(sequence.charAt(i));
                            }
                        System.out.println();
                        }
                    if(c==-1) return;
                    name.setLength(0);
                    sequence.setLength(0);
                    while((c=System.in.read())!=-1 && c!='\n') name.append((char)c);
                    break;
                    }
                case '\n':
                case ' ':
                case '\r':  break;
                default: sequence.append((char)c);break;
                }
            }
        }
    }


$ curl -s  "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=475808216&rettype=fasta" | java Biostar68459  | head
>gi|475808216|ref|NM_001277403.1| Homo sapiens zinc finger protein 730 (ZNF730), mRNA
AATCAGGCCCGCAGCTGGAGCAGACAGGGCGGCTTCCGGGATTTGGCGCGGCCTTTGTTT
CTCGCTGCCGCCGAAGCTCCAATTTTCGTCTGTCTGCTTTGTGTCCTCTGCACGTAGAAG
CCCAGCCTGTGTGGCCCTGCGACCTGCGGGTATTGGGAGATCCACAGCTAAGACGCCAGG
GCCCCCTGGAAGCCTAGAAATGGGAGCGTTGACATTTAGAGATGTGGCCATAGAATTCTC
TCTGGAGGAGTGGCAATGTCTGGACACCGAACAACAGAATTTATATAGAAATGTAATGTT
AGATAACTACAGAAACCTGGTCTTCCTGGGTATTGCTGTCTCAAAGCCAGACCTGATCAC
CTGTCTGGAGCAAGAAAAAGAGCCTTGGAATTTGAAGACACATGATATGGTAGCCAAACC
CCCAGTTATATGTTCTCATATTGCCCAAGACCTTTGGCCAGAGCAAGGCATAAAAGATTA
TTTCCAAGAAGTCATACTGAGACAATATAAAAAATGTAGACATGAGAATTTACTGTTAAG
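The program above prints the title line and the sequence of every matching record, but not the zinc finger itself, which the question also asks for. A possible variant is sketched below under a few assumptions: it works on lines already held in memory (for a real file they could come from `java.nio.file.Files.readAllLines`), the names `ZincFingerReport`, `report` and `flush` are invented for this example, and the "zinc finger: " label is just one way to lay out the output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZincFingerReport {
    // Same expression as in the answer above.
    static final Pattern ZINC_FINGER =
        Pattern.compile("[Cc].{2}[Cc].{15}[Cc].{2}[Cc]");

    /** For each record with a motif: title line, each motif found, then the sequence. */
    static List<String> report(Iterable<String> fastaLines) {
        List<String> out = new ArrayList<>();
        String header = null;
        StringBuilder sequence = new StringBuilder();
        for (String line : fastaLines) {
            if (line.startsWith(">")) {
                flush(header, sequence, out); // finish the previous record
                header = line;
                sequence.setLength(0);
            } else {
                sequence.append(line.trim()); // join wrapped sequence lines
            }
        }
        flush(header, sequence, out);         // do not forget the last record
        return out;
    }

    private static void flush(String header, CharSequence sequence, List<String> out) {
        if (header == null) return;           // nothing read yet
        Matcher m = ZINC_FINGER.matcher(sequence);
        boolean found = false;
        while (m.find()) {                    // non-overlapping matches only
            if (!found) {
                out.add(header);
                found = true;
            }
            out.add("zinc finger: " + m.group());
        }
        if (found) out.add(sequence.toString());
    }

    public static void main(String[] args) {
        // Hypothetical two-record input; only the first record contains the motif.
        String fifteen = "GGGGG" + "GGGGG" + "GGGGG";
        List<String> lines = Arrays.asList(
            ">hit", "MKCAAC", fifteen, "CAACKL", ">miss", "MKLV");
        for (String s : report(lines)) System.out.println(s);
    }
}
```

Because the sequence lines are joined before matching, a motif that is split across two lines of the file is still found. Overlapping motifs are not reported, since `find()` resumes after the end of the previous match.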

You're a generous person! Maybe I'm too pessimistic, but this question really sounds like a homework problem, and the "what I have so far" really seems like skeleton code from a problem statement.

I would've just given vague pointers to consider using regular expressions, since even that tidbit wasn't present in the question.

But maybe I'm wrong...

written 6.0 years ago by matted

You're right. But he provided source code as if he had really tried to solve the problem, and ... I was looking for something fun to do before switching off my laptop :-)

written 6.0 years ago by Pierre Lindenbaum

Ahh, I see, so you use the compile method! Thank you so much for your help. I will try this out, test it, and see what happens. Again, thanks; sometimes it just takes an example to get you going!

written 6.0 years ago by J.Ashley
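On the compile method: the question's code uses `String.matches`, which only succeeds when the whole string fits the expression, while the answer compiles a `Pattern` once and calls `find()`, which looks for the motif anywhere inside the sequence. The small sketch below contrasts the two; the class and method names are invented for the illustration.

```java
import java.util.regex.Pattern;

public class MatchesVersusFind {
    // Compiled once and reused for every sequence.
    static final Pattern ZINC_FINGER =
        Pattern.compile("[Cc].{2}[Cc].{15}[Cc].{2}[Cc]");

    /** True only if the entire string is exactly one motif. */
    static boolean wholeStringIsMotif(String s) {
        return ZINC_FINGER.matcher(s).matches();
    }

    /** True if the motif occurs anywhere inside the string. */
    static boolean containsMotif(String s) {
        return ZINC_FINGER.matcher(s).find();
    }

    public static void main(String[] args) {
        String fifteen = "GGGGG" + "GGGGG" + "GGGGG";
        String motif = "CAAC" + fifteen + "CAAC";   // hypothetical 23-letter motif
        String embedded = "MK" + motif + "KL";      // motif with flanking letters

        System.out.println(wholeStringIsMotif(motif));    // true
        System.out.println(wholeStringIsMotif(embedded)); // false
        System.out.println(containsMotif(embedded));      // true
    }
}
```

For a search inside full-length sequences, `find()` is the call that fits. `String.matches` also recompiles the expression on every call, so a precompiled `Pattern` is the usual choice when the same expression is applied to hundreds of records.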