Question

Use Query List To Extract Sequences

12.4 years ago

redspider19800915 ▴ 40

Question:

I have a database containing sequences as follows:

>leaf_1
AAGACCATTCGAGCTTATCTCTTC
>leaf_2
ATGGAGAAGGAAATGAAGAGCAGT
>leaf_3
TGGCTGTAAGTCATACCTGTCA
>leaf_4
CGCGGAGTAGATCAGTTTGGTA
>leaf_5
AGTAACGGCTTTACAAGAATCAAA
......

Now I have a query file (inquiry.txt), which looks like:

>leaf_2
>leaf_4
>leaf_5

I need an output file (result.txt) that looks like:

>leaf_2
ATGGAGAAGGAAATGAAGAGCAGT
>leaf_4
CGCGGAGTAGATCAGTTTGGTA
>leaf_5
AGTAACGGCTTTACAAGAATCAAA

Could anyone help with this question? Many thanks.

data list • 2.2k views

updated 12.4 years ago by k.nirmalraman ★ 1.1k • written 12.4 years ago by redspider19800915 ▴ 40

This is a pretty common question: Extracting Sequence From A 3Gb Fasta File?

12.4 years ago by David W 4.9k

score 1 · Answer 1 · 2013-05-24

Try this:

$ perl below-script.pl all-sequences.txt inquiry.txt

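#!/usr/bin/perl
use strict;
use warnings;

open(my $input,  '<', $ARGV[0])     or die "Cannot open $ARGV[0]: $!";
open(my $query,  '<', $ARGV[1])     or die "Cannot open $ARGV[1]: $!";
open(my $output, '>', 'result.txt') or die "Cannot open result.txt: $!";

# Store each query header (e.g. ">leaf_2") in a hash for exact matching,
# so that ">leaf_1" does not also select ">leaf_10".
my %wanted;
while (my $header = <$query>) {
    $header =~ s/\s+$//;
    $wanted{$header} = 1 if length $header;
}

# Copy a header and every sequence line under it while the header is wanted.
my $keep = 0;
while (my $line = <$input>) {
    if ($line =~ /^>/) {
        (my $header = $line) =~ s/\s+$//;
        $keep = exists $wanted{$header};
    }
    print $output $line if $keep;
}

close($output);
close($query);
close($input);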

score 0 · Answer 2 · 2013-05-24

You may also try the following Perl script, which works on FASTA-format input files:

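  use strict;
  use warnings;

  # Collect the wanted IDs: the header text after '>' up to the first '|' or whitespace.
  my %genes;
  open my $list, '<', 'file2.list' or die "Cannot open file2.list: $!";
  while (my $line = <$list>) {
      $genes{$1} = 1 if $line =~ /^>([^|\s]+)/;
  }
  close $list;

  # Slurp the whole FASTA file (this needs enough memory to hold it).
  my $input;
  {
      local $/ = undef;
      open my $fasta, '<', 'file1.fasta' or die "Cannot open file1.fasta: $!";
      $input = <$fasta>;
      close $fasta;
  }

  # Split into records at each '>' that starts a line, then print the wanted ones.
  my @records = split /^>/m, $input;
  foreach my $record (@records) {
      my ($id) = $record =~ /^([^|\s]+)/;
      print ">$record" if defined $id && $genes{$id};
  }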

Here file2.list is your query file and file1.fasta is the FASTA sequence file.
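
For a very large database (such as the 3Gb FASTA file in the question linked above), it may help to stream the file line by line instead of reading it all into memory. The following is only a rough sketch of that idea, written in Python rather than Perl. The function names `read_wanted_headers` and `filter_fasta` are made up for illustration, and it assumes that each query line matches a database header line exactly once trailing whitespace is stripped.

```python
import io


def read_wanted_headers(query_lines):
    """Collect header lines such as '>leaf_2' into a set, skipping blank lines."""
    return {line.strip() for line in query_lines if line.strip()}


def filter_fasta(fasta_lines, wanted):
    """Yield every line of each record whose header line is in `wanted`.

    Lines are handled one at a time, so multi-line sequences stay whole
    and the database is never held in memory all at once.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # A new record starts here: decide whether to keep it.
            keep = line.strip() in wanted
        if keep:
            yield line


# Small in-memory demo using the data from the question.
database = io.StringIO(
    ">leaf_1\nAAGACCATTCGAGCTTATCTCTTC\n"
    ">leaf_2\nATGGAGAAGGAAATGAAGAGCAGT\n"
    ">leaf_3\nTGGCTGTAAGTCATACCTGTCA\n"
    ">leaf_4\nCGCGGAGTAGATCAGTTTGGTA\n"
    ">leaf_5\nAGTAACGGCTTTACAAGAATCAAA\n"
)
query = io.StringIO(">leaf_2\n>leaf_4\n>leaf_5\n")

result = "".join(filter_fasta(database, read_wanted_headers(query)))
print(result, end="")
```

With real files, the same two functions can be given open file handles in place of the `io.StringIO` objects, for example the database and inquiry.txt opened for reading, with the output passed to `writelines` on result.txt opened for writing. Because headers are compared as whole lines in a set, `>leaf_1` does not accidentally select `>leaf_10`.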