Question: Extracting Sequences After a Motif and Between Motifs in a Multi-FASTA File
Raghul wrote, 7.3 years ago:

Hi, I want to extract the sequence that follows a motif, e.g. "TTTTTAAAAA", from a multi-FASTA file; I do not want the nucleotides before this motif. Is it also possible to extract the nucleotides between two motifs with grep, e.g. the nucleotides between TTTTTAAAA and AAAATTTT? I tried grep, but I also need the FASTA headers in the output. Can anybody suggest a solution in grep (if possible), Perl, or Python?

Thanks, Raghul
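Since Python is one of the options asked for, here is a minimal sketch for the first part (keeping only what comes after the motif). It assumes plain FASTA input in which a sequence may be wrapped over several lines, uses only the first occurrence of the motif in each record, and the helper names `read_fasta` and `after_motif` are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs; wrapped sequence lines are joined together."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # last record


def after_motif(records, motif):
    """Yield (header, rest) per record, where rest is the text after the first motif."""
    for header, seq in records:
        pos = seq.find(motif)
        if pos != -1:  # records without the motif are skipped
            yield header, seq[pos + len(motif):]


fasta = [">seq1", "GGGTTTTTAAAAACCG", "TTAA", ">seq2", "GGGGCCCC"]
for header, rest in after_motif(read_fasta(fasta), "TTTTTAAAAA"):
    print(header)
    print(rest)  # prints >seq1 then CCGTTAA; seq2 has no motif
```

For a real file, pass an open file object instead of the list.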


The motif may occur several times within the same sequence. How do you want to handle that case?

(reply by Manu Prestat, 7.3 years ago)
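One way to handle repeated motifs is to report every non-overlapping segment instead of only the first. The Python sketch below (the constant and function names are illustrative) uses `re.finditer` with a lazy `*?`, so each segment stops at the first closing motif after its opening motif; whether that is the right rule depends on what you need.

```python
import re

START, END = "TTTTTAAAAA", "AAAATTTT"  # motifs from the question
# Lazy "*?": each match ends at the first END after its START; matches never overlap.
PATTERN = re.compile(re.escape(START) + r"([ACGTN]*?)" + re.escape(END))


def all_between(seq):
    """Return every segment found between START and the next END, left to right."""
    return [m.group(1) for m in PATTERN.finditer(seq.upper())]


seq = "CC" + START + "GGG" + END + "CC" + START + "C" + END
print(all_between(seq))  # ['GGG', 'C']
```

Upper-casing first means soft-masked (lowercase) bases are matched too; the returned segments are therefore upper case.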

Hello! I would like to do something similar. Did you find a way to complete your task?

(reply by etarisal, 3.6 years ago)
Ying W (South San Francisco, CA) wrote, 7.3 years ago:

I don't think this is possible with plain grep, but it can be done with a regex in Perl. Something along these lines:

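  use strict;
  use warnings;

  my ($header, $seq) = ("", "");
  sub report { # print the header and the sequence between the two motifs, if both are present
    print "$header\n$1\n" if $seq =~ /TTTTTAAAAA([ACGTN]+?)AAAATTTT/;
  }
  while (my $line = <>) { # for every line of the FASTA file(s) named on the command line
    chomp $line;
    if ($line =~ /^>/) { # header line: process the previous sequence first
      report() if $header ne "";
      ($header, $seq) = ($line, "");
    } else {
      $seq .= $line; # append sequence (a sequence may span several lines)
    }
  }
  report() if $header ne ""; # process the last sequence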

or something like that (warning: the code above is untested and should be treated as pseudocode).
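For comparison, a Python sketch of the same idea: collect each record's sequence, then print its header together with the text between the two motifs. It assumes the motifs from the question and reads lines from any iterable (for a real file, pass an open file object); `between_motifs` is a hypothetical name, and `itertools.groupby` is used here to split the input into runs of header lines and sequence lines.

```python
import re
from itertools import groupby

# Motifs from the question; "+?" stops at the first closing motif after the opening one.
BETWEEN = re.compile(r"TTTTTAAAAA([ACGTN]+?)AAAATTTT", re.IGNORECASE)


def between_motifs(lines):
    """Yield (header, segment) for every FASTA record in which both motifs occur."""
    stripped = (line.strip() for line in lines)
    header = None
    # groupby yields alternating runs of header lines and sequence lines
    for is_header, run in groupby((s for s in stripped if s), key=lambda s: s.startswith(">")):
        if is_header:
            header = list(run)[-1]  # a header with no sequence is replaced by the next one
        else:
            match = BETWEEN.search("".join(run))  # join wrapped sequence lines first
            if match and header is not None:
                yield header, match.group(1)


fasta = [">r1", "GGTTTTTAAAAAAC", "GTAAAATTTTGG", ">r2", "ACGT"]
for header, segment in between_motifs(fasta):
    print(header)
    print(segment)  # prints >r1 then ACGT (the motif pair spans a line break)
```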


Some (many?) versions of grep, such as GNU grep, the version included in most Linux distributions, accept the option "-P", meaning "interpret the pattern as a Perl-compatible regular expression". So if Perl can do it, so can grep.

(reply by Neilfws, 7.3 years ago)
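One caveat worth illustrating: by default grep matches within single lines, so a motif pair split across a wrapped FASTA line is missed even with `-P`. The Python sketch below uses the same kind of lookbehind/lookahead pattern that `grep -oP` accepts, so only the segment between the motifs is returned, and compares matching line by line with matching on the joined sequence. The two-line record is made-up example data.

```python
import re

# Lookbehind/lookahead keep the motifs out of the match, so only the segment is returned.
SEGMENT = re.compile(r"(?<=TTTTTAAAAA)[ACGTN]+?(?=AAAATTTT)")

record = ["GGTTTTTAAAAAAC", "GTAAAATTTTGG"]  # one sequence wrapped over two lines

per_line = [m.group(0) for line in record for m in SEGMENT.finditer(line)]
joined = [m.group(0) for m in SEGMENT.finditer("".join(record))]

print(per_line)  # []: neither line contains both motifs
print(joined)    # ['ACGT']
```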
PoGibas wrote, 7.2 years ago:

A grep way:

  grep -o 'TTTTTAAAA[A-Z]*AAAATTTT' sequence
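Because grep -o works one line at a time and prints only the matched text, a common workaround is to first linearize the FASTA file so that each record's sequence sits on a single line directly under its header. A minimal Python sketch of that step follows; the function name `linearize` is hypothetical, and the output can then be searched line by line.

```python
def linearize(lines):
    """Yield FASTA lines with each record's sequence joined onto a single line."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:  # flush the previous record's sequence
                yield "".join(chunks)
                chunks = []
            yield line
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)  # sequence of the last record


wrapped = [">r1\n", "GGTTTTTAAAAAAC\n", "GTAAAATTTTGG\n", ">r2\n", "ACGT\n"]
print("\n".join(linearize(wrapped)))
# >r1
# GGTTTTTAAAAAACGTAAAATTTTGG
# >r2
# ACGT
```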