Question: Grep the complete sequences containing a specific motif in a FASTA file
taojincs wrote:

How can I grep the complete sequences that contain a specific motif in a FASTA file? I also want to include the header lines (those beginning with ">") that precede the matching sequences.

The example image is not shown, so here is a link to it instead, because typing > on Biostar is somewhat misleading: https://drive.google.com/file/d/0B1pci7ps8bLganZXWFNFcWZGd1k/view?usp=sharing

linux sequence grep fasta • 1.3k views
modified 2.5 years ago by Philipp Bayer • written 2.5 years ago by taojincs

Test file:

$ cat test.fa 
>name1
AEDIA
>name2
ALKME
>name3
AAIII
>name4
kmetq

To extract all sequences that contain KME, ignoring case:

$ seqkit grep -s -i -r -p KME test.fa
>name2
ALKME
>name4
kmetq

seqkit is a separate download. Options used: -s = match against the sequence instead of the name; -r = treat the pattern as a regular expression; -i = ignore case; -p = search pattern.

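If the FASTA sequences are linearized (i.e. each sequence is on a single line), plain grep is enough. Here --no-group-separator (a GNU grep option) suppresses the "--" lines that -B would otherwise print between groups of matches: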

$ grep -i -B 1 --no-group-separator kme test.fa 
>name2
ALKME
>name4
kmetq
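
One caveat with the plain grep above: it tests every line, headers included, so a record whose name happens to contain the motif would also be reported. The following is a minimal sketch, not part of the original answer, that tests only the sequence line of each record. It assumes linearized input (one header line followed by one sequence line), and the function name motif_2line is made up for illustration:

```shell
# Hypothetical helper: print header + sequence pairs from a linearized FASTA
# file when the sequence line (not the header) contains the motif.
# The comparison ignores case by lowercasing both sides.
motif_2line() {
    # $1 = motif (plain string, not a regex), $2 = linearized FASTA file
    awk -v motif="$1" '
        BEGIN { motif = tolower(motif) }
        /^>/  { header = $0; next }                      # remember the latest header
        index(tolower($0), motif) { print header; print $0 }
    ' "$2"
}
```

Called as motif_2line KME test.fa, this should report the same records as the grep command for the test file shown earlier.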
modified 2.5 years ago • written 2.5 years ago by cpad0112
Philipp Bayer wrote:

First, you have to reformat your file so that each sequence is on a single line. Without this step, you would miss motif hits that span a line break.

From Pierre Lindenbaum's answer to "Multiline Fasta To Single Line Fasta":

awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);}  END {printf("\n");}' < file.fa > one_line.fa

Then you can use grep -B 1 to print each hit together with its preceding (header) line. Setting LC_ALL=C also speeds things up:

LC_ALL=C grep -B 1 KME one_line.fa

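That should print the name and sequence of every record in which 'KME' is present. Note that this search is case-sensitive, and that GNU grep prints a "--" line between non-adjacent groups of matches unless --no-group-separator is given.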
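
The two steps above (linearize, then grep) can also be folded into a single awk pass that needs no intermediate one_line.fa. This is only a sketch under a few assumptions: the motif is a plain string rather than a regex, matching is case-sensitive like the grep command above, and each hit is printed with its sequence joined onto one line. The function name motif_records is hypothetical:

```shell
# Hypothetical helper: report complete records whose joined sequence
# contains the motif, even when the motif spans a line break in the input.
motif_records() {
    # $1 = motif (plain string, case-sensitive), $2 = FASTA file (may be multi-line)
    awk -v motif="$1" '
        # Print the record collected so far if its sequence contains the motif.
        function emit() {
            if (header != "" && index(seq, motif)) { print header; print seq }
        }
        /^>/ { emit(); header = $0; seq = ""; next }     # new record starts
        { seq = seq $0 }                                 # append sequence line
        END { emit() }                                   # do not forget the last record
    ' "$2"
}
```

Usage would be motif_records KME file.fa. For a case-insensitive search, both seq and motif could be passed through tolower() before the index() call.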

written 2.5 years ago by Philipp Bayer