Off topic: grep only a specific motif from whole protein sequences
5.6 years ago
Jason ▴ 10

Hello All,

How can I grep only a specific motif from the complete sequences in a FASTA file using a shell command? I also want each extracted motif to be preceded by the header line (the one beginning with >) of the sequence it came from. A previous post (A: grep whole sequences containing a specific motif in a fasta file) helped me grep the whole sequences that contain the motif, but now I want only the motifs themselves, each with its protein ID as a header. Some protein sequences contain more than one motif.

My motif looks like this (X being any residue): SXXXX(F/S)XXXL

Here is a list of protein sequences:

>sp|Q9H257.2|CARD9_HUMAN RecName: Full=Caspase recruitment domain-containing protein 9; Short=hCARD9
MSDYENDDECWSVLEGSRVTLTSVIDRSRITPYLRQTKVLNPDDEEQVLSDPNLSIRKRKVGVLLDILQRTGHKGYVAFLESLELYYPQLYKKVTGKEPARVFSMIIDASGESGLTQLLMSEVMWFLQKLVQDLTALLSSK
>sp|Q9H37.2|CTYU_HUMAN 
HHHSVLEGFRVTLTSVIDRFRITPYLRQTKVLNPDDEEQVLSDPNLVIRKRKVGVLLDILQRTGHKGYVAFLESLELYYPQLYKKVTGKEPARVFSMIIDASGESYSLTQLLMTEVMKLQKKVQDLTALLSSK
>sp|Q9re7.2|CARer_HUMAN RecName
BKLSVLEGWRVTLTSVIDRFRITPYLRQTKVLNPDDEEQVLSDPNLVIRKRKVGVLLDILQRTGHKGYVAFLESLELYYPQLYKKVTGKEPARVFSMIIDASGESGLTQLLMTEVMKLQKKVQDLTALLSSK

The result should look like this:

>sp|Q9H257.2|CARD9_HUMAN RecName: Full=Caspase recruitment domain-containing
SVLEGSRVTL
>sp|Q9H257.2|CARD9_HUMAN RecName: Full=Caspase recruitment domain-containing
SEVMWFLQKL
>sp|Q9H37.2|CTYU_HUMAN 
SVLEGFRVTL
>sp|Q9H37.2|CTYU_HUMAN 
SGESYSLTQL

This command returns the whole sequence that contains the motif, which is not what I want:

grep -E 'S[A-Z]{4}[FS][A-Z]{3}L' jara3.fasta > jara4.fasta
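
One possible approach, offered as a sketch and not a tested pipeline: `grep -o` would print only the matched part, but it loses track of which header each match belongs to, so awk may be a better fit. The function name `extract_motifs` is made up for illustration, and the sketch assumes each sequence sits on a single line (as in the example above) and that overlapping matches are not needed. It prints the full header line as it appears in the input.

```shell
# Print each motif match on its own line, preceded by the header of the
# record it came from. Assumes one sequence per line (no wrapped FASTA).
extract_motifs() {
  awk '
    /^>/ { header = $0; next }            # remember the most recent header
    {
      seq = $0
      # The {4} and {3} intervals are spelled out so that awk
      # implementations without interval support still work.
      while (match(seq, /S[A-Z][A-Z][A-Z][A-Z][FS][A-Z][A-Z][A-Z]L/)) {
        print header
        print substr(seq, RSTART, RLENGTH)
        seq = substr(seq, RSTART + RLENGTH)   # keep searching after this match
      }
    }
  ' "$@"
}

# Usage: extract_motifs jara3.fasta > jara4.fasta
```

Sequences without a match produce no output at all, so their headers are skipped. If the FASTA file wraps sequences over several lines, they would need to be joined into one line per record first.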