Question: Remove hypothetical proteins
11 months ago, biodano.geo wrote:

Hi,

I want to remove all hypothetical proteins from a proteome FASTA file.

I used grep -v "hypothetical" proteome.fasta > filtered.fasta, but it only removes the header lines, not the whole records (header and sequence of the hypothetical proteins).

Please help me with an awk, Perl or Python script.

Tags: script, grep, fasta

Please use appropriate tags. software error is not an appropriate tag here.

written 11 months ago by RamRS

biodano.geo - See how the tags are more relevant now. Please invest more effort into future posts.

written 11 months ago by RamRS

11 months ago, Kevin Blighe (Republic of Ireland) wrote:

This should work (tested in Bash on Linux):

cat test.fasta 
> GENE
TTTT
CCCC
ATGC
> hypothetical A
ATCG
ATCG
ATTT
AAAA
> LOC5,HypothetICAL
ATCG
ATCG
ATTT
AAAA
> TP53
ATCG
ATCG
ATTT
AAAA
> hypothetical, LOC12354
ATCG
ATCG
ATTT
AAAA
> BRCA1
ATCG
ATCG
ATTT
AAAA



awk '/^>/ && toupper($0) ~ /HYPOTHETICAL/ {bool=1}; /^>/ && toupper($0) !~ /HYPOTHETICAL/ {bool=0}; {if (bool==0) print}' test.fasta 
> GENE
TTTT
CCCC
ATGC
> TP53
ATCG
ATCG
ATTT
AAAA
> BRCA1
ATCG
ATCG
ATTT
AAAA

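This makes use of a boolean flag (bool) that is set to 1 when a header containing 'hypothetical' is found, and reset to 0 when a header without it is found. Only lines read while the flag is 0 are printed.

The match is case-insensitive because the toupper function converts each header to upper case before the comparison, so 'hypothetical' can be written in any mix of upper and lower case.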
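
Since the question also mentions Python, the same flag idea could be sketched roughly as below. This is only an illustration, not part of the tested awk answer: the function name drop_hypothetical, its keyword parameter and the script name in the usage comment are all made up for this example.

```python
import sys


def drop_hypothetical(lines, keyword="hypothetical"):
    """Yield FASTA lines, skipping every record whose header contains keyword.

    Hypothetical helper: `skip` plays the role of `bool` in the awk one-liner.
    """
    needle = keyword.upper()
    skip = False  # like awk's unset variable: nothing is skipped before the first header
    for line in lines:
        if line.startswith(">"):
            # Re-evaluate the flag at every header, ignoring case.
            skip = needle in line.upper()
        if not skip:
            yield line


if __name__ == "__main__" and len(sys.argv) == 2:
    # Assumed usage: python drop_hypothetical.py proteome.fasta > filtered.fasta
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(drop_hypothetical(handle))
```

Because the function is a generator, it reads the file one line at a time, so a large proteome should not need to fit in memory.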

Kevin

modified 11 months ago • written 11 months ago by Kevin Blighe

Thanks so much. Could you please explain how this awk script works?

written 11 months ago by biodano.geo
/^>/ && toupper($0) ~ /HYPOTHETICAL/ {bool=1};

If the line starts with > and its upper-cased text contains the word HYPOTHETICAL, set bool to 1.

/^>/ && toupper($0) !~ /HYPOTHETICAL/ {bool=0};

If the line starts with > and its upper-cased text does not contain the word HYPOTHETICAL, set bool to 0.

{if (bool==0) print}

If bool is 0, print the line. This holds for a header line that does not contain HYPOTHETICAL and for all lines that follow it, until the next line starting with > is reached and the check for the word HYPOTHETICAL is done again.
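
To watch the three rules act on each line, here is a small Python sketch that mirrors them one by one and records the flag after every line. The name trace_flag and the shortened sample list are made up for illustration; they are not part of the awk answer.

```python
def trace_flag(lines):
    """Return (line, flag, printed) for each line, mimicking the three awk rules."""
    flag = 0  # an unset awk variable compares equal to 0
    steps = []
    for line in lines:
        is_header = line.startswith(">")
        # Rule 1: header containing HYPOTHETICAL (any case) sets the flag.
        if is_header and "HYPOTHETICAL" in line.upper():
            flag = 1
        # Rule 2: header without HYPOTHETICAL clears the flag.
        if is_header and "HYPOTHETICAL" not in line.upper():
            flag = 0
        # Rule 3: the line is printed only while the flag is 0.
        steps.append((line, flag, flag == 0))
    return steps


sample = ["> GENE", "TTTT", "> hypothetical A", "ATCG", "> TP53", "AAAA"]
for line, flag, printed in trace_flag(sample):
    print(f"{line:<18} bool={flag} {'print' if printed else 'skip'}")
```

Sequence lines never change the flag themselves; they simply inherit whatever the most recent header set.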

fin swimmer

written 11 months ago by finswimmer

Thanks so much, I understand.

written 11 months ago by biodano.geo

Hello biodano.geo,

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

modified 11 months ago • written 11 months ago by finswimmer

Thanks finswimmer! :)

written 11 months ago by Kevin Blighe