COMMAND TO FIND EXACT DIGIT MATCHING
2.2 years ago

Hi everyone,

I have a FASTA file whose headers look like this:

>13_seq2344_ATCGACGGAACTGA
>1342_seq2134_AGCTGTGGCAT
>130_SEQ2289_TCGAATCGAGGAAC

I want to remove only the line whose ID is exactly "13", not the lines where 13 is part of a longer number (1342, 130),

so my output should look like:

>1342_seq2134_AGCTGTGGCAT
>130_SEQ2289_TCGAATCGAGGAAC

I have tried grep -w, grep -o, and grep -E, but none of them work for me.

Please suggest a command that works.

Thank you

Linux • 700 views
grep -v '^>13_'

or

awk -F '_' '($1!=">13")'
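
A quick check of both approaches against the sample headers from the question may help. This is only a sketch: it assumes every header starts with ">" (as in the sample), so the anchor or the field comparison has to include that character. The file name headers.txt and the variable names are hypothetical.

```shell
# Hypothetical test file built from the headers shown in the question.
printf '%s\n' \
  '>13_seq2344_ATCGACGGAACTGA' \
  '>1342_seq2134_AGCTGTGGCAT' \
  '>130_SEQ2289_TCGAATCGAGGAAC' > headers.txt

# Anchor on the leading '>' and the trailing '_' so only the ID "13" matches.
grep_out=$(grep -v '^>13_' headers.txt)

# Same idea with awk: split on '_' and compare the whole first field.
awk_out=$(awk -F '_' '$1 != ">13"' headers.txt)

printf '%s\n' "$grep_out"
```

Both commands compare the complete ID rather than a substring, which is why 1342 and 130 are kept.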

You also have to think about removing the sequences, not just the headers. Try this: seqkit -w 0 grep -vrip "^13_" input.fa. Awk or sed remove only the matching line, not the sequence that follows it. That gets difficult if sequences are spread over multiple lines.
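
If seqkit is not available, whole records can still be dropped with awk by remembering the decision made at each header. The following is a minimal sketch, not a replacement for a proper FASTA parser; the file multi.fa and its sequences are made up for illustration.

```shell
# Hypothetical multi-line FASTA input; the sequences are invented.
cat > multi.fa <<'EOF'
>13_seq2344
ATCGACGG
AACTGA
>1342_seq2134
AGCTGTGG
CAT
>130_SEQ2289
TCGAATCG
EOF

# On each header line, decide whether to keep the record; the sequence
# lines that follow inherit the decision of the most recent header.
filtered=$(awk '/^>/ { keep = ($0 !~ /^>13_/) } keep' multi.fa)

printf '%s\n' "$filtered"
```

The keep flag is only updated on lines starting with ">", so every line of a multi-line sequence is kept or dropped together with its header.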

2.2 years ago
JamesBioInf ▴ 10

Hi!

awk '!/^13_/' file.fa >  new_file.fa

Should remove the line beginning with 13 only. The pattern between the slashes is what is searched for, so you can adjust it if you need to. The ! is a logical NOT, which means a line is printed only if it does NOT match the pattern.

Hope this helps :)

EDIT: see the comment below; the pattern has to account for the leading > in the header.

Should remove the line beginning with 13 only

No.

$ cat test.txt
>13_seq2344_ATCGACGGAACTGA
>1342_seq2134_AGCTGTGGCAT
>130_SEQ2289_TCGAATCGAGGAAC
>113_test

$ awk '!/13_/' test.txt
>1342_seq2134_AGCTGTGGCAT
>130_SEQ2289_TCGAATCGAGGAAC

$ awk '!/^>13_/' test.txt
>1342_seq2134_AGCTGTGGCAT
>130_SEQ2289_TCGAATCGAGGAAC
>113_test