Question

COMMAND TO FIND EXACT DIGIT MATCHING

0

2.2 years ago

Confused_human ▴ 20

Hi everyone,

I have a FASTA file whose header lines look like this:

>13_seq2344_ATCGACGGAACTGA
>1342_seq2134_AGCTGTGGCAT
>130_SEQ2289_TCGAATCGAGGAAC

I want to remove only the line whose leading number is exactly "13",

so my output should look like:

>1342_seq2134_AGCTGTGGCAT
>130_SEQ2289_TCGAATCGAGGAAC

I have tried grep -w, grep -o, and grep -E, but none of them works for me.

Please suggest a command that works.

Thank you

Linux • 700 views

ADD COMMENT • link updated 16 months ago by Ram 43k • written 2.2 years ago by Confused_human ▴ 20

1

grep -v '^>13_'

or

awk -F '_' '($1!=">13")'

ADD REPLY • link 2.2 years ago by Pierre Lindenbaum 161k

0

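You also have to think about removing the sequences, not just the headers. Try this: seqkit -w 0 grep -vrip "^13_" input.fa (seqkit matches the pattern against the sequence ID, which does not include the leading ">"). Awk or sed as used here removes only the matching line, not the sequence lines that follow it. This is harder still if sequences span multiple lines.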

ADD REPLY • link 2.2 years ago by cpad0112 21k
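
The point above about multi-line records can also be sketched in plain awk, for cases where seqkit is not available. This is only an illustration under assumptions: the file names input.fa and filtered.fa and the wrapped sample sequences are made up for the example, and the flag variable keep is a hypothetical name.

```shell
# Hypothetical input: sequences wrapped over several lines (made-up data)
cat > input.fa <<'EOF'
>13_seq2344
ATCGACGG
AACTGA
>1342_seq2134
AGCTGTGGCAT
>130_SEQ2289
TCGAATCG
AGGAAC
EOF

# On every header line, decide whether the whole record is kept.
# Sequence lines inherit the decision made at the most recent header,
# so a dropped header takes all of its sequence lines with it.
awk '/^>/ { keep = ($0 !~ /^>13_/) } keep' input.fa > filtered.fa

cat filtered.fa
```

Because the decision is made once per header and reused for every following line, the same one-liner should work whether sequences are on one line or wrapped across many.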

score 1 · Answer 1 · 2022-03-08

1

2.2 years ago

JamesBioInf ▴ 10

Hi!

awk '!/^13_/' file.fa >  new_file.fa

Should remove the line beginning with 13 only. The substring between the slashes is the pattern that is searched for, so you could play with this if you need to. The ! is a logical NOT, which means a line is printed only if it does NOT match the pattern.

Hope this helps :)

EDIT - See below!

ADD COMMENT • link 2.2 years ago by JamesBioInf ▴ 10

1

Should remove the line beginning with 13 only

No.

$ cat test.txt 

>13_seq2344_ATCGACGGAACTGA

>1342_seq2134_AGCTGTGGCAT

>130_SEQ2289_TCGAATCGAGGAAC

>113_test

$ awk '!/13_/' test.txt

>1342_seq2134_AGCTGTGGCAT

>130_SEQ2289_TCGAATCGAGGAAC


$ awk '!/^>13_/' test.txt    

>1342_seq2134_AGCTGTGGCAT

>130_SEQ2289_TCGAATCGAGGAAC

>113_test

ADD REPLY • link 2.2 years ago by cpad0112 21k
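
As a cross-check of the anchored pattern `^>13_` shown above, here is a small sketch that compares it with an exact comparison on the first `_`-separated field, using the same four test headers. The output file names by_grep.txt and by_awk.txt are assumptions for the example. The last command hints at why `grep -w` did not help in the original question: grep counts `_` as a word character, so the `13` in `>13_seq...` is not a whole word.

```shell
# Test headers from the thread, including the extra ">113_test" case
printf '%s\n' \
  '>13_seq2344_ATCGACGGAACTGA' \
  '>1342_seq2134_AGCTGTGGCAT' \
  '>130_SEQ2289_TCGAATCGAGGAAC' \
  '>113_test' > test.txt

# Anchored regex: drop lines that start with ">13_"
grep -v '^>13_' test.txt > by_grep.txt

# Exact field comparison: split on "_" and drop lines whose first field is ">13"
awk -F '_' '$1 != ">13"' test.txt > by_awk.txt

# The two results should be identical; print them if so
cmp -s by_grep.txt by_awk.txt && cat by_awk.txt

# Count lines where "13" matches as a whole word.
# grep exits non-zero when the count is zero, hence "|| true".
grep -cw '13' test.txt || true
```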