Why doesn't my grep expression work?
3.4 years ago
C_sinensis ▴ 30

Hello,

I am trying to execute a very simple grep command, but I can't get it to work and I cannot figure out why. Here is exactly what I am doing:

echo 'ACTTTATTA' > myseq

cat myseq

ACTTTATTA

grep -E '^.{2}TTT' myseq

No results

I am trying to match a sequence that has three Ts starting at the third nucleotide, like the one I created above, but I get no output. The problem seems to come from {2}: I can reproduce this behaviour only when the pattern contains {number}, and it goes away when I replace that part with something else.

I have:

  • Tested it on a different computer

  • Tested it in Google Colab

  • Tested it on regex101 (it works as expected)

What am I doing wrong?

grep regex • 2.0k views

What's the exclamation mark before the grep?

This worked for me:

echo 'ACTTTATTA' | grep -E '^.{2}TTT'

ACTTTATTA

This should work too:

cat myseq | grep -E '^.{2}TTT'

Thank you! The "!" sign is used in Google Colab to execute a terminal command. I included it by mistake and have corrected it.

It looks like your command works on my computer but does not work in Google Colab. Could this be due to a different version of grep? Could you check whether the command also fails for you in Google Colab, just to make sure I am not messing something up?


I don't use Google Colab, but maybe you can use awk, or try using " instead of ' and see if it works there:

echo "ACTTTATTA" | grep -oP "^.{2}TTT.*"

echo "ACTTTATTA" | grep -E "^.{2}TTT"

The following worked for me:

% grep -E '^.{2}TTT' <(echo -e 'ACTTTATTA')
ACTTTATTA
% grep -E '^.{2}TTT' <(echo -e 'AACTTTATTA')
% grep -E '^.{2}TTT' <(echo -e 'CTTTATTA') 
% grep -E '^.{2}TTT' <(echo -e 'XCTTTATTA')
XCTTTATTA

You should also be able to use egrep, as well as the more constrained pattern ^[ACTG]{2}TTT, if you need to:

% egrep '^[ACTG]{2}TTT' <(echo -e 'XCTTTATTA')
% egrep '^[ACTG]{2}TTT' <(echo -e 'ACTTTATTA')
ACTTTATTA
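
For a cross-check outside the shell, the same cases can be run through Python's re module. This is only a sketch: the list of sequences is copied from the commands above, and the variable names are made up for illustration. Note that re.search with ^ anchors at the start of the string, which matches grep's per-line behaviour here because each sequence is a single line.

```python
import re

# The two patterns discussed above, written as Python raw strings.
any_two = re.compile(r"^.{2}TTT")
acgt_two = re.compile(r"^[ACTG]{2}TTT")

# Test sequences taken from the grep commands in this comment.
sequences = ["ACTTTATTA", "AACTTTATTA", "CTTTATTA", "XCTTTATTA"]

# Keep only the sequences with TTT at positions 3 to 5.
hits = [seq for seq in sequences if any_two.search(seq)]
strict_hits = [seq for seq in sequences if acgt_two.search(seq)]

print(hits)         # ['ACTTTATTA', 'XCTTTATTA']
print(strict_hits)  # ['ACTTTATTA']
```

The results agree with the grep output shown above, which suggests the pattern itself is fine and the problem lies in how the command reaches grep.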

Thank you! As I replied to Fatima above, those seem to work on my computer, but not in Google Colab, which I need to use for this project. Do you have any idea why that might be?


I don't use Google Colab. I'm afraid I'm not much help there. If you have a reproducible issue, you could maybe file it here? https://github.com/googlecolab/colabtools/issues


I wonder if Python is doing something to the {2}. Could you try:

!echo grep -E '^.{2}TTT' myseq > how_my_command_looks_like

and then check the file?


I think you hit the nail on the head. Something is off:

!cat how_my_command_looks_like

grep -E ^.2TTT myseq
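
That output is consistent with how IPython-based notebooks (Colab included) treat "!" lines: text inside {...} is evaluated as a Python expression and substituted before the shell sees the command, so {2} becomes 2. The sketch below only imitates that step with a small string.Formatter subclass. BraceEvalFormatter and expand are hypothetical names, not IPython's actual implementation. It also shows that doubled braces, {{2}}, come out as a literal {2}, so writing the pattern as '^.{{2}}TTT' on the "!" line should hand grep the intended expression.

```python
import string


class BraceEvalFormatter(string.Formatter):
    """Hypothetical stand-in: evaluate each {...} field as a Python expression."""

    def get_field(self, field_name, args, kwargs):
        # Evaluate the text between the braces, e.g. "2" -> 2.
        return eval(field_name, {}, kwargs), field_name


def expand(command, **namespace):
    """Return the command roughly as it would be handed to the shell."""
    return BraceEvalFormatter().format(command, **namespace)


# Single braces are evaluated, so the interval expression is lost.
mangled = expand("grep -E '^.{2}TTT' myseq")

# Doubled braces are the formatter's escape for literal braces.
escaped = expand("grep -E '^.{{2}}TTT' myseq")

print(mangled)  # grep -E '^.2TTT' myseq
print(escaped)  # grep -E '^.{2}TTT' myseq
```

The mangled pattern asks for any character followed by a literal 2, which is why grep returned nothing for ACTTTATTA.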


I was able to solve it by first defining the command as a Python string and then executing that string:

command = "grep -E '^.{2}TTT' <(echo -e 'XCTTTATTA')"

!$command

XCTTTATTA

Thank you! That was a very good catch. I wonder if there is another solution though...
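
One other option, sketched here under the assumption that grep is on the PATH of the machine running the notebook: call it from Python with subprocess.run and an argument list. The pattern is then an ordinary Python string and never passes through the "!" line expansion. grep_lines is a hypothetical helper name, and the temporary file just stands in for myseq.

```python
import subprocess
import tempfile
from pathlib import Path


def grep_lines(pattern, path):
    """Run `grep -E pattern path` without a shell and return the matching lines."""
    result = subprocess.run(
        ["grep", "-E", pattern, str(path)],
        capture_output=True,
        text=True,
    )
    # grep exits with 1 when nothing matches; only higher codes are errors.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()


with tempfile.TemporaryDirectory() as tmp:
    seq_file = Path(tmp) / "myseq"
    seq_file.write_text("ACTTTATTA\n")

    # The intended pattern, passed through untouched.
    matches = grep_lines("^.{2}TTT", seq_file)
    # The pattern as the notebook mangled it, for comparison.
    mangled_matches = grep_lines("^.2TTT", seq_file)

print(matches)          # ['ACTTTATTA']
print(mangled_matches)  # []
```

Passing the arguments as a list also avoids shell quoting, so neither ' nor " is needed around the pattern.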


It could maybe be solved by different quoting or escaping characters, but I never use such things for shell commands so I wouldn't know.


Have you tried this one?

grep -E "^.{2}TTT" <(echo -e "XCTTTATTA")

@OP, is this solved? If so, can one or more of the comments be moved to an answer?
