Is there code to find a consensus motif?
5.7 years ago
vinayjrao ▴ 250

Hello,

I've been trying to write code to find a consensus motif in a given sequence, but so far I have only managed to find a fixed substring in a string. I want to be able to allow multiple nucleotides/amino acids at each position, and also to use N/X to represent any nucleotide/amino acid. I would very much appreciate any help.

Thanks.

P.S. The post tags represent the languages I'm comfortable understanding.

Edit: Example of the consensus motif - A/T A A G C A A/T/G N N A

Sequence - CGATCGTG TAAGCAGCTA GTCATG

The middle block, TAAGCAGCTA, is the match to the consensus.

C awk shell python • 2.4k views
5.7 years ago

In shell using grep and regular expressions:

echo 'CGATCGTG TAAGCAGCTA GTCATG' | grep  -o "[AT]AAGCA[ATG]..A"
TAAGCAGCTA

'N' is expressed as '.', which matches any single character. Alternative nucleotides at one position are listed inside square brackets, e.g. [AT] for A/T.
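Since Python is among the tagged languages, the same translation can be sketched there. This is only an illustration: the helper names motif_to_regex and find_motif are made up for this example, the motif is assumed to be written with spaces between positions and '/' between alternatives (as in the question), and the wildcard symbol is a parameter because 'N' is only a wildcard for nucleotides (for proteins one would pass 'X').

```python
import re

def motif_to_regex(motif, wildcard="N"):
    """Turn a motif such as 'A/T A A G N' into a regular expression string."""
    parts = []
    for position in motif.split():
        if position == wildcard:
            parts.append(".")                      # any single residue
            continue
        options = [re.escape(o) for o in position.split("/")]
        # One option is a literal; several options become a character class.
        parts.append(options[0] if len(options) == 1 else "[" + "".join(options) + "]")
    return "".join(parts)

def find_motif(motif, sequence, wildcard="N"):
    """Return (start, matched_text) for every hit, overlapping hits included."""
    seq = "".join(sequence.split())                # drop the spaces in the sequence
    # The lookahead lets the scan report hits that overlap each other.
    pattern = re.compile("(?=(" + motif_to_regex(motif, wildcard) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]

regex = motif_to_regex("A/T A A G C A A/T/G N N A")
hits = find_motif("A/T A A G C A A/T/G N N A", "CGATCGTG TAAGCAGCTA GTCATG")
print(regex)   # [AT]AAGCA[ATG]..A
print(hits)    # [(8, 'TAAGCAGCTA')]
```

The start position is 0-based and counted after the spaces have been removed from the sequence.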


Thanks a lot. It's perfect.


Along the same lines as Carlo Yague's answer:

echo 'CGATCGTG TAAGCAGCTA GTCATG' | grep -Po \([AT]\)A{2}GCA[\1G].{2}A
TAAGCAGCTA

Thanks. This works too. I could use the .{2} form when I have longer runs of any nucleotide/amino acid. However, I would like to know why it is [\1G] and not [ATG]?


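The first [AT] is wrapped in parentheses, which makes it capture group 1, and a group can be referred to later in the pattern by its number (\1 here). Note, though, that \1 stands for the text the group actually matched in a given hit (A or T), not for the set [AT], and inside square brackets it is not treated as a backreference at all. The command above finds the example only because that position is G; to allow A, T or G there, [ATG] is the right choice.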
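One caveat about group references is easier to see in a small experiment. The sketch below uses Python's re module rather than grep, and the short test strings are made up for illustration. It shows that \1 repeats whatever text group 1 captured in that particular hit, that inside square brackets Python reads \1 as a character code rather than a backreference, and that [ATG] is what accepts any of the three bases.

```python
import re

# \1 outside brackets: must repeat the exact base captured by group 1.
backref = re.compile(r"([AT])A{2}GCA\1")
# \1 inside brackets: Python reads it as the character with code 1, not as group 1.
in_class = re.compile(r"([AT])A{2}GCA[\1G]")
# Plain character class: A, T or G are all accepted at the last position.
char_class = re.compile(r"[AT]A{2}GCA[ATG]")

same_ends = bool(backref.search("TAAGCAT"))         # T ... T
different_ends = bool(backref.search("TAAGCAA"))    # T ... A
class_with_g = bool(in_class.search("TAAGCAG"))     # last base is G
class_with_t = bool(in_class.search("TAAGCAT"))     # last base is T
plain_class = bool(char_class.search("TAAGCAA"))    # last base is A

print(same_ends, different_ends)    # True False
print(class_with_g, class_with_t)   # True False
print(plain_class)                  # True
```

So a pattern containing [\1G] still hits a sequence with G at that position, but it would miss A or T there, which the consensus in the question is meant to allow.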


That's an extremely handy option. Thanks again :)
