I want to print only those lines which contain non repetitive amino acid sequences
14 months ago

I have a file containing amino acid sequences, for example:

AAPW
DAPA
ATPG
KLIP

I want to print only those lines that contain no repeated amino acids, i.e.:

ATPG
KLIP

AAPW and DAPA each contain alanine (A) twice, so they should not be printed. ATPG and KLIP should be printed because no amino acid occurs more than once in them. There are 20 different amino acids, given as one-letter codes: A C D E F G H I K L M N P Q R S T V W Y

Linux python awk sed • 600 views

The title is misleading, as there is nothing repetitive in DAPA. You basically want the lines that have 4 unique letters. That should give you some hint as to how to solve the problem. But you need to show some effort. What you laid out so far sounds like you ordered a free meal and now you expect someone to serve it to you.


I solved this in ~4 lines of Python, and 2 of those were just reading the file.

Here's a massive hint:

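line = line.strip()  # drop the trailing newline so print() does not double it
if len(set(line)) == len(line):  # true only when every letter occurs once
    print(line)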
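For anyone landing here later, the hint above could be fleshed out roughly as follows. This is only a sketch, not necessarily what the poster wrote: the function names `has_unique_residues` and `filter_unique` are made up for illustration, the sample list stands in for the lines of the input file, and it assumes the sequences are all uppercase one-letter codes (so `a` and `A` would count as different letters).

```python
def has_unique_residues(seq):
    """Return True if no letter occurs more than once in seq."""
    # A set drops duplicates, so the sizes match only when all letters differ.
    return len(set(seq)) == len(seq)


def filter_unique(lines):
    """Yield stripped, non-empty lines whose letters are all distinct."""
    for line in lines:
        seq = line.strip()  # remove the trailing newline and stray spaces
        if seq and has_unique_residues(seq):
            yield seq


# Sample data from the question; in practice this would be an open file
# handle, e.g. `with open("sequences.txt") as handle:` (hypothetical filename).
sample = ["AAPW\n", "DAPA\n", "ATPG\n", "KLIP\n"]

for seq in filter_unique(sample):
    print(seq)
```

With the four sequences from the question this prints ATPG and KLIP. Because the check compares the number of distinct letters with the line length, it is not tied to 4-letter sequences.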