Here's a one-liner that will do the trick. It's 95% Python, so I hope that works for you. Substitute "test.txt" with your file name.
cat test.txt | python2.7 -c 'import sys; lst=[[line.rstrip("\n"), list(set(line.rstrip("\b\r\n, ").split(",")))] for line in sys.stdin]; tmp=[x[0] for x in lst if len(x[1])>1]; sys.stdout.write("\n".join(tmp) + "\n")'
We `cat` the file and pipe it to `python2.7`, with the `-c` option to pass a command within quotes (`''`). We first import the `sys` module just to make reading the input and writing the output easy (or at least, I like it haha). Every statement of the Python command is separated by a semicolon (`;`).
We create a list (`lst`). We read the input file through the Python list comprehension syntax (see the end of that statement: `... for line in sys.stdin`). What we declare before that is the value stored in the list for each line. In this case, it is another list composed of two elements. The first item of this sub-list is the raw line you want to print out; the second is a processed version of it.
The first item of the sub-list is simply stripped of the newline character (`rstrip("\n")`). The second is processed more. We remove the trailing control characters, commas, and spaces (`rstrip("\b\r\n, ")`). We then split this item at every comma (`split(",")`). This produces an output like `["A", "T", "A"]`, a list where each item is one of the values you had separated by commas. So each line at this point looks like this:
["A,T,A", ["A", "T", "A"]]
A list of two elements: the raw line as a string and the processed line as a list.
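As a small illustration of that step, here is a sketch that builds the two-element list for a single line. The sample line is made up for the example, and the variable names (`raw`, `items`, `pair`) are not part of the one-liner.

```python
# Hypothetical sample line, as it would arrive from sys.stdin.
line = "A,T,A\n"

# First element: the raw line, minus its trailing newline.
raw = line.rstrip("\n")

# Second element: trailing control characters, commas and spaces removed,
# then split at every comma.
items = line.rstrip("\b\r\n, ").split(",")

pair = [raw, items]
print(pair)  # ['A,T,A', ['A', 'T', 'A']]
```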
Since you want only the lines that contain more than one distinct "letter", one neat way to do so is to "unique" the list and see if the final length is > 1 (i.e. there is more than one distinct letter). To do so in Python: `set()` removes the duplicates in the list, and `list()` turns the result back into a list. So each line at this point looks like this:
["A,T,A", ["A", "T"]]
Note that the last A has disappeared, being a duplicate. (A set does not keep order, so the remaining items may come out in any order.)
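To see the "uniquing" on its own, here is a minimal sketch with made-up input lists. The names `mixed` and `uniform` are only for illustration, and the result is sorted before printing because a set has no guaranteed order.

```python
# Hypothetical split lines: one with two distinct letters, one with a single letter.
mixed = ["A", "T", "A"]
uniform = ["A", "A", "A"]

unique_mixed = list(set(mixed))      # duplicates removed
unique_uniform = list(set(uniform))  # collapses to a single item

print(sorted(unique_mixed))      # ['A', 'T']
print(len(unique_mixed) > 1)     # True: this line would be kept
print(len(unique_uniform) > 1)   # False: this line would be dropped
```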
The next statement in the Python part selects only those lines whose uniqued list has a length > 1, meaning the ones that you are interested in. It does so by checking the length of that list, the second element of each item (`if len(x[1])>1`). Each selected item is a list of two elements, where the first is the raw input line. We make a list, which I here call `tmp`, that contains only the raw input line (`x[0]`) for each selected item. That is what we now print out: with `sys.stdout.write("\n".join(tmp) + "\n")` we `join()` the elements of this list with a newline character, forming the line-formatted output, and we add a final newline to complete it (`+ "\n"`).
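If the one-liner is hard to read, the same logic can be sketched as a short script. This is only an unrolled version under my own naming: the function `keep_mixed_lines` and the sample data are made up for the example, and it should run under Python 2.7 or 3.

```python
import sys


def keep_mixed_lines(lines):
    """Return the raw lines whose comma-separated items are not all identical."""
    kept = []
    for line in lines:
        raw = line.rstrip("\n")
        # Unique the items; more than one survivor means the line is "mixed".
        unique_items = set(line.rstrip("\b\r\n, ").split(","))
        if len(unique_items) > 1:
            kept.append(raw)
    return kept


# Hypothetical input lines standing in for the contents of test.txt.
sample = ["A,T,A\n", "A,A,A\n", "G,G\n", "C,T\n"]
sys.stdout.write("\n".join(keep_mixed_lines(sample)) + "\n")
# Prints:
# A,T,A
# C,T
```

To run it on a real file, pass `sys.stdin` (or an open file object) to `keep_mixed_lines` instead of `sample`.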