Alphabetic Sort in Unix/Perl with a Custom Order of Letters
10.8 years ago
Monzoor

Sorting DNA sequences in Unix is done in alphabetic order. Is it possible to sort DNA sequences using a specified order of letters instead?

unix perl sort dna sequence
10.8 years ago

I'm not sure I understand your question, but if you want to sort using, for example, the order C, A, T, G, I would use 'tr' to replace each letter with a digit that sorts in that position, sort, and then translate the digits back. Something like:

cat onesequenceperline.txt |\
tr "C" "0" | tr "A" "1" | tr "T" "2" | tr "G" "3" |\
sort |\
tr "0" "C" | tr "1" "A" | tr "2" "T" | tr "3" "G" > result.txt
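A self-contained version of the same trick, wrapped in a shell function so it can be tried on a few sequences. The function name sort_catg is hypothetical, and forcing LC_ALL=C is my addition (it makes sort compare plain bytes, which is typically also faster on large files). It assumes uppercase input containing only C, A, T and G; any other character passes through tr unchanged and would be compared by raw byte value against the digits.

```shell
#!/bin/sh
# Sort whole sequences (one per line on stdin) in the custom order C < A < T < G.
# sort_catg is a hypothetical helper name.
sort_catg() {
    # Map each base to a digit that sorts in the desired position,
    # sort bytewise (LC_ALL=C avoids locale-dependent collation),
    # then map the digits back to bases.
    tr 'CATG' '0123' | LC_ALL=C sort | tr '0123' 'CATG'
}

printf '%s\n' GATTACA CCGT ACGT TTTT | sort_catg
# CCGT
# ACGT
# TTTT
# GATTACA
```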


You can cut the number of processes down from 10 to 3 by removing the 'useless use of cat' and the redundant tr's:

tr 'CATG' '0123' < onesequenceperline.txt | sort | tr '0123' 'CATG' > result.txt


Wow! This is a simple yet effective idea; it somehow never occurred to me. I'll have to check how it scales to huge data sets. Anyway, thanks a lot, PL.


@Monzoot: very nice suggestion, thanks :-)


@Keith, very nice suggestion! Thanks! :-)

10.8 years ago
Rvosa

If you are asking how to sort the letters within each sequence alphabetically, a file like the one Pierre is imagining (one sequence per line) could be handled in Perl like this:

perl -lne 'print join "", sort { $a cmp $b } split //' onesequenceperline.txt

Or in reverse order by switching $a and $b around (in normal order the comparison block can be omitted, so you could golf it down some more). An advantage of this is that it handles all IUPAC single-nucleotide codes; a disadvantage is that it doesn't let you define a custom ordering, as Pierre's solution does. For that you will need a custom sort function, which won't fit neatly in a one-liner, or at least a custom mapping such as the %map hash in the script below. It uses the same letter order as Pierre's (C, A, T, G), but also uppercases every letter and dies if it finds a letter it doesn't expect:

use strict;
use warnings;

# rank of each base in the desired order: C < A < T < G
my %map = ( 'C' => 0, 'A' => 1, 'T' => 2, 'G' => 3 );

while (<>) {
    chomp;
    # split into letters, uppercase them, reject unknown letters, sort by rank
    print sort { $map{$a} <=> $map{$b} }
          grep { exists $map{$_} or die "unexpected letter: $_\n" }
          map  { uc } split //;
    print "\n";
}


This is also a good suggestion. Thank you is all I can say.

10.8 years ago
Spitshine

I am not sure I understand your question either, but it sounds as if you could pass a custom comparison function to Perl's sort (http://perldoc.perl.org/functions/sort.html).