Tool for aligning short protein sequences
        2 
    
    
    
        
        
        
        
            
                
                
                    
                        
                    
                
                    
                        Hi,
I have a file that looks like:
>ref_frame=1 
XFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA
>ref_frame=2 
FLKKIWPSYKKRPKNFLQSRPEPTAPPEESFRSGVETTTPPQKQ
>ref_frame=3 
F*KKSGLPTKKGQRIFFRADQSQQPHQKRASGLG*RQQLPLRSR
>read1_frame=1 
FFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA
>read1_frame=2 
FLKKIWPSYKKRPKNFLQSRPEPTAPPEESFRSGVETTTPPQKQ
>read1_frame=3
F*KKSGLPTKKGQRIFFRADQSQQPHQKRASGLG*RQQLPLRSR
I want to do a protein alignment where I align each read frame against each ref frame. What tool can I use to do the alignment on the command line? I have considered using blastp and diamond however other answers to similar questions  here on Biostars have suggested those are better used as a search tool against a database.
                    
                 
                 
                
                
                    
                    
    
        
        
            protein
         
        
    
        
        
            alignment
         
        
    
    
        • 1.4k views
    
 
                
                 
            
            
         
     
 
     
    
        
            
                
    
    
    
    
        
        
        
        
            
                
                
                    
                        
                    
                
                    
                        You can use needleall from the EMBOSS package. The tool performs all-versus-all pairwise global alignments.
needleall -asequence seqs.fa -bsequence seqs.fa -gapopen 10.0 -gapextend 0.5 -aformat srspair -outfile seqs.needleall
Output:
########################################
# Program: needleall
# Rundate: Wed 25 Jan 2023 12:33:06
# Commandline: needleall
#    -asequence seqs.fa
#    -bsequence seqs.fa
#    -gapopen 10.0
#    -gapextend 0.5
#    -aformat srspair
#    -outfile seqs.needleall
# Align_format: srspair
# Report_file: seqs.needleall
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: ref_frame=1
# 2: ref_frame=1
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 44
# Identity:      44/44 (100.0%)
# Similarity:    43/44 (97.7%)
# Gaps:           0/44 ( 0.0%)
# Score: 220.0
# 
#
#=======================================
ref_frame=1        1 XFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA     44
                      |||||||||||||||||||||||||||||||||||||||||||
ref_frame=1        1 XFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA     44
#=======================================
#
# Aligned_sequences: 2
# 1: ref_frame=2
# 2: ref_frame=1
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 45
# Identity:       8/45 (17.8%)
# Similarity:    16/45 (35.6%)
# Gaps:           2/45 ( 4.4%)
# Score: 24.0
# 
#
#=======================================
ref_frame=2        1 FLKKIWPSYKKRPKNFLQSRPEPTAPPEESFRS-GVETTTPPQKQ     44
                     ..||.....:|:.|.|...:....:|.....:. |.:..:|.:. 
ref_frame=1        1 XFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA-     44
#=======================================
...
You can also change the output format.
needleall -asequence seqs.fa -bsequence seqs.fa -gapopen 10.0 -gapextend 0.5 -aformat score -outfile seqs.needleall
Output:
ref_frame=1 ref_frame=1 44 (220.0)
ref_frame=2 ref_frame=1 45 (24.0)
ref_frame=3 ref_frame=1 45 (14.0)
read1_frame=1 ref_frame=1 44 (220.0)
read1_frame=2 ref_frame=1 45 (24.0)
read1_frame=3 ref_frame=1 45 (14.0)
ref_frame=1 ref_frame=2 45 (24.0)
ref_frame=2 ref_frame=2 44 (240.0)
ref_frame=3 ref_frame=2 59 (21.0)
read1_frame=1 ref_frame=2 45 (31.0)
read1_frame=2 ref_frame=2 44 (240.0)
read1_frame=3 ref_frame=2 59 (21.0)
ref_frame=1 ref_frame=3 45 (14.0)
ref_frame=2 ref_frame=3 59 (21.0)
ref_frame=3 ref_frame=3 44 (218.0)
read1_frame=1 ref_frame=3 45 (14.0)
read1_frame=2 ref_frame=3 59 (21.0)
read1_frame=3 ref_frame=3 44 (218.0)
ref_frame=1 read1_frame=1 44 (220.0)
ref_frame=2 read1_frame=1 45 (31.0)
ref_frame=3 read1_frame=1 45 (14.0)
read1_frame=1 read1_frame=1 44 (227.0)
read1_frame=2 read1_frame=1 45 (31.0)
read1_frame=3 read1_frame=1 45 (14.0)
ref_frame=1 read1_frame=2 45 (24.0)
ref_frame=2 read1_frame=2 44 (240.0)
ref_frame=3 read1_frame=2 59 (21.0)
read1_frame=1 read1_frame=2 45 (31.0)
read1_frame=2 read1_frame=2 44 (240.0)
read1_frame=3 read1_frame=2 59 (21.0)
ref_frame=1 read1_frame=3 45 (14.0)
ref_frame=2 read1_frame=3 59 (21.0)
ref_frame=3 read1_frame=3 44 (218.0)
read1_frame=1 read1_frame=3 45 (14.0)
read1_frame=2 read1_frame=3 59 (21.0)
read1_frame=3 read1_frame=3 44 (218.0)
#---------------------------------------
#---------------------------------------
 
                 
                
                
                 
            
            
         
     
 
         
        
            
                
    
    
    
    
        
        
        
        
            
                
                
                    
                        
                    
                
                    
                        cat input.fa | paste - - | grep read1_ | tr "\t" "\n" > input1.fa && cat input.fa | paste - - | grep ref_ | tr "\t" "\n" > input2.fa && blastp -query input1.fa  -subject input2.fa 
 
                 
                
                
                 
            
            
         
     
 
         
        
 
    
    
        
            
                  before adding your answer.
         
    
    
         
        
            
        
     
    
    Traffic: 6127 users visited in the last hour
         
    
    
        
    
    
 
Thanks Pierre, If my next step was to identify from the blastp output which alignment produces the best result, is there a way of doing that with a command?