Finding Protein Sequence Coverage With R Or Python
        1 
    
    
    
        
        
        
        
            
                
                
                    
                        
                    
                
                    
                        Hello,
I am trying to find protein sequence coverage after pairwise sequence alignment by using the following formula
sequence coverage = (length of aligned residues/length of longest sequence)*100
How can I use this formula in R or in bio python? Is there any package available to calculate proein sequence coverage ?
                    
                 
                 
                
                
                    
                    
    
        
        
            protein
         
        
    
        
        
            r
         
        
    
        
        
            alignment
         
        
    
        
        
            biopython
         
        
    
    
        • 5.5k views
    
 
                
                 
                
                
 
             
            
            
         
     
 
     
    
        
            
                
 
    
    
    
    
        
        
        
        
            
                
                
                    
                        
                    
                
                    
                        This function will give the sequence coverage
   seq.cov <- function(x){
    if(!class(x)=="PairwiseAlignedFixedSubject"){
      message('Only supports objects from class PairwiseAlignedFixedSubject')
      stop()
    }
    length.string1 <- x@subject@unaligned@ranges@width
    length.string2 <- x@pattern@unaligned@ranges@width
    align.length1 <- x@pattern@range@width
    align.length2 <- x@subject@range@width
    if(length.string1 >= length.string2){
      longest.length <- length.string1
    } else {
      longest.length <- length.string2
    }
    if(align.length1 >= align.length2){
      align.length <- align.length2
    } else {
      align.length <- align.length1
    }
    x.seq.cov <- ( align.length / longest.length ) * 100
    return(x.seq.cov)
  }
 
Test function:
  library(Biostrings )
  s1 <- DNAString("AGTATAGATGATAGAT")
  s2 <- DNAString("AGTAGATAGATGGATGATAGATA")
  palign1 <- pairwiseAlignment(s1, s2)
  palign2 <-
  pairwiseAlignment(s1, s2,
                    substitutionMatrix =
                    nucleotideSubstitutionMatrix(match = 2, mismatch = 10, baseOnly = TRUE))
  s3 <- DNAString("AGTATAGATGATAGATGGTCC")
  s4 <- DNAString("AGTAGATAGATGGATGATAGATA")
  palign3 <- pairwiseAlignment(s3, s4)
  palign4 <-
  pairwiseAlignment(s3, s4,
                    substitutionMatrix =
                    nucleotideSubstitutionMatrix(match = 2, mismatch = 10, baseOnly = TRUE))
  seq.cov(palign1)
  seq.cov(palign2)
  seq.cov(palign3)
  seq.cov(palign4)
 
This should do it.
                    
                 
                 
                
                
                 
                
                
 
             
            
            
         
     
 
         
        
 
    
    
        
            
                 Login  before adding your answer.
         
    
    
         
        
            
        
     
    
    Traffic: 3653 users visited in the last hour
         
    
    
        
    
    
 
How did you generate those alignments? For some programs (e.g. BLAST but also others) this kind of information is provided in the output.
@philippe Thanks for your comment. I used R to do pairwise sequence alignment using Pairwisealignment function(seqinr package) and calculated sequence identity by using pid function(Biostrings package)