Question: Lookup Table error in Sequence Alignment with R
pattijane10 wrote:

Hello,

I'm using the following code to read a text file containing more than 200 FASTA sequences and to align these proteins.

seqs = read.fasta(file = "sequencefasta.txt", seqtype = "AA")
self_aligned_score = rep(NA, length(seqs))

for (i in 1:length(seqs)){
  cat('aligning ',i,'/',length(seqs),' with itself..\n')
  seq = paste(toupper(getSequence(seqs[i][[1]])), sep="", collapse="")
  align_score = pairwiseAlignment(seq, seq, scoreOnly=TRUE, gapExtension=0.5, type="local", substitutionMatrix = "BLOSUM50")
  self_aligned_score[i] = align_score
}

But I get the following error, and I have no idea why.

Error in .Call2("XStringSet_align_pairwiseAlignment", pattern, subject,  : key 32 not in lookup table

Here is a snippet of my sequencefasta.txt file:

>sp|O00141|SGK1_HUMAN Serine/threonine-protein kinase Sgk1 OS=Homo sapiens GN=SGK1 PE=1 SV=2
MTVKTEAAKGTLTYSRMRGMVAILIAFMKQRRMGLNDFIQKIANNSYACKHPEVQSILKI
SQPQEPELMNANPSPPPSPSQQINLGPSSNPHAKPSDFHFLKVIGKGSFGKVLLARHKAE
EVFYAVKVLQKKAILKKKEEKHIMSERNVLLKNVKHPFLVGLHFSFQTADKLYFVLDYIN
GGELFYHLQRERCFLEPRARFYAAEIASALGYLHSLNIVYRDLKPENILLDSQGHIVLTD
FGLCKENIEHNSTTSTFCGTPEYLAPEVLHKQPYDRTVDWWCLGAVLYEMLYGLPPFYSR
NTAEMYDNILNKPLQLKPNITNSARHLLEGLLQKDRTKRLGAKDDFMEIKSHVFFSLINW
DDLINKKITPPFNPNVSGPNDLRHFDPEFTEEPVPNSIGKSPDSVLVTASVKEAAEAFLG
FSYAPPTDSFL

>sp|O00311|CDC7_HUMAN Cell division cycle 7-related protein kinase OS=Homo sapiens GN=CDC7 PE=1 SV=1
MEASLGIQMDEPMAFSPQRDRFQAEGSLKKNEQNFKLAGVKKDIEKLYEAVPQLSNVFKI
EDKIGEGTFSSVYLATAQLQVGPEEKIALKHLIPTSHPIRIAAELQCLTVAGGQDNVMGV
KYCFRKNDHVVIAMPYLEHESFLDILNSLSFQEVREYMLNLFKALKRIHQFGIVHRDVKP
SNFLYNRRLKKYALVDFGLAQGTHDTKIELLKFVQSEAQQERCSQNKSHIITGNKIPLSG
PVPKELDQQSTTKASVKRPYTNAQIQIKQGKDGKEGSVGLSVQRSVFGERNFNIHSSISH
ESPAVKLMKQSKTVDVLSRKLATKKKAISTKVMNSAVMRKTASSCPASLTCDCYATDKVC
SICLSRRQQVAPRAGTPGFRAPEVLTKCPNQTTAIDMWSAGVIFLSLLSGRYPFYKASDD
LTALAQIMTIRGSRETIQAAKTFGKSILCSKEVPAQDLRKLCERLRGMDSSTPKLTSDIQ
GHASHQPAISEKTDHKASCLVQTPPGQYSGNSFKKGDSNSCEHCFDEYNTNLEGWNEVPD
EAYDLLDKLLDLNPASRITAEEALLHPFFKDMSL
written 2.8 years ago by pattijane10

Which libraries are you using to import the fasta and to align the sequences?

written 2.8 years ago by Selenocysteine600

I use these:

require("Biostrings")
require("seqinr")
require("Rcpp")
written 2.8 years ago by pattijane10

I think the issue might be that you have some non-standard characters in your sequences. Can you check that they contain only amino acids, with no "X"s and no other letters that do not correspond to an amino acid?

written 2.8 years ago by Selenocysteine600
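
One way to run the check suggested above is to list every character that falls outside the expected alphabet, rather than searching for specific letters one at a time. The following is only a sketch in base R: the helper name find_unexpected_chars, the 20-letter alphabet and the example string are assumptions made for illustration, not part of Biostrings or seqinr.

```r
# Hypothetical helper: count every character in a sequence that is not
# one of the 20 standard amino-acid letters (assumed alphabet).
standard_aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

find_unexpected_chars <- function(sequence, alphabet = standard_aa) {
  # Split the string into single characters, upper-cased
  chars <- strsplit(toupper(sequence), "")[[1]]
  # Keep only the characters that are not in the alphabet
  bad <- chars[!(chars %in% alphabet)]
  # One count per offending character (an empty table if there are none)
  table(bad)
}

# Made-up example containing a space and an "X"
example_seq <- "MTVKTEAAKG TLTYSRX"
unexpected <- find_unexpected_chars(example_seq)
print(unexpected)
```

In the loop from the question, the same helper could be called on the pasted string before it is handed to pairwiseAlignment, so that any whitespace or other stray character shows up together with the index of the sequence it came from.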

I checked for "X"s and there were none. Besides, the wiki article on amino acids describes X as a valid letter for unknown residues: https://www.wikiwand.com/en/Amino_acid#/Table_of_standard_amino_acid_abbreviations_and_properties

written 2.7 years ago by pattijane10

Yes, it is standard, but maybe for some reason this library doesn't recognise it. Did you check for all characters that are not standard amino acids, or just X?

written 2.7 years ago by Selenocysteine600

I checked for X, U, O, B, Z, and J. None of them were present.

written 2.7 years ago by pattijane10
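
The error message quoted in the question mentions "key 32". Assuming that this key is the byte value of the character that could not be looked up (an assumption, not something confirmed in this thread), it can be decoded in base R. Byte 32 is the ASCII space, which a letter-by-letter check for X, U, O, B, Z and J would not catch. The sketch below decodes the key and strips whitespace from a made-up string; the variable names are hypothetical.

```r
# Assumption: the "key" in the error message is a byte value.
key <- 32L
offending <- intToUtf8(key)   # byte 32 is the ASCII space character

# Hypothetical cleanup: remove all whitespace (spaces, tabs, newlines)
# from a sequence string before aligning it.
raw_seq <- "MTVKTEAAKG TLTYSR \n"
clean_seq <- gsub("[[:space:]]", "", raw_seq)

print(sprintf("offending character: '%s'", offending))
print(clean_seq)
```

If whitespace does turn out to be present, the same gsub call could be applied to the pasted string inside the loop before the alignment call.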