I am trying to run promals on multiple sequences stored in a single file. The file (NP_689570-negative.fasta) contains:
UniRef100_A0A0D9R2V9 Uncharacterized protein n=1 Tax=Chlorocebus sabaeus TaxID=60711 RepID=A0A0D9R2V9_CHLSB MFQDSVVFEDVAVNFTQEEWALLGPSQKKLYRDVMQETFVNLASIGENWEEKNTEDHKNQ GRKLRSHMVERLCERKEGSQFGETISQTPNPKPNKKTFTRVKPYECSVCGKDYMCHSSLN RHMRSHTEHRSYEYHKYGEKSYECKECGKRFSFRSSFRIHERTHTGEKPYKCKQCGKAFS WPSSFQIHERTHTGEKPYECKECGKAFIYHTTFRGHMRMHTGEKPYKCKECGKTFSHPSS FRNHERTHSGEKPYECKQCGKAFRYYQTFQIHERTHTGEKPYQCKQCGKALSCPTSFRSH ERIHTGEKPYKCKKCGKAFSFPSSFRKHERIHTGEKPYDCKECGKAFISLPSYRRHMIMH TGNGPYKCKECGKAFDCPSSFQIHERTHTGEKPFECKQCGKAFSCSSSFRMHERTHTGEK PHECKQCGKAFSCSSSVRIHERTHTGEKPYECKQCGKAFSCSSSFRMHERIHTGEKPYEC KQCGKAFSFSSSFRMHERTHTGEKPYECKQCGKAFSCSSSFRMHERTHTGEKPYECKQCG KAFSCSSSIRIHERTHTGEKPYECKQCGKAFSCSSSVRMHERTHTGEKPYECKQCDKAFS CSRSFRIHERTHTGEKPYACQQCGKAFKCSRSFRIHERVHSGEKPCECRQCGKIF UniRef100_A0A2K6BBS0 Zinc finger protein 823 n=1 Tax=Macaca nemestrina TaxID=9545 RepID=A0A2K6BBS0_MACNE MFQDSVAFEDVAVNFTQEEWALLGPSQKSLYRNVMQETIRNLDCIETKWEDQNIGDQCQN PRRNLRSHTCEIKNDSQCGETFGHVPDSIVNKNTPQVNPCDSSGCGEVAMGHSSLNCNIR VDTGHKSCEHQEYGEKPYTHKQRGKTINHQHSFQTHERPPTGKKPFDCKECAKTFSSLGN LRRHMAAHHGDGPYKCKLCGKAFVWPSLFHLHERTHTGEKPYECKQCSKAFPFYSSYLRH ERIHTGEKAYECKQCSKAFPDYSTYLRHERTHTGEKPYKCTQCGKAFSCYYYTRLHERTH TGEQPYACQQCGKTFYHHTSFRRHMIRHTGDGPHKCKICGKGFDCPSSVRNHETTHTGEK LYECKQCGKVLSHSSSFRSHMITHTGDGPQKCKICGKAFGCPSLFQRHERTHTGEKPYQC KQCGKAFSLSGSLRRHEATHTGVKPYKCKCGKAFSDLSSFQNHETTHTGEKPYECKECGK AFSCFKYLSQHKRIHTVEKPYECKTCRKAFSHFSNLKVHERIHSGEKPYECKECGKAFSW LTCLLRHERIHTGEKPYECLQCGKAFTFSLSGSLRRHEATHTGEKPYECQQCGKALSSLR SLHRHKRTHWKDTL UniRef100_A0A2I2YH47 Zinc finger protein 823 n=1 Tax=Gorilla gorilla gorilla TaxID=9595 RepID=A0A2I2YH47_GORGO MFQDSVAFEDVAVNFTQEEWALLGPSQKSLYRNVMQETIRNLDCVEMKWEDQNIGDQCQN AKRNLRSHTCEIKDDSQCGETFGQIPDSIVNKNTPRVNPCDSGKCGEVVLGHSSLNCNIR VDTGHKSCEHQEYGEKPYTHKQRGKAISHQHSFQTHERPPAGKKPFDCKECAKTFSSLGN LRRHMAAHHGDGPYKCKLCGKAFVWPSLFHLHERTHTGEKPYECKQCSKAFPFYSSYLRH ERIHTGEKAYECKQCSKAFPDYSTYLRHERTHTGEKPYKCTQCGKAFSCYYYTRLHERTH TGEQPYACKQCGKTFYHHTSFRRHMIRHTGDGPHKCKICGKGFDCPSSVRNHETTHTGEK 
PYECKQCGKVLSHSSSFRSHMITHTGDGPQKCKICGKAFGCPSLFQRHERTHTGEKPYQC KQCGKAFSLAGSLRRHEATHTGVKPYKCQCGKAFSDLSSFQNHETTHTGEKPYECKECGK AFSCFKYLSQHKRIHTVEKPYECKTCRKAFSHFSNLKVHERIHSGEKPYECKECGKAFSW LTCLLRHERIHTGEKPYECLQCGKAFTRSRFLRGHEKTHTGEKLYECKECGKALSSLRSL HRHKRTHWKDTL UniRef100_A0A2K5J152 Zinc finger protein 823 n=1 Tax=Colobus angolensis palliatus TaxID=336983 RepID=A0A2K5J152_COLAP MDSVAFEDVAVNFTQEEWALLGPSQKSLYRNVMQETIRNLDCVETKWEDQNIGDQCQNPR RNLRSHTCEIKDDSQCGETFGQIPDSTVNKNTPRGNPCDSSECGQVAMGHSSLNCNIRVD TGHKSCEHQEYGEKPYTHKQRGKTISHQHSFQTHERPPTGKKPFDCKECAKTFSSLGNLR RHMAAHHGDGPYKCKLCGKAFVWPSLFHLHERTHTGEKPYECKQCSKAFPFYSSYLRHER IHTGEKAYECKQCSKAFPDYSTYLRHERTHTGEKPYKCTQCGKAFSCYYYTRLHERTHTG EQPYACQQCGKTFYHHTSFRRHMIRHTGDGPHKCKICGKGFDCPSSVRNHETTHTGEKLY ECKQCGKVLSHSSSFRSHMITHTGDGPQKCKICGKAFGCPSLFQRHERTHTGEKPYQCKQ CGKAFSLSGSLRRHEATHTGVKPYKCKCGKAFSDLSSFQNHETTHTGEKPYECKECGKAF SCFKYLSQHKRIHTVEKPYECKTCRKAFSHFSNLKVHERIHSGEKPYECKECGKAFSWLT CLLRHERIHTGEKPYECLQCGKAFTRSRFLRGHEKTHTGEKLYECKECGKALSSLRSLHR HKRTHWKDTL UniRef100_A0A2I3S2B0 Zinc finger protein 823 n=7 Tax=Homininae TaxID=207598 RepID=A0A2I3S2B0_PANTR MFQDSVAFEDVAVNFTQEEWALLGPSQKSLYRNVMQETIRNLDCIEMKWEDQNIGDQCQN AKRNLRSHTCEIKDDSQCGETFGQIPDSIVNKNTPRVNPCDSGECGEVILGHSSLNCNIR VDTGHKSCEHQEYGEKPYTHKQRGKAISHQHSFQTHERPPAGKKPFDCKECAKTFSSLGN LRRHMAAHHGDGPYKCKLCGKAFVWPSLFHLHERTHTGEKPYECKQCSKAFPFYSSYLRH ERIHTGEKAYECKQCSKAFPDYSTYLRHERTHTGEKPYKCTQCGKAFSCYYYTRLHERTH TGEQPYACKQCGKTFYHHTSFRRHMIRHTGDGPHKCKICGKGFDCPSSVRNHETTHTGEK PYECKQCGKVLSHSSSFRSHMITHTGDGPQKCKICGKAFGCPSLFQRHERTHTGEKPYQC KQCGKAFSLAGSLRRHEATHTGVKPYKCQCGKAFSDLSSFQNHETTHTGEKPYECKECGK AFSCFKYLSQHKRTHTVEKPYECKTCRKAFSHFSNLKVHERIHSGEKPYECKECGKAFSW LTCLLRHERIHTGEKPYECLQCGKAFTRSRFLRGHEKTHTGEKLYECKECGKALSSLRSL HRHKRTHWKDTL
I ran the command: /media/deepak/Kakarot_2/promals/bin$ python promals NP_689570-negative.fasta
The output in the OuTpUT file is:
`Setting up a blast directory: NP_689570-negative.fasta_blast
List of parameters:
identity threshold: 0.6
blast_dir: NP_689570-negative.fasta_blast
secondary structure weight: 0.2
amino acid weight: 0.8
NUMBER OF SEQUENCES: 5
NUMBER OF GROUPS: 1
representative: UniRef100_A0A0D9R2V9_Unch
NP_689570-negative.fasta_blast/UniRef100_A0A0D9R2V9_Unch.aln does not exist
blastpgp_cmd: /media/deepak/Kakarot_2/promals/bin/blastpgp
blastpgp options: -j 3 -e 1.000000e-03 -h 1.000000e-03 -v 1000 -b 1000 -a 2 -m `
Why is promals taking UniRef100_A0A0D9R2V9 as UniRef100_A0A0D9R2V9_Unch, when there is a space between UniRef100_A0A0D9R2V9 and the word "Uncharacterized" in the first FASTA header? I am getting this error in many alignments.
Please help me.
Is it correct that there are no '>' characters (to denote the sequence headers)? Is this supposed to be a FASTA file?
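If it helps to check this quickly, here is a minimal sketch that counts the '>' header lines in FASTA text and lists the ID of each one. It assumes the usual convention that the ID is the first whitespace-delimited token after '>'. The function name `fasta_header_report` is made up for illustration and has nothing to do with promals itself, so it says nothing about how promals parses names internally.

```python
def fasta_header_report(lines):
    """Return (header_count, ids) for an iterable of FASTA text lines.

    Assumption: a header is any line starting with '>', and the ID is the
    first whitespace-delimited token that follows it.
    """
    ids = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # Split once on whitespace: ID first, free-text description after
            parts = line[1:].split(None, 1)
            ids.append(parts[0] if parts else "")
    return len(ids), ids


# Small in-memory example; the second record deliberately lacks '>'
sample = [
    ">UniRef100_A0A0D9R2V9 Uncharacterized protein n=1",
    "MFQDSVVFEDVAVNFTQEEWALLGPSQKKLYRDVMQETFVNLASIGENWEEKNTEDHKNQ",
    "UniRef100_A0A2K6BBS0 Zinc finger protein 823 n=1",
    "MFQDSVAFEDVAVNFTQEEWALLGPSQKSLYRNVMQETIRNLDCIETKWEDQNIGDQCQN",
]
count, ids = fasta_header_report(sample)
print(count, ids)
```

To run it on a real file, pass the open file handle instead of the list, for example `fasta_header_report(open("NP_689570-negative.fasta"))`. If the count is lower than the number of sequences you expect, some headers are missing their '>'.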