Question: Error in Calculation of Ka/Ks value using Ka/Ks calculator
Asked 4.7 years ago by deepkumar1983 (United States):

Hi, I am using the Ka/Ks calculator software to identify rapidly evolving genes. I align my gene sequences with MUSCLE, convert the alignment to AXT format, and then run the tool. Surprisingly, it works well with some alignments, but for others it gives me the error "Error. The size of two sequences in 'ID' is not equal." I do not understand this error. If anybody has an idea, please help.

Thanks

Deepak

Input file that produces the error:

>ENSBTAT00000025915.4 ensembl:known_by_projection chromosome:UMD3.1:10:89053688:89074011:-1 gene:ENSBTAG00000019454.4 gene_biotype:protein_coding transcript_biotype:protein_coding
ATGA-----TTGC-------------------------------------------------------------------------------------------------------------------------GTCGTGT----------------------------------------------------------------------------------------------------------------------CTGTGTTA---CCTGCTGCTGCCGGCCGC-----------------GCGCCTTTTCC-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------GCGCCCTCT----------------------------------------------------------------------------------------------------------------------------------CAGATGCTTTCTTCACATGTCGGAAAAATGTCCTCCTGGCGAAGAGCTCGTCCTCCCAGGTAGAAGGCAACTTTGCCATGGCCCCTCGGGGCCCCGACCAGGAGGAGTGTGAGGGCCTGCTGCAGCAGTGGAGGGAAGAAGGGTCGAGCCAGGTGCTGTCAACTGTGAGCGACGGTCCCCTTGTAGATAAGGGACTCGCCGAGAGCAGCCTGGCCCTCCTGATGGATAATCCCGGAGAACAGGATGCTGCTCCGGAGGACACGTGGTCCAGCAGGCAGCTGAGTGACCTGCGGGCAGCGGAGAACCTGGAGGAGCCTTTTCCCGAGGTGCTAGGAGAGGAGGAGCCGCTGCCGGAGGTCGAGGGCCCAATGTGGGCAGCAGTGCCTGTGCAGACCGGCCGCCAGTACACAGATTGTGCTGTCCTCCCTATGGGTGCGCTGGCCACAGAGCAGTGGGACGAGGACCCCGCGGTGGTGGCCTGGAGCATCGCACCGGAGCCTGTGCCCCAGGAAGAGGCTCCCGTCTGGCCCTTTGAGGGTCTGGGGCAGCTGCAGCCTCCCCCAGTGGAAATCCCGTATCACGAAATCTTGTGGCGAGAATGGGAGGATTTCTCCACTCAGCCAGATGTTCAGGGCCTGGAGGCAGGGGATGGCCCTCAGTTCCAGTTCACTCTGATGTCCTATAATATCCTGGCCCAGGACCTAATGCAGCAGAGCTCCGAGCTCTATCTGCATTGCCACCCAGACATCCTGAACTGGAGCTATCGCTTTGCGAATCTCATGCAGGAATTCCAGCACTGGGACCCGGACATCTTGTGTCTCCAGGAAGTCCAGGAAGATCATTACTGGGAGCAGCTGGAGCCCTCTCTGAGAATGATGGGCTTTACCTGTTTCTACAAGAGGAGGACCGGGTGTAAGACAGATGGCTGTGCTGTCTGCTACAAGCCCACGAGATTCCGTCTGCTCTGCGCCAGCCCCGTGGAGTACTTTCGGCCTGGCTTGGAGCTCCTCAATCGGGACAACGTGGGCTTAGTGTTGCTGCTGCAGCCACTGGTCCCAGAAGGCCTGGGGCAAGTCTCGGTGGCCCCCTTATGTGTGGCAAATACCCACGTCCTGTACAACCCACGGCGGGGCGACGTCAAGCTGGCCCAGATGGCCATTCTCCTGGCTGAAGTGGACAAGGTGGCCAGGCTGTCAGATGGCAGCCACTGCCCCATTGTCCTGTGTGGGGACCTGAACTCTGTCCCCGACTCGCCTCTCTACAACTTCATCAGGGACGGGGAGCTCCAGTACCACGGGATGCCAGCCTGGAAGGTATCTGGACAGGAAGA
CTTCTCCCATCAGCTTTATCAGAGGAAGCTGCAGGCCCCACTGTGGCCCAGCTCCCTGGGTATCACTGACTACTGTCAGTATGTCACCTCCTGTCACCCCACGAGCTCAGAGAGACGCAAGTATAGCCGAGACTTCCTGCTGCGTTTCCGCTTCTGCAGCATGGCCTGCCGGCGACCTGTGGGACTGGTTCTTCTGGAAGGAGTGACAGACACTAAGCCAGAGCGACCTGCTGGCTGGGCTGAGTCTGTCATTGAGGAAGATACATCTGAGTCTGAGCCGGATGTCCCCAGGACTGCAGGCACCATCCAGCACTGCCTACACCTGACCTCGGTGTATACTCATTTCCTGCCCCAGCACGGCCGCCCAGAGGTCACCACAATGCCCCTGGGTCTGGGAACGACAGTGGATTACATCTTCTTCTCAGCTGAGTCCTGCGAGAATGGGAACAGAACTG-------------------ATCGCAGGCTGTATCAGG--ATGGAACCCTCAAGCTCCTGGGCCGGCTCTCGCTCCTCTCTGAAGAGATCCTCTGGGCTGCCAACGGCTTACCCAACCC--CTTCTGCTCTTCAGA----------------CCAC--CTCTGCCTACTGGCTAGCTT--CGGGATGGAAATCGCGGCCCC---------------ATGA
ATGAGACGTTTGCTGCAGCGCTCAGGTCCTTTCACTGCAGCGCACAGACTCCCTAGTTGTGGCCTGAGTGGTCCAAAGGGCGCGGGTTCAGTAACTGCAGCACGTGGACTTCGTAGCTCCACGGCAGTCAATCAGTCGTGTCCAACTCTTTGCGACCCCATGGACTGCAGCACACCAGGCTTCCCTGTCCATCACCAACTTCCAGAGCCTGCTCAAACTCAAGTCCATCGAGTCAGTGATGCCATCCCACCATCTCATCCTCTGTCATCCCCTTCTCCTCCTGGCTTCAATCTTTCCCAGCATCAGTGTCTTTTCCAAGGCATTTTCAAGAAGAAGAGAGAGGCCAAGGATAAAATCCTAGAAACACCCTCATTGAAAGGTGGGTGTTCAGAGGAACAGAAAGAAGAGGTGCTAGGACAAAACGCCCACAGAGTTGGAAGGAAAGTAAGAATGTTACAGAAACCCGAGGAAAGAGACTCTTGTAGTTATGTGCTAGTTCCTAAGAGGCTTCCCAATCTTCAGGTTACCCCAGTGGCCTCTGCTTCATCCCTGGAGAAGCCTTGCAGTGTTATGATGTACCCAGCTGGTCCTGGATCTGGTGACTCTCCAGAGGGCTTCCCGGAACAACCAAGGCGCCAGTCCCAAACCGGAAATCGCACGGCTCAGACACTGGATGCTTTCTTCACATGTCGGAAAAATGTCCTCCTGGCGAAGAGCTCGTCCTCCCAGGTAGAAGGCAACTTTGCCATGGCCCCTCGGGGCCCCGACCAGGAGGAGTGTGAGGGCCTGCTGCAGCAGTGGAGGGAAGAAGGGTCGAGCCAGGTGCTGTCAACTGTGAGCGACGGTCCCCTTGTAGATAAGGGACTCGCCGAGAGCAGCCTGGCCCTCCTGATGGATAATCCCGGAGAACAGGATGCTGCTCCGGAGGACACGTGGTCCAGCAGGCAGCTGAGTGACCTGCGGGCAGCGGAGAACCTGGAGGAGCCTTTTCCCGAGGTGCTAGGAGAGGAGGAGCCGCTGCCGGAGGTTGAGGGCCCAATGTGGGCAGCAGTGCCTGTGCAGACCGGCCGCCAGTACACAGATTGTGCCGTCCTCCCTGTGGGTGCGCTGGCCACAGAGCAGTGGGACGAGGACCCCGCGGTGGTGGCCTGGAGCATCGCACCGGAGCCTGTGCCCCAGGAAGAGGCTCCCGTCTGGCCCTTTGAGGGTCTGGGGCAGCTGCAGCCTCCCCCAGTGGAAATCCCGTATCATGAAATCTTGTGGCGAGAATGGGAGGATTTCTCCACTCAGCCAGATGTTCAGGGCCTGGAGGCAGGGGATGGCCCTCAGTTCCAGTTCACTCTGATGTCCTATAATATCCTGGCCCAGGACCTAATGCAGCAGAGCTCCGAGCTCTATCTGCATTGCCACCCAGACATCCTGAACTGGAGCTATCGCTTTGCGAATCTCATGCAGGAATTCCAGCACTGGGACCCGGACATCTTGTGTCTCCAGGAAGTCCAGGAAGATCATTACTGGGAGCAGCTGGAGCCCTCTCTGAGAATGATGGGCTTTACCTGTTTCTACAAGAGGAGGACCGGGTGTAAGACAGATGGCTGTGCTGTCTGCTACAAGCCCACGAGATTCCGTCTGCTCTGCGCCAGCCCCGTGGAGTACTTTCGGCCTGGCTTGGAGCTCCTCAATCGGGACAACGTGGGCTTAGTGTTGCTGCTGCAGCCACTGGTCCCAGAAGGCCTGGGGCAAGTCTCGGTGGCCCCCTTATGTGTGGCAAATACCCACGTCCTGTACAACCCACGGCGGGGCGACGTCAAGCTGGCCCAGATGGCCATTCTCCTGGCTGAAGTGGACAAGGTGGCCAGGCTGTCAGATGGCAGCCACTGCCCCATCGTCCTGTGTGGGGACCTGAACTCTGTCCCCGACTCGCCTCTCTACAACTTCATCAGGGACGGGGAGCTCCAGTATCACGGGATGCCAGCCTGGAAGGTATCTGGACAGGAAGA
CTTCTCCCATCAGCTTTATCAGAGGAAGCTGCAGGCCCCACTGTGGCCCAGCTCCCTGGGTATCACTGACTACTGTCAGTATGTCACCTCCTGTCACCCCACGAGCTCAGAGAGACGCAAGTATAGCCGAGACTTCCTGCTGCGTTTCCGCTTCTGCAGCATGGCCTGCCGGCGACCTGTGGGACTGGTTCTTCTGGAAGGAGTGACAGACACTAAGCCAGAGCGACCTGCTGGCTGGGCTGAGTCTGTCATTGAGGAAGATACATCTGAGTCTGAGCCGGATGTCCCCAGGACTGCAGGCACCATCCAGCACTGCCTACACCTGACCTCGGTGTATACTCATTTCCTGCCCCAGCACGGCCGCCCAGAGGTCACCACAATGCCCCTGGGTCTGGGAACGACAGTGGATTACATCTTCTTCTCAGCTGAGTCCTGCGAGAATGGGAACAGAACTGGCACGTGCTGCGAGCTTGAAGCAAAGGGAGCAGCAGGAAATGTGGCCATCCACCCAGTAGGCTGGCTTCTGGTCCTCTC----GGGAGCCTGTG--------ACGGC--ACTCAGCTCTGCGTCAGTTCCAAGAAAGTGGCCGTGCGGTATCCACGGTTCTGC--ACTGGCGACCGTGGTGAGAAGGGACTTGAGATCCCCGTTGCTATGGTTTTCTGA
Tags: software error. Written 4.7 years ago by deepkumar1983; last modified 19 months ago by Philipp Bayer.

Hi deepkumar, I have the same problem with the Ka/Ks calculator. Do you have any idea how to resolve it?

Reply written 3.3 years ago by adeena_hassan

Can someone post an example of a pair for which it does work?

I think it might be due to the alignment result: if the two sequences differ too much in length, you will have to trim the unaligned parts.

Moreover, MUSCLE is likely not the best tool to align the sequences for this purpose, as you need a codon-aware alignment (that is, the DNA sequences aligned codon by codon); otherwise the Ka/Ks estimates will not make any sense!

Reply written 2.4 years ago by lieven.sterck

In addition, can someone explain exactly how you are running the tool? There are three people with the problem and just one test sequence, but no mention of how the tool is being used. Is it the command-line version? The Windows version? The online version? What parameters (if any) are being used?

Reply written 2.4 years ago by h.mon

@deepkumar and @adeena_hassan I've been observing the same problem. Did you solve it somehow? Thanks

Reply written 2.4 years ago by smlatorreo

Please use ADD COMMENT or ADD REPLY to respond to earlier posts, so that the thread remains logically structured and easy to follow. I have now moved your post, but as you can see the result is not optimal. Adding an answer should only be used for providing a solution to the question asked.

Reply written 2.4 years ago by WouterDeCoster

Ka/Ks calculator is a command-line tool that takes input in AXT format. A Perl script that converts a FASTA file into AXT is available with the tool. It reads a pair of sequences and computes the corresponding estimates (the lengths of the two sequences must be equal).
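Before running the tool, it can help to check every pair in the AXT file yourself. The following is a minimal Python sketch, assuming the layout used in this thread (a name line, two sequence lines, and a blank line between records); the function names `read_axt_pairs` and `check_pair` are made up for illustration and are not part of KaKs_Calculator.

```python
import re


def read_axt_pairs(text):
    """Yield (name, seq1, seq2) from AXT-style text.

    Assumed layout: a name line followed by two sequence lines,
    with records separated by one or more blank lines.
    """
    text = text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", text):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        if len(lines) != 3:
            raise ValueError(f"expected 3 lines per record, got {len(lines)}")
        yield tuple(lines)


def check_pair(name, seq1, seq2):
    """Return a list of problems for one pair (empty if none were found)."""
    problems = []
    if len(seq1) != len(seq2):
        problems.append(f"{name}: lengths differ ({len(seq1)} vs {len(seq2)})")
    for label, seq in (("first", seq1), ("second", seq2)):
        if len(seq) % 3 != 0:
            problems.append(
                f"{name}: {label} sequence length {len(seq)} is not a multiple of 3"
            )
    return problems


# Toy records, not real data: pairA is fine, pairB is broken on purpose.
example = "pairA\nATGAAA\nATGAAG\n\npairB\nATGAA\nATGAAGT\n"
for record in read_axt_pairs(example):
    for problem in check_pair(*record):
        print(problem)
```

Running this over a real file (for example the text of Selection_test.axt) lists the pairs that are likely to be rejected, together with the reason.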

Here is my input file in AXT format:

cat Selection_test.axt

Human_gene-Dog_gene
ATGAAAATTGACATCCATAGTCATATTCTACCAAAAGAATGGCCAGATCTAAAAAAGAGGTTTGGCTACGGAGGCTGGGTGCAGCTCCAACACCACAGCAAGGGAGAAGCAAAGTTGTTGAAAGATGGGAAAGTCTTCAGAGTGGTGCGAGAGAATTGCTGGGATCCAGAAGTTCGTATTAGAGAAATGGACCAAAAAGGAGTAACAGTGCAAGCCCTTTCCACAGTTCCTGTCATGTTTAGCTACTGGGCCAAACCTGAGGACACTTTAAACCTGTGCCAGCTTTTAAACAACGACCTTGCCAGCACCGTTGTGAGCTACCCCAGGAGGTTCGTGGGTCTGGGGACGTTGCCCATGCAGGCCCCTGAGCTGGCGGTCAAGGAGATGGAGCGCTGTGTGAAAGAGCTGGGCTTTCCCGGGGTCCAAATTGGCACCCACGTCAACGAGTGGGACCTGAACGCGCAGGAGCTCTTTCCTGTCTATGCGGCAGCCGAAAGGCTGAAGTGTTCCCTGTTCGTGCATCCCTGGGACATGCAGATGGATGGACGAATGGCCAAATACTGGCTCCCTTGGCTTGTAGGAATGCCAGCAGAGACCACCATAGCCATTTGCTCCATGATCATGGGTGGAGTATTTGAGAAGTTTCCCAAACTGAAAGTGTGTTTCGCACATGGTGGTGGTGCCTTCCCCTTCACAGTGGGAAGAATCTCCCATGGATTCAGCATGCGCCCAGATCTGTGTGCCCAGGACAACCCCATGAACCCGAAGAAATACCTTGGTTCCTTTTACACAGATGCTTTGGTTCATGATCCTCTGTCCCTCAAGCTGTTAACAGATGTCATAGGAAAGGATAAAGTCATTTTGGGAACCGATTACCCCTTTCCACTAGGTGAGCTGGAACCTGGGAAACTAATAGAGTCCATGGAAGAATTTGATGAAGAAACAAAGAATAAACTCAAAGCCGGCAATGCCCTGGCATTTTTGGGTCTTGAGAGAAAACAATTTGAATGA
ATGAAAATTGACATCCATAGTCATATTCTACCAAAAGAATGGCCAGATCTAAAAAAGCGATTCAGCTATGGAGGCTGGGTGCAGCTTCAACACCACAGCAAGGGAGAAGCAAAAATGTTGAAGGATGGGAAGGTCTTCAGAGTGGTCCAAGAGAACTGCTGGGATCCAGAAGTCCGTATTAGAGAAATGGACCAAACAGGAGTGTCCGTGCAAACCCTTTCCACAGTCCCCCTCATGATTAGCTATTGGGCCAAACCTCAGGACACTTTAGACCTGTGCCAGCTTTTAAACAACGACTTAGCTGCCACTGTTGCGAACCATCCCAGGAGGTTTGTGGGCCTGGGGACATTGCCCATGCAGGCTCCTGAGCTTGCCGTCAAGGAGATGGAGCGCTGTGTGAAGGAGCTGGGCTTTCCCGGGGTCCAGATTGGTTCCCATATCAACGAGTGGGACCTGAATGCACGGGAACTCTTCCCCTTCTACGCATTAGCAGAAAAACTGAACTGTTCGTTATTTGTGCACCCCTGGGACATGCAAATGGATGGACGGATGGCCAAATACTGGCTCCCTTGGCTTGTAGGAATGCCAGCAGAGACCACCACAGCCATTTGTTCCATGATCATGGGAGGAGTGTTTGAGAAATTTCCTAAATTGAAAGTGTGTTTTGCACATGGAGGTGGTGCCTTCCCTTTCACAGTTGGAAGAATCTCCCATGGATTCAACATGCGTCCAGATCTGTGTGCCCAGGACAATCCAATCAACCCAAAGAAATACCTTGGTTCCTTTTACACAGACTCCTTGGTTCATGATCCTCTGGCACTCAAGCTCTTAACAGATGTCATAGGAAAGGATAAAGTCATTTTGGGAACAGATTACCCCTTTCCACTAGGAGAGCTGAAACCTGGGAAATTGATAGAGTCCATAGAAGAATTTGATGCAGAAACAAAGGATAAACTCAAAGCTGGCAATGCCCTCACATTTTTGGGCCTTGAGAGAAAACAATTCGAATGA

Human_gene-Wolf_gene
ATGAAAATTGACATCCATAGTCATATTCTACCAAAAGAATGGCCAGATCTAAAAAAGAGGTTTGGCTACGGAGGCTGGGTGCAGCTCCAACACCACAGCAAGGGAGAAGCAAAGTTGTTGAAAGATGGGAAAGTCTTCAGAGTGGTGCGAGAGAATTGCTGGGATCCAGAAGTTCGTATTAGAGAAATGGACCAAAAAGGAGTAACAGTGCAAGCCCTTTCCACAGTTCCTGTCATGTTTAGCTACTGGGCCAAACCTGAGGACACTTTAAACCTGTGCCAGCTTTTAAACAACGACCTTGCCAGCACCGTTGTGAGCTACCCCAGGAGGTTCGTGGGTCTGGGGACGTTGCCCATGCAGGCCCCTGAGCTGGCGGTCAAGGAGATGGAGCGCTGTGTGAAAGAGCTGGGCTTTCCCGGGGTCCAAATTGGCACCCACGTCAACGAGTGGGACCTGAACGCGCAGGAGCTCTTTCCTGTCTATGCGGCAGCCGAAAGGCTGAAGTGTTCCCTGTTCGTGCATCCCTGGGACATGCAGATGGATGGACGAATGGCCAAATACTGGCTCCCTTGGCTTGTAGGAATGCCAGCAGAGACCACCATAGCCATTTGCTCCATGATCATGGGTGGAGTATTTGAGAAGTTTCCCAAACTGAAAGTGTGTTTCGCACATGGTGGTGGTGCCTTCCCCTTCACAGTGGGAAGAATCTCCCATGGATTCAGCATGCGCCCAGATCTGTGTGCCCAGGACAACCCCATGAACCCGAAGAAATACCTTGGTTCCTTTTACACAGATGCTTTGGTTCATGATCCTCTGTCCCTCAAGCTGTTAACAGATGTCATAGGAAAGGATAAAGTCATTTTGGGAACCGATTACCCCTTTCCACTAGGTGAGCTGGAACCTGGGAAACTAATAGAGTCCATGGAAGAATTTGATGAAGAAACAAAGAATAAACTCAAAGCCGGCAATGCCCTGGCATTTTTGGGTCTTGAGAGAAAACAATTTGAATGA
ATGAAAATTGACATCCATAGTCATATTCTACCAAAAGAATGGCCAGATCTAAAAAAGCGATTCGGCTATGGAGGCTGGGTGCAGCTTCAACACCACAGCAAGGGAGAAGCAAAAATGTTGAAGGATGGGAAGGTCTTCAGAGTGGTCCAAGAGAACTGCTGGGATCCAGAAGTCCGTATTAGAGAAATGGACCAAACAGNNNNNnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnGCCAAACCTCAGGACACTTTAGACCTGTGCCAGCTTTTAAACAACGACTTANCTGCCACTGTTGCGAACCATCCCAGGAGGTTTGTGGGCCTGGGGACATTGCCCATGCAGGCTCCTGAGCTTGCCGTCAAGGAGATGGAGCGCTGTGTGAAGGAGCTGGGCTTTCCCGGGGTCCAGATTGGTTCCCATATCAACGAGTGGGACCTGAATGCACGGGAACTCTTCCCCTTCTACGCAGTGGCAGAAAAACTGAACTGTTCGTTATTTGTGCACCCCTGGGACATGCAAATGGATGGACGGATGGCCAAATACTGGCTCCCTTGGCTTGTAGGAATGCCAGCAGAGACCACCACAGCCATTTGTTCCATGATCATGGGAGGAGTGTTTGAGAAATTTCCTAAATTGAAAGTGTGTTTTGCACATGGAGGTGGTGCCTTCCCTTTCACAGTTGGAAGAATCTCCCATGGATTCAACATGCGTCCAGATCTGTGTGCCCAGGACAATCCAATCAACCCAAAGAAATACCTTGGTTCCTTTTACACAGACTCCTTGGTTCATGATCCTCTGGCACTCAAGCTCTTAACAGATGTCATAGGAAAGGATAAAGTCATTTTGGGAACAGATTACCCCTTTCCACTAGGAGAGCTGAAACCTGGGAAATTGATAGAGTCCATAGAAGAATTTGATGCAGAAACAAAGGATAAACTCAAAGCTGGCAATGCCCNNNNNNNNNNNNNNnnnnnnnnnnnnnnnnnnnnnnnn

Command:

./KaKs_Calculator -i Selection_test.axt -o Selection_output.txt
Reply written 2.2 years ago by adeena_hassan; modified 2.2 years ago by h.mon

I am not sure this example makes sense; the sequences do not even seem to be aligned. The stretches of Ns might also cause issues.

Reply written 2.2 years ago by lieven.sterck

Were you able to make it work in the end? I removed the piece of code that ashatan.314 noted as the problem, but I still got the same error. Also, other sequences whose lengths were not divisible by 3 didn't give me any error.

Reply written 23 months ago by katerinapargana

@katerinapargana Ka/Ks calculator only works on pairs of sequences, and it worked well with the input given above. Before the calculation, gaps and stop codons in the compared sequences are removed.
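To illustrate what removing gaps and stop codons from a pair can look like, here is a rough Python sketch. It is not KaKs_Calculator's actual implementation; it assumes a codon-aligned pair, the standard genetic code (stop codons TAA, TAG and TGA), and a made-up function name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def strip_gap_and_stop_codons(seq1, seq2):
    """Drop every codon column in which either sequence has a gap or a stop codon."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must have equal length, a multiple of 3")
    kept1, kept2 = [], []
    for i in range(0, len(seq1), 3):
        codon1 = seq1[i:i + 3].upper()
        codon2 = seq2[i:i + 3].upper()
        if "-" in codon1 or "-" in codon2:
            continue  # gapped column
        if codon1 in STOP_CODONS or codon2 in STOP_CODONS:
            continue  # stop codon in either sequence
        kept1.append(codon1)
        kept2.append(codon2)
    return "".join(kept1), "".join(kept2)


# Toy pair: the second column has a gap, the last column is a stop codon.
print(strip_gap_and_stop_codons("ATG---AAATGA", "ATGCCCAAGTGA"))
```

Note that this only works cleanly when gaps fall on codon boundaries, which is why the codon-aware alignment discussed elsewhere in the thread matters.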

Reply written 22 months ago by adeena_hassan

Answer written 19 months ago by Philipp Bayer (Australia/Perth/UWA):

There are two possible reasons for this problem:

  • your sequence alignment is based on nucleotides rather than proteins, so codons get pulled apart by gaps and the sequence length is no longer divisible by 3

  • your gap lengths are not divisible by 3.
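Both cases can be spotted by scanning an aligned sequence for gap runs that are not whole, in-frame codons. A small Python sketch (the function name is made up for illustration; it assumes the sequence starts in frame at position 0):

```python
import re


def out_of_frame_gaps(aligned_seq):
    """Return (start, length) for every run of '-' that does not start on a
    codon boundary or whose length is not a multiple of 3."""
    return [
        (m.start(), len(m.group()))
        for m in re.finditer(r"-+", aligned_seq)
        if m.start() % 3 != 0 or len(m.group()) % 3 != 0
    ]


# The first characters of the problematic input posted in the question:
print(out_of_frame_gaps("ATGA-----TTGC"))
# A codon-aligned toy example for comparison:
print(out_of_frame_gaps("ATG---AAA"))
```

An empty list means that all gaps in that sequence are codon-sized and in frame; any entry points at a place where a nucleotide-level alignment has split a codon.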

After fighting with this error, this is the pipeline I settled on:

  1. Align proteins using MUSCLE or T-COFFEE
  2. Convert into nucleotide alignments using pal2nal, with -nogap argument to remove gaps with lengths not divisible by 3 and non-overlapping regions
  3. Convert to AXT using KaKs-Calculator's AXTConverter
  4. Run KaKs_Calculator

Thanks to ashatan.314 for the answer; the error message really hides the much more common problem of length % 3 != 0.
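The idea behind step 2 of the pipeline above (turning a protein alignment back into a codon alignment) can be sketched in a few lines of Python. This is only a conceptual illustration, not pal2nal itself; it assumes the coding sequence contains exactly three nucleotides per aligned residue (no trailing stop codon, no frameshifts), and the function name is made up.

```python
def backtranslate(aligned_protein, cds):
    """Thread an unaligned coding sequence onto its aligned protein sequence.

    Every residue consumes the next codon; every protein gap becomes '---'.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) != 3 * n_residues:
        raise ValueError(
            f"CDS has {len(cds)} nt but the protein has {n_residues} residues"
        )
    codons = iter([cds[i:i + 3] for i in range(0, len(cds), 3)])
    return "".join("---" if aa == "-" else next(codons) for aa in aligned_protein)


# Toy example: one gap in the protein alignment becomes one codon-sized gap.
codon_alignment = backtranslate("M-K", "ATGAAA")
print(codon_alignment, len(codon_alignment) % 3)
```

Because gaps are inserted three nucleotides at a time, both sequences of a pair end up with the same length, and that length is divisible by 3.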

Written 19 months ago by Philipp Bayer

Hi Philipp. Thanks for your comments, they are really helpful. I also got stuck on this. I do not have a protein alignment; my input file is an orthologous gene family, so pal2nal is not suitable for me. I tried to use a Perl script:

# Convert a pairwise alignment into AXT-style records for KaKs_Calculator.
use strict;
use warnings;
use Bio::AlignIO;
use Bio::SimpleAlign;

die "perl $0 <fa> <OUT>\n" unless @ARGV == 2;
my ($in_file, $out_file) = @ARGV;

# -format must match the input file; use 'fasta' for an aligned FASTA file.
my $in = Bio::AlignIO->new(
    -file   => $in_file,
    -format => 'clustalw',
);
open my $out, '>', $out_file or die "Cannot write $out_file: $!";
while ( my $aln = $in->next_aln() ) {
    # Take the first two sequences of each alignment (positions are 1-based).
    my $seq1 = $aln->get_seq_by_pos(1);
    my $seq2 = $aln->get_seq_by_pos(2);
    # One record: name line, the two aligned sequences, then a blank line.
    print {$out} $seq1->id, '&', $seq2->id, "\n", $seq1->seq, "\n", $seq2->seq, "\n\n";
}
$in->close();
close $out or die "Cannot close $out_file: $!";

However, the sequence lengths are still not equal. Could you please help me check what the problem is? Thanks.

Reply written 4 months ago by wangyu.ashley; modified 4 months ago by Philipp Bayer

I edited your comment to make the code look nicer. I don't know Perl well, but isn't your input already a ClustalW alignment? I don't understand what your input file is; there are many ways to store an orthologous gene family.

Reply written 4 months ago by Philipp Bayer

Thanks Philipp. My input is an orthologous gene family, already aligned, in a FASTA file.

Reply written 4 months ago by wangyu.ashley

Actually, I'm facing the same problem with KaKs_Calculator. Apart from the AXT converter being buggy and removing gaps from the alignment, I think this piece of code in KaKs.cpp:

try {
    //Check whether (sequence length)/3==0
    if (str1.length()!=str1.length() || str1.length()%3!=0 || str2.length()%3!=0) {
        cout<<endl<<"Error. The size of two sequences in "<<"'"<<name<<"' is not equal."<<endl;
        throw 1;
    }

is also not doing a good job. I'm quite sure that the sequences I'm providing to KaKs_Calculator are properly aligned (gaps are divisible by three and all the sequences have the same length), and yet the program aborts with an error message saying my sequences are not the same length. I have no idea what is causing this behavior, and I don't know C++ well enough to play around with the code to understand the reason (maybe a hidden character or white space introduced during parsing in the C++ code).

The funny thing is that I'm submitting around 6k sequences to KaKs_Calculator and 70% of the runs worked, while the other 30% give me errors due to the "size problem", even though they were all processed with the same programs and pipeline. At this point, having had no success contacting the authors, I will just comment out the section in KaKs.cpp that checks the length of the sequences (because I'm totally sure they are correct) and run the software without it.

What amazes me the most is the lack of documentation for this tool, despite it being widely used and cited. Even in genome papers, I could not find command lines or proper descriptions of how people submitted and processed the sequences with KaKs_Calculator.

Anyway, this is my experience. Frustrating, if I may say so. Best, André
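The hidden-character idea mentioned above is easy to test before patching the C++ source. The Python sketch below lists every character in a sequence line that is not a nucleotide, N, or gap symbol. Whether this is the real cause of the failing runs is only a hypothesis and is not confirmed anywhere in this thread; the allowed alphabet and the function name are assumptions.

```python
ALLOWED = set("ACGTNacgtn-")  # assumption: plain nucleotides, N and gaps only


def hidden_characters(line):
    """Return (index, repr(char)) for every character outside ALLOWED."""
    return [(i, repr(ch)) for i, ch in enumerate(line) if ch not in ALLOWED]


# A Windows line ending leaves a carriage return behind when only "\n" is
# stripped, which silently adds one character to the sequence length.
line = "ATGAAA\r\n".rstrip("\n")
print(len(line), len(line) % 3, hidden_characters(line))
```

If the failing 30% of files show stray characters like this and the working 70% do not, cleaning the input is a safer fix than commenting out the length check.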

Reply written 4 months ago by alufop

Thanks for sharing, André. I thought the size problem occurs because the sequence length is not a multiple of three (there may have been an insertion event when the sequences were aligned), so KaKs_Calculator cannot work with that pair of sequences. It's annoying. Good luck! Best, Yu

Reply written 11 weeks ago by wangyu.ashley

Answer written 2.2 years ago by ashatan.314:

I think I found the answer.

There is a check in the source code file KaKs.cpp:

try {
        //Check whether (sequence length)/3==0
        if (str1.length()!=str1.length() || str1.length()%3!=0 || str2.length()%3!=0) {
            cout<<endl<<"Error. The size of two sequences in "<<"'"<<name<<"' is not equal."<<endl;
            throw 1;
        }

So if the length of your sequence is not divisible by 3, the program throws an error and aborts. Also, as was mentioned above, the output would not make sense if you have just aligned two sequences blindly, with no correspondence to the protein sequence.
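One detail in the quoted condition is worth spelling out: its first test compares str1.length() with itself, so as written it can never be true, and only the two % 3 tests can trigger the "not equal" message. The Python sketch below mirrors the quoted condition next to what was presumably intended (comparing str1 with str2). The intended version is a guess, and both function names are made up for illustration.

```python
def check_as_quoted(str1, str2):
    """Mirror of the quoted C++ condition; True means the error is raised."""
    return len(str1) != len(str1) or len(str1) % 3 != 0 or len(str2) % 3 != 0


def check_presumably_intended(str1, str2):
    """Same condition with the first comparison made against str2 (an assumption)."""
    return len(str1) != len(str2) or len(str1) % 3 != 0 or len(str2) % 3 != 0


# Equal lengths, but not a multiple of 3: the "not equal" error still fires.
print(check_as_quoted("ATGA", "ATGA"))
# Different lengths, both multiples of 3: the quoted check lets the pair through.
print(check_as_quoted("ATGAAA", "ATG"), check_presumably_intended("ATGAAA", "ATG"))
```

This would explain why pairs of identical length can still produce the "size of two sequences is not equal" message.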

Written 2.2 years ago by ashatan.314; modified 2.2 years ago by WouterDeCoster

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.

Reply written 2.2 years ago by WouterDeCoster