Question

Warning: MSG: Replacing one seq (BioPerl)

9.8 years ago

ddusan1 ▴ 50

Hey

I am running a script that calculates Ka/Ks (dN/dS) ratios from an aligned FASTA file. It keeps returning the warning shown further down.

I got the script from the link below and edited it to suit my needs. Thanks very much for the script! https://github.com/MadsAlbertsen/miscperlscripts/blob/master/calc.dnds.pl

I understand that this warning generally means the headers are not unique, but in this instance they are. I suspect the script is not reading the entire header the way I want it to.

--------------------- WARNING ---------------------
MSG: Replacing one sequence [comp17867_c0_seq1|m.61203/1-641

Here's the code: http://freetexthost.com/lnhlri3kbk

If that's a shady site, I apologize; I'm still learning.

ka.ks perl bioperl • 2.2k views

Written 9.8 years ago by ddusan1 ▴ 50 • updated 22 months ago by Ram 43k

To host a code example, you would be better off using a gist (https://gist.github.com/) or pastebin (http://pastebin.com/).

They have syntax highlighting and other features that are handy for code.

Written 9.0 years ago by Michael 54k • updated 22 months ago by Ram 43k

Ram · Answer 1 · 2014-06-23

9.8 years ago

Neilfws 49k

Tip: when you see a warning or error, search Google for the exact message. Someone else has almost always seen and discussed the issue. Examples: here and here.

In this case, the problem is that the sequence identifier is not unique.

Written 9.8 years ago by Neilfws 49k • updated 2.5 years ago by Ram 43k

I've read those and done that.

The sequence identifiers are all unique; that's the problem. They are all long strands and every single one is unique in some aspect. My theory is that the script is only taking the beginning of each identifier, which is why I asked people to look specifically at that script.

Written 9.8 years ago by ddusan1 ▴ 50 • updated 2.5 years ago by Ram 43k

The problem is the data, not the script. If you read those links, then you should understand that the warning means the identifiers are not unique, since one is replacing another. You may think they're unique, but that's not the same thing :)

"Identifier" has a very specific meaning in a FASTA file. It is the string immediately following the ">" in the header line. If there is a space in the header, then the part between ">" and the space is the identifier and the rest is the "description". Examples:

>myseq1 desc1
>myseq1 desc2

Same identifier (myseq1), different descriptions, unique header lines.

So: identifiers are not "long strands", and while the entire header line (identifier + description) may be "unique in some aspect", the identifier alone may not be.
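If you want to check your own file, here is a minimal sketch in plain Python (not BioPerl, and not part of the linked script) that splits each header line into identifier and description the way described above and lists any identifier that occurs more than once. The function names `split_header` and `duplicate_identifiers` are made up for illustration, and the sketch assumes the identifier ends at the first whitespace.

```python
from collections import Counter


def split_header(header_line):
    """Split a FASTA header line into (identifier, description).

    Assumption: the identifier is everything between ">" and the
    first whitespace; the remainder is the description.
    """
    text = header_line.lstrip(">").strip()
    parts = text.split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


def duplicate_identifiers(lines):
    """Return the identifiers that appear on more than one header line."""
    counts = Counter(
        split_header(line)[0] for line in lines if line.startswith(">")
    )
    return sorted(ident for ident, n in counts.items() if n > 1)


# The two example headers from above, plus one that does not clash.
headers = [">myseq1 desc1", ">myseq1 desc2", ">myseq2 desc3"]
print(duplicate_identifiers(headers))  # prints ['myseq1']
```

To run it on a real file, pass the open file handle instead of the list, for example `duplicate_identifiers(open("alignment.fasta"))`, where the file name is only a placeholder.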

BioPerl sequence objects usually use the method display_id to get the identifier. I see that in the script; I don't see anything that would take only the beginning.
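To see why a repeated identifier leads to a "replacing" message, here is a toy illustration in plain Python. It is only an analogy and assumes a container keyed by identifier; it does not reproduce BioPerl's actual internals. The function `add_sequence` and the message it prints are hypothetical and only imitate the wording of the warning.

```python
def add_sequence(store, identifier, sequence):
    """Add a sequence to a dict keyed by identifier.

    Returns True if an existing entry with the same identifier was
    overwritten. This mimics, but is not, BioPerl's behaviour.
    """
    replaced = identifier in store
    if replaced:
        print(f"Replacing one sequence [{identifier}]")
    store[identifier] = sequence
    return replaced


store = {}
add_sequence(store, "myseq1", "ATGAAA")  # first copy is stored
add_sequence(store, "myseq1", "ATGCCC")  # same identifier: overwrites it
print(len(store))  # prints 1
```

Two records went in but only one survives, which is the kind of situation the warning is telling you about.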

Written 9.8 years ago by Neilfws 49k • updated 2.5 years ago by Ram 43k

Brilliant! Thank you!

Written 9.8 years ago by ddusan1 ▴ 50