Question: Error extracting faa sequences from multifasta using faidx
dago (Germany), 3.3 years ago, wrote:

I have a multifasta faa file from which I want to extract some sequences using their IDs.

Multifasta example:

>NITMOv2_RS22300
MAKTVAVVIREDPRRTHRPVEALRIALGLVAGNHATTVVLLNEAARLLSEDTDDVVDVEI
LEKYLPSIQQLEVPFVLPEFIDRSGVRTDFAVRYESDDTIRRLLQSMDRTLVF
>NITMOv2_RS22305
MSLSSSVYLIRKSAAALSPTLYVSGDSDWVVVEIGEDKRSSDYRELLELVLHAEKVITL

 

Ids example:

NITINOP_v2_3300
NITINOP_v2_3307

I usually do this using the following command:

xargs faidx -d "" MULTIFASTA < Ids

It has always worked fine, but with some new files it started giving me the following error, which I cannot understand:

Traceback (most recent call last):
  File "/usr/local/bin/faidx", line 9, in <module>
    load_entry_point('pyfaidx==0.3.4', 'console_scripts', 'faidx')()
  File "/usr/local/lib/python2.7/dist-packages/pyfaidx/cli.py", line 132, in main
    write_sequence(args)
  File "/usr/local/lib/python2.7/dist-packages/pyfaidx/cli.py", line 33, in write_sequence
    fasta = Fasta(args.fasta, default_seq=args.default_seq, strict_bounds=not args.lazy, split_char=args.delimiter)
  File "/usr/local/lib/python2.7/dist-packages/pyfaidx/__init__.py", line 527, in __init__
    read_ahead=read_ahead, mutable=mutable, split_char=split_char)
  File "/usr/local/lib/python2.7/dist-packages/pyfaidx/__init__.py", line 218, in __init__
    raise FastaIndexingError(e)

 

Any suggestions? What am I missing here?

 

 

 


Alternatively, you can use faSomeRecords:

./faSomeRecords input.faa ids.txt output.faa

Here is how: "A: perl code to extract sequences from multi-line fasta works on all test files but"

 

 

written 3.3 years ago by venu
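For readers without either tool at hand, the same kind of extraction can be sketched in plain Python. This is only an illustration, not how faSomeRecords or faidx work internally: the function names are made up for this sketch, and it assumes that a record's ID is the first whitespace-delimited token after `>` on the header line.

```python
import io


def record_id(header_line):
    """Return the first whitespace-delimited token after '>' (assumed ID rule)."""
    parts = header_line[1:].split()
    return parts[0] if parts else ""


def extract_records(fasta_lines, wanted_ids):
    """Return the FASTA text of every record whose ID is in wanted_ids."""
    wanted = set(wanted_ids)
    out = []
    keep = False
    for raw in fasta_lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # Decide once per record whether its lines should be copied.
            keep = record_id(line) in wanted
        if keep and line.strip():
            out.append(line)
    return "\n".join(out) + ("\n" if out else "")


if __name__ == "__main__":
    # Small in-memory demo using the IDs from the question's example.
    demo = io.StringIO(
        ">NITMOv2_RS22300\nMAKTVAVVIRED\n>NITMOv2_RS22305\nMSLSSSVYLIRK\n"
    )
    print(extract_records(demo, ["NITMOv2_RS22305"]), end="")
```

With real files, the two arguments would be an open FASTA file handle and a list of stripped lines read from the ID file.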

Thanks very much. This is another good option I guess.

However, whenever I try to run it I get the following error:

Unrecognized character \x7F; marked by <-- HERE after <-- HERE near column 1 at faSomeRecords line 1.

I tried to look into the code, but I cannot even open it. Any suggestions?

EDIT

I had not made it executable... sorry! Thanks, it worked!

written 3.3 years ago by dago
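The `\x7F` at column 1 in the message above is consistent with faSomeRecords being a compiled program rather than a script: ELF executables on Linux begin with the byte 0x7F followed by the letters `ELF`, so an interpreter that is handed such a file stumbles over the very first character. A minimal sketch for checking this, with function names invented for the illustration:

```python
def classify_head(head):
    """Guess a file's kind from its first few bytes."""
    if head[:4] == b"\x7fELF":
        return "compiled ELF binary"
    if head[:2] == b"#!":
        return "script with an interpreter line"
    return "unknown"


def classify_file(path):
    """Read the first four bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return classify_head(handle.read(4))
```

Calling `classify_file` on the downloaded file shows whether it should be made executable and run directly, rather than passed to an interpreter.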
geek_y (Barcelona/CRG/London/Imperial), 3.3 years ago, wrote:

Try with the proper IDs.

NITINOP_v2_3300
NITINOP_v2_3307

Neither is present in your example FASTA, and it works fine if I use NITMOv2_RS22300. So the problem might be a mismatch between the IDs in the FASTA file and those in Ids.txt.
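A quick way to test this on the real files is to list every requested ID that has no matching header. The sketch below is one possible check, not part of faidx. Its function names are hypothetical, and it assumes the record ID is the first whitespace-delimited token after `>`, which may differ from what a given tool uses as the key.

```python
def fasta_ids(fasta_lines):
    """Collect the ID (first token after '>') of every header line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def missing_ids(fasta_lines, id_lines):
    """Return the requested IDs, in order, that have no FASTA header."""
    present = fasta_ids(fasta_lines)
    # strip() also removes stray spaces or carriage returns in the ID list.
    wanted = [line.strip() for line in id_lines if line.strip()]
    return [wanted_id for wanted_id in wanted if wanted_id not in present]
```

Both arguments can be open file handles, for example the multifasta and the Ids file. An empty result means every requested ID was found under the stated assumption.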


Thanks. I just reported an example. I checked, and my IDs are present in the multifasta I am using.

written 3.3 years ago by dago
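If the IDs do match, another thing worth ruling out is the layout of the new FASTA files themselves. The traceback in the question does not show the underlying message, so this is only a guess, but faidx-style indexing generally relies on every sequence line within a record having the same length, apart from the last one. The sketch below applies that rule as an assumption. The function names are invented, and a real indexer may be stricter or more lenient than this check.

```python
def split_records(lines):
    """Yield (header, [(line_number, width), ...]) for each FASTA record."""
    name, widths = None, []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            if name is not None:
                yield name, widths
            name, widths = line[1:], []
        elif name is not None:
            # Whitespace-only lines are counted as blank (width 0).
            widths.append((lineno, len(line) if line.strip() else 0))
    if name is not None:
        yield name, widths


def layout_problems(lines):
    """Return (header, line_number, width, expected_width) for suspicious lines."""
    problems = []
    for name, widths in split_records(lines):
        # Blank lines at the very end of a record are ignored.
        while widths and widths[-1][1] == 0:
            widths.pop()
        if len(widths) < 2:
            continue
        expected = widths[0][1]
        # Interior lines (including blank ones) must match the first line.
        for lineno, width in widths[1:-1]:
            if width != expected:
                problems.append((name, lineno, width, expected))
        # The last line may be shorter than the others, but not longer.
        lineno, width = widths[-1]
        if width > expected:
            problems.append((name, lineno, width, expected))
    return problems
```

Running `layout_problems` on an open file handle of one of the new files returns an empty list when nothing looks off under this rule. Any reported line numbers are a reasonable place to start looking.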