Samtools extraction output is empty
1
0
Entering edit mode
5.5 years ago
spidcock01 • 0

Sorry if similar questions have been answered already - I had a look and can't see anything that suits my problem exactly.

I'm trying to use Samtools to extract Fasta sequences from a database file using a list. (Sorry if my terminology is dreadful, I'm new to computational biology).

So I have my database file as a multi-FASTA file, and my list purely as a list of gene headers in a text file, e.g.

ND3005_accessory_2122_length_3432

etc. These do exactly match the headers in the database, with the exception of not having a > at the start of each sequence, but I've tried with these as well.

The only output that I get is a file with the headers in, e.g.

>ND3005_accessory_2122_length_3432

and an example of the terminal output is the following:

not found in FASTA file, returning empty sequence0_length_2460

Here are the commands I've been using:

$ samtools faidx Gene_db.fasta
$ xargs samtools faidx Gene_db.fasta < GH2_list.txt > GH2_output.fasta

Any thoughts would be appreciated. Thanks!

samtools gene extraction fasta alignment • 2.2k views
ADD COMMENT
0
Entering edit mode

Hello spidcock01 ,

I'm not familiar with xargs. I would use parallel like this:

$ cat GH2_list.txt|parallel -k 'samtools faidx Gene_db.fasta {}' > GH2_output.fasta

And please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.
code_formatting

Thank you!

fin swimmer

ADD REPLY
0
Entering edit mode

Unfortunately this still produces a file with just the headers. This means that the problem is probably with my list file, right?

Thank you for advice regarding formatting!

ADD REPLY
2
Entering edit mode

If you get just the headers, I would suspect that something with your fasta file is wrong.

What's the output of:

$ grep -A 10 "ND3005_accessory_2122_length_3432" Gene_db.fasta

Also you should check if the line ending terminators are correct:

$ file Gene_db.fasta
Gene_db.fasta: ASCII text

If the output is Gene_db.fasta: ASCII text, with CRLF line terminators you should install dos2unix and run dos2unix Gene_db.fasta.

fin swimmer

ADD REPLY
0
Entering edit mode

Did you try my solution bellow?

ADD REPLY
0
Entering edit mode

Thanks for your help! You were right in that it was the line ending terminators.

Thank you again!

ADD REPLY
0
Entering edit mode
5.5 years ago
h.mon 35k

samtools faidx doesn't take arguments from stdin, so you have to use a variable:

samtools faidx Gene_db.fasta $(cat GH2_list.txt) > GH2_output.fasta
ADD COMMENT

Login before adding your answer.

Traffic: 1336 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6