Efetch: extracting many FASTA sequences by position
3.9 years ago

Hello, I am new to bioinformatics and I need a short command line to help me extract FASTA sequences. I installed EDirect on Ubuntu and have read a lot about the efetch command.

I tried running this line:

efetch -db nuccore -format fasta -id NC_035437.1 -chr_start 214621161 -chr_stop 214618066

It works, but I need to run around 200 of these lines. How can I do that?

My input table looks something like this:

NC_035437.1 214621161   214618066
NC_035437.1 209015121   209019563
NC_035437.1 208791830   208794856
NC_035437.1 194797143   194795212
NC_035437.1 187148585   187150444
NC_035437.1 167068722   167071843
NC_035433.1 131712739   131714461

Thanks ;)

3.9 years ago
while read -r A B E; do efetch -db nuccore -format fasta -id "${A}" -chr_start "${B}" -chr_stop "${E}" < /dev/null; done < input.txt
3.9 years ago
GenoMax 141k

Try this (it assumes your input file of coordinates is whitespace-separated):

$ awk -F ' ' '{print $1,$2,$3}' input_coord_file | xargs -n 3 sh -c 'efetch -db nuccore -format fasta -id $0 -chr_start $1 -chr_stop $2'
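
With around 200 regions it can help to check the commands before sending them to NCBI. Here is a minimal sketch of the same per-line loop wrapped in a shell function with a dry-run option. The names fetch_regions and RUNNER are made up for illustration and are not part of EDirect; the efetch flags are the ones from the question. Redirecting stdin from /dev/null is a precaution so the command inside the loop cannot consume the remaining lines of the coordinate file.

```shell
# fetch_regions COORD_FILE [RUNNER]
# Reads "accession start stop" lines (spaces or tabs) from COORD_FILE and
# runs one command per line. RUNNER defaults to efetch; passing "echo"
# gives a dry run that only prints the arguments.
# (fetch_regions and RUNNER are hypothetical names, not EDirect features.)
fetch_regions() {
    coord_file=$1
    runner=${2:-efetch}
    # The "|| [ -n ... ]" part also handles a last line with no trailing newline.
    while read -r acc start stop rest || [ -n "$acc" ]; do
        # Skip blank lines and lines that are missing a coordinate.
        [ -n "$acc" ] && [ -n "$start" ] && [ -n "$stop" ] || continue
        # </dev/null keeps the command from reading the loop's input.
        "$runner" -db nuccore -format fasta -id "$acc" \
            -chr_start "$start" -chr_stop "$stop" </dev/null
    done <"$coord_file"
}
```

After pasting the function into a shell, "fetch_regions input.txt echo" prints the arguments for each line without contacting NCBI, and "fetch_regions input.txt > regions.fasta" would run efetch for every line and collect the output in one file (regions.fasta is just an example name).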

Thank you so much!!! It works ;)


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work.
