Question: How to download multiple FASTA files from NCBI on the Linux command line?
arriyaz.nstu wrote (12 days ago):

Hi, I have a text file that contains a list of accession numbers for multiple nucleotide sequences like below:

NM_001354644.1
NM_001354643.1
NM_007288.3
NM_001300741.2

I want to use this text file as input and download all of the FASTA files at once from the Linux command line. Each sequence needs to be saved as a separate file (not combined into a single multi-FASTA file).

How can I accomplish this? Thanks in advance.

linux ncbi command line fasta • 134 views
GenoMax wrote (12 days ago):

Using Entrez Direct (EDirect):

$ more id
NM_001354644.1
NM_001354643.1
NM_007288.3
NM_001300741.2

Option 1:

$ epost -db nuccore -input id -format acc | efetch -format fasta > seq.fa

NOTE: You can split the multi-FASTA output file (seq.fa) into individual files using the faSplit utility from Jim Kent, following the directions here: C: How to split fasta by '>' into a file each containing one sequence, and have the
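
If faSplit is not available, a plain awk function can do a similar job. This is not the faSplit method, only a minimal sketch that assumes every header line starts with '>' followed directly by the accession and a space, as in the efetch output shown in this thread. The name split_fasta is hypothetical; each record is written to <accession>.fa in the current directory.

```shell
# Split a multi-FASTA file into one file per record, named after the
# first word of each header (">NM_007288.3 Homo ..." -> NM_007288.3.fa).
# split_fasta is a hypothetical helper name, not part of EDirect or faSplit.
split_fasta() {
    awk '
        /^>/ {
            if (out != "") close(out)   # close the previous record file
            out = substr($1, 2) ".fa"   # drop the leading ">" from the ID
        }
        out != "" { print > out }       # ignore any text before the first header
    ' "$1"
}
```

Usage would be `split_fasta seq.fa`. Closing each file before opening the next keeps the number of open files at one, which matters when the list of accessions is long.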

Option 2:

If you don't want to split the large file, you can download the sequences as individual files using the following method:

$ for i in `cat id`; do efetch -db nuccore -id ${i} -format fasta > ${i}.fa ; done
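
A slightly more defensive variant of this loop, as a sketch: it reads the list line by line, skips blank lines and files that already exist, and pauses between requests. The function name fetch_each, the optional delay argument, and the skip-if-present behavior are assumptions added for illustration; the efetch call is the same one used above. Redirecting efetch's standard input from /dev/null is a precaution, on the assumption that EDirect commands may read piped stdin and would otherwise consume the rest of the list.

```shell
# fetch_each IDS_FILE [DELAY_SECONDS]
# Hypothetical helper: saves each accession in IDS_FILE as <accession>.fa.
fetch_each() {
    ids_file=$1
    delay=${2:-1}                        # assumed default pause between requests
    while IFS= read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue        # ignore blank lines
        [ -s "${acc}.fa" ] && continue   # non-empty file exists, do not refetch
        # stdin comes from /dev/null so efetch cannot swallow the ID list
        efetch -db nuccore -id "$acc" -format fasta < /dev/null > "${acc}.fa"
        sleep "$delay"
    done < "$ids_file"
}
```

Calling `fetch_each id` would then produce the same files as the loop above. Because only non-empty files are skipped, rerunning the function retries any accession whose earlier download left an empty file.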

Just to show the fasta headers of files recovered:

$ epost -db nuccore -input id -format acc | efetch -format fasta | grep ">"
>NM_001300741.2 Homo sapiens nudix hydrolase 12 (NUDT12), transcript variant 2, mRNA
>NM_001354644.1 Homo sapiens membrane metalloendopeptidase (MME), transcript variant 5, mRNA
>NM_001354643.1 Homo sapiens membrane metalloendopeptidase (MME), transcript variant 4, mRNA
>NM_007288.3 Homo sapiens membrane metalloendopeptidase (MME), transcript variant 2a, mRNA
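
To confirm that every accession ended up in its own file, a small check along these lines may help. It is only a sketch: it assumes the files were written as <accession>.fa (as in Option 2) and that each header begins with '>' plus the accession. The name check_fasta_files is hypothetical.

```shell
# check_fasta_files IDS_FILE
# Prints one line per accession whose file is missing, empty, or whose
# first line is not ">accession ..."; returns non-zero if any check fails.
check_fasta_files() {
    rc=0
    while IFS= read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue                    # ignore blank lines
        header=$(head -n 1 "${acc}.fa" 2>/dev/null)  # empty if file is missing
        case $header in
            ">$acc "*|">$acc") ;;                    # header matches the accession
            *) echo "problem: $acc"; rc=1 ;;
        esac
    done < "$1"
    return "$rc"
}
```

Running `check_fasta_files id` prints nothing when all files look right, so it can be used directly in a conditional before further processing.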

arriyaz.nstu replied (11 days ago):

Thank you for the detailed explanation.
Mensur Dlakic wrote (12 days ago):

Did it ever occur to you to do the same thing as already suggested in one of your previously answered posts? This is essentially the same problem, except that you want the files saved individually. It always heartens me to see posters learn something from previous posts and try to apply it to new problems.

Absent that, this will do the trick (assuming your accession numbers are saved in a file named ids):

cat ids | xargs -I {} sh -c "esearch -db nuccore -query {} | efetch -format fasta > {}.fna"

arriyaz.nstu replied (11 days ago):

Thank you very much. It was really helpful.