Question

How to download multiple FASTA files from NCBI on the Linux command line?

3.2 years ago

arriyaz.nstu ▴ 30

Hi, I have a text file that contains a list of accession numbers for multiple nucleotide sequences like below:

NM_001354644.1
NM_001354643.1
NM_007288.3
NM_001300741.2

I want to use this text file as input and download all of the FASTA sequences at once from the Linux command line. The downloaded sequences need to be saved as separate files (not in a single multi-FASTA file).

How can I accomplish this? Thanks in advance.

fasta linux command line NCBI • 4.9k views

updated 3.2 years ago by Mensur Dlakic ★ 27k • written 3.2 years ago by arriyaz.nstu ▴ 30

score 4 · Answer 1 · 2021-02-22

Using Entrez Direct (EDirect):

$ more id
NM_001354644.1
NM_001354643.1
NM_007288.3
NM_001300741.2

Option 1:

$ epost -db nuccore -input id -format acc | efetch -format fasta > seq.fa

NOTE: You can split the multi-FASTA output file (seq.fa) into individual files with the faSplit utility from Jim Kent, following the directions here: C: How to split fasta by '>' into a file each containing one sequence, and have the
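If faSplit is not at hand, a plain awk pass can do a similar split. This is only a sketch and not the faSplit method linked above. The function name split_fasta is made up for illustration, and it assumes each header starts with the accession immediately after the '>' (as in the efetch output here), so every file is named after the first word of its header.

```shell
# split_fasta FILE [OUTDIR]
# Hypothetical helper: write each record of a multi-FASTA file to
# OUTDIR/<first word of header>.fa (OUTDIR defaults to the current directory).
split_fasta() {
    infile=$1
    outdir=${2:-.}
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^>/ {
            # New record: close the previous output file and start a new one
            if (out) close(out)
            name = substr($1, 2)          # header word without the leading ">"
            out = dir "/" name ".fa"
        }
        # Print header and sequence lines to the current record file;
        # anything before the first header is skipped
        out { print > out }
    ' "$infile"
}
```

For example, split_fasta seq.fa split_out would leave one .fa file per record in split_out. If the same accession appears twice in the input, the later record overwrites the earlier file.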

Option 2:

If you don't want to split one large file, you can download each sequence into its own file with the following loop:

$ for i in $(cat id); do efetch -db nuccore -id "${i}" -format fasta > "${i}.fa"; done
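A slightly more defensive version of the per-accession loop above, as a sketch. It skips blank lines, strips Windows line endings, reports accessions that return nothing, and pauses between requests. The function name fetch_each and the FETCH_DELAY variable are made up for illustration, and the one-second default pause is my own choice (NCBI's E-utilities guidelines ask for no more than three requests per second without an API key). The list is read on file descriptor 3 on the assumption that efetch may otherwise read from standard input and swallow the rest of the list.

```shell
# fetch_each LIST [OUTDIR]
# Hypothetical helper: download one FASTA file per accession listed in LIST.
# Requires efetch from Entrez Direct on the PATH.
fetch_each() {
    list=$1
    outdir=${2:-.}
    delay=${FETCH_DELAY:-1}    # seconds to wait between requests (assumed default)
    mkdir -p "$outdir"
    # Read the list on fd 3; "|| [ -n ... ]" also handles a last line with no newline
    while IFS= read -r acc <&3 || [ -n "$acc" ]; do
        acc=$(printf '%s' "$acc" | tr -d '[:space:]')   # drop spaces and \r
        [ -z "$acc" ] && continue                        # skip blank lines
        if efetch -db nuccore -id "$acc" -format fasta > "$outdir/$acc.fa" < /dev/null \
            && [ -s "$outdir/$acc.fa" ]; then
            echo "saved $outdir/$acc.fa" >&2
        else
            # Nothing came back: report it and remove the empty file
            echo "failed: $acc" >&2
            rm -f "$outdir/$acc.fa"
        fi
        sleep "$delay"
    done 3< "$list"
}
```

With the list from the question, fetch_each id fasta_out would be expected to write files such as fasta_out/NM_007288.3.fa, one per accession.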

Just to show the FASTA headers of the records retrieved (note that they come back in a different order from the input list):

$ epost -db nuccore -input id -format acc | efetch -format fasta | grep ">"
>NM_001300741.2 Homo sapiens nudix hydrolase 12 (NUDT12), transcript variant 2, mRNA
>NM_001354644.1 Homo sapiens membrane metalloendopeptidase (MME), transcript variant 5, mRNA
>NM_001354643.1 Homo sapiens membrane metalloendopeptidase (MME), transcript variant 4, mRNA
>NM_007288.3 Homo sapiens membrane metalloendopeptidase (MME), transcript variant 2a, mRNA
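Since the records do not come back in list order, eyeballing the headers gets tedious for longer lists. Here is a rough sketch for checking that every accession in the list has a matching header in the downloaded file. The function name missing_accessions is made up for illustration, and it assumes each header begins with the accession exactly as written in the list, version suffix included.

```shell
# missing_accessions LIST FASTA
# Hypothetical helper: print every accession in LIST that has no header in FASTA.
missing_accessions() {
    list=$1
    fasta=$2
    want=$(mktemp)
    have=$(mktemp)
    # Accessions requested: first field of each non-blank line, \r removed
    tr -d '\r' < "$list" | awk 'NF { print $1 }' | sort -u > "$want"
    # Accessions received: first word of each header, without the ">"
    awk '/^>/ { print substr($1, 2) }' "$fasta" | sort -u > "$have"
    # Lines present only in the requested set
    comm -23 "$want" "$have"
    rm -f "$want" "$have"
}
```

Running missing_accessions id seq.fa prints nothing when all requested sequences are present, and otherwise lists the accessions that still need to be fetched.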

score 0 · Answer 2 · 2021-02-22

3.2 years ago

Mensur Dlakic ★ 27k

Did it ever occur to you to do the same thing as already suggested in one of your previously answered posts? This is essentially the same problem except that you want the files saved individually. It always heartens me to see when posters learn something from previous posts and try to apply it to new problems.

Absent that, this will do the trick (assuming your ID numbers are saved in a file named ids):

cat ids | xargs -I {} sh -c "esearch -db nuccore -query {} | efetch -format fasta > {}.fna"