Hello, I have a list of ~1000 human gene names (currently in CSV format) and I would like to download their protein amino acid sequences in FASTA format, in one single file. I would like all protein isoforms of each gene to be included in that FASTA file.
I know that NCBI has the sequence data, but I'm not sure how to download it all in one go.
It seems like UniProt could do the job, but it requires protein IDs rather than gene names... and I'm not sure how to retrieve protein IDs in bulk by gene name.
Currently trying: download the full set of protein sequences from
ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz, then keep only the entries whose header has
OS=Homo sapiens and
GN= matching a gene name in my spreadsheet... I wonder if there's a faster way?
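
For reference, the filtering step I have in mind would look roughly like the sketch below. It is only a sketch under a few assumptions: that the gene names sit in the first column of the CSV, that each FASTA header carries the organism as `OS=Homo sapiens OX=...` and the gene as `GN=...`, and that gene names can be compared case-insensitively. The function names are my own, not part of any library.

```python
import csv
import gzip
import re

# Matches the gene-name field in a UniProt-style FASTA header, e.g. "GN=TP53".
GN_PATTERN = re.compile(r"\bGN=(\S+)")


def read_gene_names(csv_path, column=0):
    """Read gene names from one CSV column (first column is an assumption)."""
    with open(csv_path, newline="") as handle:
        return {
            row[column].strip().upper()
            for row in csv.reader(handle)
            if len(row) > column and row[column].strip()
        }


def filter_fasta(lines, genes, organism="Homo sapiens"):
    """Yield the lines of every record whose header matches organism and gene."""
    wanted = {gene.upper() for gene in genes}
    # Including " OX=" avoids matching e.g. "Homo sapiens neanderthalensis".
    organism_tag = f"OS={organism} OX="
    keep = False
    for line in lines:
        if line.startswith(">"):
            match = GN_PATTERN.search(line)
            keep = (
                organism_tag in line
                and match is not None
                and match.group(1).upper() in wanted
            )
        if keep:
            yield line


def filter_fasta_file(fasta_gz_path, csv_path, out_path):
    """Stream a gzipped FASTA file and write the matching records to out_path."""
    genes = read_gene_names(csv_path)
    with gzip.open(fasta_gz_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(filter_fasta(src, genes))
```

I would call it as `filter_fasta_file("uniprot_sprot.fasta.gz", "genes.csv", "human_subset.fasta")`, where `genes.csv` and `human_subset.fasta` are placeholder names. As far as I can tell, `uniprot_sprot.fasta.gz` holds only the canonical sequence per entry, and the additional isoforms are in `uniprot_sprot_varsplic.fasta.gz` in the same directory, so I would probably need to run the same filter over that file too.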