How to retrieve protein sequence from gene ID and output a fasta file
20 days ago
minifoog • 0

I want to retrieve the protein sequences for the gene IDs returned by the search below and write them to a FASTA file, each sequence with its identifier.

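from Bio import Entrez

Entrez.email = "you@example.com"  # placeholder: NCBI asks for a contact address, use your own

handle = Entrez.esearch(db="gene",
                        term="primate[Orgn] AND TNF[Gene Name]",
                        idtype="acc",
                        retmax='50',
                        )
record = Entrez.read(handle)
handle.close()
idlist = record['IdList']  # Gene UIDs matching the query
print(idlist)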

But I am not sure where to go from here. Any help would be appreciated.

ncbi gene protein biopython entrez • 322 views
19 days ago
GenoMax 100k

Using EntrezDirect on the command line (output truncated for space):

$ esearch -db gene -query "primate [orgn] AND TNF [gene]" | elink -target protein | efetch -format fasta > tnf.fa
$ more tnf.fa
>sp|Q19LH4.1|TNFA_CALJA RecName: Full=Tumor necrosis factor; AltName: Full=Cachectin; AltName: Full=TNF-alpha; AltName: Full=Tumor necrosis factor ligand superfamily member 2; Short=TNF-a; Contains: RecName: Full=Tumor necrosis factor, membrane form; AltName: Full=N-terminal fragment; Short=NTF; Contains: RecName: Full=Intracellular domain 1; Short=ICD1; Contains: RecName: Full=Intracellular domain 2; Short=ICD2; Contains: RecName: Full=C-domain 1; Contains: RecName: Full=C-domain 2; Contains: RecName: Full=Tumor necrosis factor, soluble form; Flags: Precursor
MSTETMIQDVELAEEALPKTRGPQGSKRRLFLSLFSFLLVAGATALFCLLHFGVIGPQKDELSKDFSLIS
PLALAVRSSSRIPSDKPVAHVVANPQAEGQLQWLNRRANALLANGVELRDNQLVVPSEGLYLVYSQVLFK
GQGCPSNFMLLTHSISRIAVSYQAKVNLLSAIKSPCQRETPQGAKTNPWYEPIYLGGVFQLEKGDRLSAE
INLPDYLDLAESGQVYFGIIGL
>sp|P48094.1|TNFA_MACMU RecName: Full=Tumor necrosis factor; AltName: Full=Cachectin; AltName: Full=TNF-alpha; AltName: Full=Tumor necrosis factor ligand superfamily member 2; Short=TNF-a; Contains: RecName: Full=Tumor necrosis factor, membrane form; AltName: Full=N-terminal fragment; Short=NTF; Contains: RecName: Full=Intracellular domain 1; Short=ICD1; Contains: RecName: Full=Intracellular domain 2; Short=ICD2; Contains: RecName: Full=C-domain 1; Contains: RecName: Full=C-domain 2; Contains: RecName: Full=Tumor necrosis factor, soluble form; Flags: Precursor
MSTESMIRDVELAEEALPRKTAGPQGSRRCWFLSLFSFLLVAGATTLFCLLHFGVIGPQREEFPKDPSLI
SPLAQAVRSSSRTPSDKPVAHVVANPQAEGQLQWLNRRANALLANGVELTDNQLVVPSEGLYLIYSQVLF
KGQGCPSNHVLLTHTISRIAVSYQTKVNLLSAIKSPCQRETPEGAEAKPWYEPIYLGGVFQLEKGDRLSA
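
For anyone who would rather stay in Biopython, as in the question, the same three steps (esearch, elink, efetch) can be sketched roughly as below. This is a minimal sketch, not a tested equivalent of the pipeline above: the function names `linked_protein_ids` and `fetch_protein_fasta` are made up for illustration, the e-mail address is whatever you pass in, and it collects every gene-to-protein link that elink returns, so the set of sequences may differ from what `elink -target protein` gives.

```python
def linked_protein_ids(elink_records):
    """Collect unique protein UIDs from parsed elink output, keeping order.

    `elink_records` is the list returned by Entrez.read() on an elink
    handle: one entry per link set, each with a "LinkSetDb" list.
    """
    ids = []
    for rec in elink_records:
        for linkset in rec.get("LinkSetDb", []):
            for link in linkset["Link"]:
                ids.append(str(link["Id"]))
    # dict.fromkeys drops duplicates while preserving first-seen order
    return list(dict.fromkeys(ids))


def fetch_protein_fasta(term, email, out_path, retmax=50):
    """Search Gene, follow links to Protein, and save FASTA to out_path.

    Returns the number of protein UIDs requested. Hypothetical helper.
    """
    # Imported here so the helper above can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address

    # Step 1: gene UIDs for the query
    handle = Entrez.esearch(db="gene", term=term, retmax=retmax)
    gene_ids = Entrez.read(handle)["IdList"]
    handle.close()
    if not gene_ids:
        return 0

    # Step 2: gene -> protein links
    handle = Entrez.elink(dbfrom="gene", db="protein", id=gene_ids)
    protein_ids = linked_protein_ids(Entrez.read(handle))
    handle.close()
    if not protein_ids:
        return 0

    # Step 3: fetch the protein records as FASTA text
    handle = Entrez.efetch(db="protein", id=protein_ids,
                           rettype="fasta", retmode="text")
    fasta = handle.read()
    handle.close()

    with open(out_path, "w") as out:
        out.write(fasta)
    return len(protein_ids)
```

Calling `fetch_protein_fasta("primate[Orgn] AND TNF[Gene Name]", "you@example.com", "tnf.fa")` should then write a multi-FASTA file comparable to tnf.fa above.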

If you want to save each sequence in a separate file, use:

$ esearch -db gene -query "primate [orgn] AND TNF [gene]" | elink -target protein | efetch -format acc | xargs -n 1 sh -c 'efetch -db protein -id "$0" -format fasta > "$0".fa'
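
If you already have the combined FASTA file and only need to split it, something like the following could work instead of re-fetching each record. It is a rough sketch in plain Python: `split_fasta` is a hypothetical helper name, and the file names are taken from the first word of each header with characters such as `|` replaced by `_`, so they will not match the accession-only names the xargs command produces.

```python
import re
from pathlib import Path


def split_fasta(fasta_text, out_dir):
    """Write each FASTA record in fasta_text to its own file in out_dir.

    Returns the list of paths written, in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # A record starts at every '>' that begins a line.
    for chunk in re.split(r"^>", fasta_text, flags=re.MULTILINE):
        if not chunk.strip():
            continue  # text before the first header, or an empty tail
        header = chunk.split("\n", 1)[0]
        parts = header.split()
        if not parts:
            continue  # header line with no identifier
        # Replace characters that are awkward in file names (e.g. '|').
        safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", parts[0])
        path = out_dir / f"{safe_id}.fa"
        path.write_text(">" + chunk.rstrip("\n") + "\n")
        written.append(path)
    return written
```

For example, `split_fasta(open("tnf.fa").read(), "tnf_split")` would write one `.fa` file per record into a `tnf_split` directory.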

This works, thanks so much!


Consider accepting the answer (green check mark) then.
