Primer Design: How To Download Each Nucleotide Sequence In FASTA Format
11.1 years ago
valdeanda ▴ 30

Hello everybody:

I need to design primers for 31 protein-encoding phylogenetic marker genes (dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, and tsf), because I am going to test them in silico on four metagenomes.

I found software to do this: http://floresta.eead.csic.es/primers4clades/#0, but to use it I need all the gene sequences for each of the 31 marker genes.

For example, if I search for the first gene, dnaG, in the NCBI Gene database, it retrieves 5647 sequences, which I want to download in FASTA format so I can use them in the primer design software.

Can anyone please explain how to download all the gene sequences for each gene in FASTA format from NCBI? This is urgent, please!

fasta primer • 3.5k views

Is this for a class assignment?

11.1 years ago
Rm 8.3k

Follow these posts: "Easiest way to get mRNA Refseq ACC related to an Entrez Gene Id using NCBI EUtility programs" and "Get FASTA file with protein sequences given Entrez Gene IDs". Pierre gave good solutions there (create the jeter.xsl file from his answer). I have combined a few of his scripts to achieve this; grep for whichever type (genomic, peptide, mRNA, etc.) you want.

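For dnaG, to list the peptide accessions (retmax=1500 caps the number of Gene IDs returned; raise it if your search has more hits):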

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gene&term=dnaG\[gene+name\]&rettype=acc&retmode=text&retmax=1500" | grep "<Id>" | cut -d '>' -f 2 | cut -d '<' -f 1 | while read -r G; do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=$G&retmode=xml" | xsltproc --novalid jeter.xsl - | grep -A1 "peptide" | grep "acn" | cut -d":" -f2; done > peptide.list.txt
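
The `grep "<Id>" | cut | cut` stage of this pipeline pulls the Gene IDs out of the esearch XML. As a minimal sketch of just that step, the same extraction can be tried offline on a small sample. The function name `extract_gene_ids` and the sample IDs (111, 222) are made up for illustration, and the sketch assumes each `<Id>` element sits on its own line, which is how esearch normally appears to format its output.

```shell
# Read esearch XML on stdin and print one Gene ID per line.
# Only lines that contain a complete <Id>digits</Id> element are printed.
extract_gene_ids() {
  sed -n 's:.*<Id>\([0-9][0-9]*\)</Id>.*:\1:p'
}

# Hypothetical sample standing in for a real esearch response.
sample='<eSearchResult><IdList>
<Id>111</Id>
<Id>222</Id>
</IdList></eSearchResult>'

printf '%s\n' "$sample" | extract_gene_ids
```

Matching the closing tag as well makes the step a little stricter than `grep "<Id>"`, which would also pass through any line that merely mentions the opening tag.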

Then submit the list to NCBI Batch Entrez to retrieve the sequences in FASTA format.
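
If you would rather script this step than upload the list through the web form, efetch accepts a comma-separated list of IDs, so the accessions can be fetched in chunks. This is an alternative sketch, not the method described above. It assumes peptide.list.txt holds one accession per line; `chunk_accessions`, `efetch_url`, the chunk size, and the `ACC_n` placeholders are illustrative names and values.

```shell
# Join accessions read from stdin into comma-separated chunks of at most
# N per output line (N is the first argument, default 200).
# Blank input lines are skipped.
chunk_accessions() {
  awk -v n="${1:-200}" '
    NF == 0 { next }
    {
      printf "%s%s", (count % n ? "," : ""), $1
      count++
      if (count % n == 0) printf "\n"
    }
    END { if (count % n) printf "\n" }'
}

# Build an efetch URL that returns FASTA text.
# $1 = database (protein or nuccore), $2 = comma-separated IDs.
efetch_url() {
  printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=%s&id=%s&rettype=fasta&retmode=text\n' "$1" "$2"
}

# Offline demo with placeholder accessions: prints one URL per chunk.
printf 'ACC_1\nACC_2\nACC_3\n' | chunk_accessions 2 |
  while read -r ids; do efetch_url protein "$ids"; done
```

For a real run, replace the demo with something like `chunk_accessions 200 < peptide.list.txt | while read -r ids; do curl -s "$(efetch_url protein "$ids")"; sleep 1; done > peptide.sequences.fasta`.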

Or, to fetch the peptide sequences directly:

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gene&term=dnaG\[gene+name\]&rettype=acc&retmode=text&retmax=1500" | grep "<Id>" | cut -d '>' -f 2 | cut -d '<' -f 1 | while read -r G; do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=$G&retmode=xml" | xsltproc --novalid jeter.xsl - | grep -A1 "peptide" | grep "acn" | cut -d":" -f2 | while read -r S; do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=${S}&retmode=text&rettype=fasta"; done; done > peptide.sequences.fasta

Or, to fetch the mRNA sequences:

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gene&term=dnaG\[gene+name\]&rettype=acc&retmode=text&retmax=1500" | grep "<Id>" | cut -d '>' -f 2 | cut -d '<' -f 1 | while read -r G; do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=$G&retmode=xml" | xsltproc --novalid jeter.xsl - | grep -A1 "mRNA" | grep "acn" | cut -d":" -f2 | while read -r S; do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=${S}&retmode=text&rettype=fasta"; done; done > mRNA.sequences.fasta
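
The single-gene commands cap the search at retmax=1500, while the question mentions 5647 hits for dnaG, so some Gene IDs could be dropped without any warning. A rough way to notice this is to compare the first `<Count>` element of the esearch response with retmax. The helper names and the sample XML below are hypothetical, and 5647 is simply the figure quoted in the question, not a value checked against NCBI.

```shell
# Print the first <Count> value found in esearch XML read on stdin.
# Assumption: the first <Count> element is the total number of hits.
total_hits() {
  grep -o '<Count>[0-9]*</Count>' | head -n 1 | tr -dc '0-9'
}

# Succeed (exit status 0) when the total exceeds retmax.
# $1 = total hits, $2 = retmax used in the query.
is_truncated() {
  [ "$1" -gt "$2" ]
}

# Hypothetical sample standing in for a real esearch response.
sample='<eSearchResult><Count>5647</Count><RetMax>1500</RetMax><RetStart>0</RetStart></eSearchResult>'

total=$(printf '%s\n' "$sample" | total_hits)
if is_truncated "$total" 1500; then
  echo "$total hits but retmax=1500: raise retmax or page with retstart"
fi
```

In a real run the sample would be replaced by the saved output of the esearch call, so the check costs no extra request.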

For the peptides of all 31 genes in a loop:

for I in dnaG frr infC nusA pgk pyrG rplA rplB rplC rplD rplE rplF rplK rplL rplM rplN rplP rplS rplT rpmA rpoB rpsB rpsC rpsE rpsI rpsJ rpsK rpsM rpsS smpB tsf; do echo "$I"; curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gene&term=$I\[gene+name\]&rettype=acc&retmode=text&retmax=1500" | grep "<Id>" | cut -d '>' -f 2 | cut -d '<' -f 1 | while read -r G; do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=$G&retmode=xml" | xsltproc --novalid jeter.xsl - | grep -A1 "peptide" | grep "acn" | cut -d":" -f2 | while read -r S; do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=${S}&retmode=text&rettype=fasta" >> "$I.peptide.seq.fasta"; done; done; done
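
A long one-liner like this is easier to check if the URL construction is pulled into a small function and printed before anything is fetched. This is a sketch only: `esearch_url` is a hypothetical helper, the brackets are percent-encoded (%5B and %5D) instead of backslash-escaped, and the gene list is shortened to three of the 31 names.

```shell
# Build the esearch URL for one gene symbol.
# $1 = gene symbol, $2 = retmax (defaults to 1500, the value used in this answer).
esearch_url() {
  printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gene&term=%s%%5Bgene+name%%5D&retmax=%s\n' "$1" "${2:-1500}"
}

# Dry run: print the URL for each gene instead of fetching it.
for gene in dnaG frr infC; do
  esearch_url "$gene"
done
```

Once the URLs look right, swap the body of the loop for `curl -s "$(esearch_url "$gene")"` and add a `sleep 1` between requests, since NCBI asks clients without an API key to stay at or below three requests per second.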