Large GI number to fasta
3.8 years ago
zion22 ▴ 70

Hi, sorry to bother you. I have a question; I already searched the post history on Biostars and couldn't find what I was looking for. Does anyone know of a simple, up-to-date way to download nucleotide sequences from a list of GI numbers? I have a list of specific GI numbers but can't find an efficient way to download them as a FASTA file from the NCBI portal. Thank you, I appreciate your attention.

ncbi sequence gene • 910 views
3.8 years ago
GenoMax 142k

Using Entrez Direct (EDirect):

$ more gi.txt
169403946
1818269734

$ epost -db nuccore -input gi.txt -format uid | efetch -format fasta

This will generate these two sequences (only the FASTA headers are shown here).

$ epost -db nuccore -input gi.txt -format uid | efetch -format fasta | grep "^>"

>NM_001115114.1 Danio rerio glyceraldehyde-3-phosphate dehydrogenase (gapdh), mRNA
>MN206740.1 Colletotrichum sp. isolate Cer015 GAPDH (gapdh) gene, partial cds
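
If the GI list was exported from a spreadsheet or another tool, it may contain blank lines, Windows line endings, or duplicates. Here is a minimal sketch of a cleanup step before `epost`, assuming one GI per line. The function name `clean_gi_list` and the file names in the usage comment are hypothetical and not part of EDirect.

```shell
# Hypothetical helper (not part of EDirect): strip carriage returns and
# surrounding whitespace, keep only purely numeric lines, and drop
# duplicates while preserving the original order.
clean_gi_list() {
    tr -d '\r' < "$1" |
    awk '{ gsub(/^[ \t]+|[ \t]+$/, "") } /^[0-9]+$/ && !seen[$0]++'
}

# Usage (assumed file names):
#   clean_gi_list gi.txt > gi.clean.txt
#   epost -db nuccore -input gi.clean.txt -format uid | efetch -format fasta > sequences.fasta
```

Redirecting the `efetch` output to a file, as in the usage comment, saves all sequences in one FASTA file.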

Hi, thank you very much. I have another question: how can I download specific genes from a genome? To explain: if I have the genome CP024842.1 and I want to download the 16S, efl and lux genes from it, what would the command line be? Thanks again.


Only if these genes are annotated in the genome assembly. I don't see efl or lux in the protein table for this organism.


Yes, I know, it was only an example. Say, for the qseC, acnA and 16S genes: what would the command line be?


You can get the start and stop coordinates and then download the sequence, using a command like the following:

esearch -db gene -query "qseC [GENE]" | efetch -format docsum | xtract -pattern GenomicInfoType -element ChrAccVer ChrStart ChrStop

This will generate (truncated):

NC_000913.3     3170483 3171832
NC_011852.1     2012753 2014153
NC_020796.1     470368  469004
NC_013508.1     492510  491146
NC_020418.1     2523868 2525214
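
The download step itself can be sketched as follows. The helper below (hypothetical name `coords_to_efetch`) reads the three-column table and only prints one `efetch` command per row, so the commands can be inspected before anything is run. It assumes the docsum coordinates are 0-based, so it adds 1 for `-seq_start`/`-seq_stop`, and it treats rows where ChrStart is larger than ChrStop as minus-strand genes. Please verify both assumptions against a gene you know before relying on it.

```shell
# Hypothetical helper: read "accession start stop" rows on stdin and
# print (not run) one efetch command per row.
# Assumption: ChrStart/ChrStop from the gene docsum are 0-based, so 1 is
# added to get the 1-based positions used by -seq_start/-seq_stop.
coords_to_efetch() {
    awk 'NF >= 3 {
        start = $2 + 1; stop = $3 + 1; strand = 1
        # start > stop is treated as a minus-strand gene: swap and flag it
        if (start > stop) { tmp = start; start = stop; stop = tmp; strand = 2 }
        printf "efetch -db nuccore -id %s -seq_start %d -seq_stop %d -strand %d -format fasta\n", $1, start, stop, strand
    }'
}

# Example with two rows from the table above (prints commands only)
printf 'NC_000913.3\t3170483\t3171832\nNC_020796.1\t470368\t469004\n' | coords_to_efetch
```

Once the printed commands look right, they can be executed by piping the output to `sh`.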

Unfortunately, this gene has not been annotated in your genome of interest:

esearch -db gene -query "Pectobacterium versatile [ORGN] AND qseC [GENE]" | efetch -format docsum | xtract -pattern GenomicInfoType -element ChrAccVer ChrStart ChrStop
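
To check several genes in one go (for example qseC and acnA from the question above), the same organism-restricted query can be generated per gene. This is a minimal sketch; the function name `gene_queries` is hypothetical, and it only prints the query strings, so nothing is sent to NCBI until you pass them to `esearch -db gene -query`.

```shell
# Hypothetical helper: print one Entrez Gene query per gene symbol for a
# given organism name (first argument); remaining arguments are gene symbols.
gene_queries() {
    organism=$1
    shift
    for gene in "$@"; do
        printf '%s [ORGN] AND %s [GENE]\n' "$organism" "$gene"
    done
}

# Example: prints two query strings, one per line
gene_queries "Pectobacterium versatile" qseC acnA
```

Each printed line can then be used as the `-query` argument in the command above. A query that returns no rows suggests the gene is not annotated under that symbol for this organism.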