Question

Large GI number to fasta

0

3.8 years ago

zion22 ▴ 70

Hi, I have checked the post history on Biostars and couldn't find what I was looking for. Does anyone know of a simple, up-to-date way to download nucleotide sequences from a list of GI numbers? I have a list of specific GI numbers but can't find an efficient way to download them as a FASTA file from the NCBI portal. Thank you, I appreciate your attention.

ncbi sequence gene • 910 views

ADD COMMENT • link updated 3.8 years ago by GenoMax 142k • written 3.8 years ago by zion22 ▴ 70

score 1 · Answer 1 · 2020-07-12

1

3.8 years ago

GenoMax 142k

Using Entrez Direct (EDirect):

$ more gi.txt
169403946
1818269734

$ epost -db nuccore -input gi.txt -format uid | efetch -format fasta

This will retrieve these two sequences (only the FASTA headers are shown here):

$ epost -db nuccore -input gi.txt -format uid | efetch -format fasta | grep "^>"

>NM_001115114.1 Danio rerio glyceraldehyde-3-phosphate dehydrogenase (gapdh), mRNA
>MN206740.1 Colletotrichum sp. isolate Cer015 GAPDH (gapdh) gene, partial cds
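
For a long list, it may help to tidy the file before posting it. The following is only a sketch: `clean_gi_list` and the output file names are hypothetical, and it assumes the list has one GI per line, possibly with blank lines, Windows line endings, or duplicates mixed in.

```shell
# Hypothetical helper: normalize a GI list so each line is one numeric ID.
clean_gi_list() {
  tr -d '\r' < "$1" |      # drop Windows carriage returns
    grep -E '^[0-9]+$' |   # keep only lines that are purely numeric
    awk '!seen[$0]++'      # drop duplicates, keeping first-seen order
}

# Usage (needs EDirect installed and network access):
# clean_gi_list gi.txt > gi.clean.txt
# epost -db nuccore -input gi.clean.txt -format uid | efetch -format fasta > sequences.fasta
```

Lines that are not plain numbers are silently dropped, so it is worth comparing the line counts of the two files afterwards.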

ADD COMMENT • link 3.8 years ago by GenoMax 142k

0

Hi, thank you very much. I have another question: how can I download specific genes from a genome? For example, if I have the genome CP024842.1 and I want to download the 16S, efl and lux genes from it, what would the command line be? Thanks again.

ADD REPLY • link 3.8 years ago by zion22 ▴ 70

0

Only if these genes are annotated in the genome assembly. I don't see efl or lux in the protein table for this organism.

ADD REPLY • link 3.8 years ago by GenoMax 142k

0

Yes, I know, it was only an example. Say, for the qseC, acnA and 16S genes: what would the command line be?

ADD REPLY • link 3.8 years ago by zion22 ▴ 70

0

You can get the start and stop coordinates and then download the sequence, using a command like the following:

esearch -db gene -query "qseC [GENE]" | efetch -format docsum | xtract -pattern GenomicInfoType -element ChrAccVer ChrStart ChrStop

This will generate (truncated):

NC_000913.3     3170483 3171832
NC_011852.1     2012753 2014153
NC_020796.1     470368  469004
NC_013508.1     492510  491146
NC_020418.1     2523868 2525214
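
One possible way to turn rows like these into sequence downloads is sketched below. It rests on assumptions rather than anything stated above: `coords_to_efetch` is a hypothetical helper, the docsum `ChrStart`/`ChrStop` values are taken to be 0-based (so 1 is added for `efetch -seq_start`/`-seq_stop`), and a start larger than the stop is read as a minus-strand gene (`-strand 2`). The function only prints the `efetch` commands so they can be inspected before anything is run.

```shell
# Hypothetical helper: read "accession ChrStart ChrStop" rows on stdin and
# print one efetch command per row.
# Assumption: docsum coordinates are 0-based; efetch expects 1-based positions.
coords_to_efetch() {
  awk '{
    start = $2 + 1; stop = $3 + 1; strand = 1
    # start > stop is taken to mean the gene lies on the minus strand
    if (start > stop) { tmp = start; start = stop; stop = tmp; strand = 2 }
    printf "efetch -db nuccore -id %s -seq_start %d -seq_stop %d -strand %d -format fasta\n", $1, start, stop, strand
  }'
}

# Usage (needs EDirect installed and network access):
# esearch -db gene -query "qseC [GENE]" | efetch -format docsum |
#   xtract -pattern GenomicInfoType -element ChrAccVer ChrStart ChrStop |
#   coords_to_efetch > fetch_cmds.sh
# sh fetch_cmds.sh > qseC.fasta
```

It is worth checking one retrieved sequence against the annotated gene length before trusting the coordinate conversion for a whole batch.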

Unfortunately, this gene has not been annotated in your genome of interest, as restricting the query to the organism shows:

esearch -db gene -query "Pectobacterium versatile [ORGN] AND qseC [GENE]" | efetch -format docsum | xtract -pattern GenomicInfoType -element ChrAccVer ChrStart ChrStop

ADD REPLY • link 3.8 years ago by GenoMax 142k