E-utilities for obtaining gene sequences from the Gene database
6.1 years ago
dllopezr ▴ 120

Hi everyone

I need to download all gene sequences for a query gene from the NCBI Gene database through E-utilities on the Linux command line. The following command (adapted from an NCBI example) works for gene to protein:

esearch -db gene -query "nifH AND Sinorhizobium meliloti 1021 [orgn]" | elink -target protein -name gene_protein_refseq | efetch -format fasta

But when I try this:

esearch -db gene -query "nifH AND Sinorhizobium meliloti 1021 [orgn]" | elink -target nuccore -name gene_nuccore_refseqgene | efetch -format fasta

I get this error message:

QueryKey value not found in fetch input

As a note: NCBI examples of how to do this search don't exist, so I am wondering whether it is possible to connect the Gene database with the Nucleotide database, or to retrieve gene sequences from the Gene database at all.

Thank you for your suggestions

gene ncbi nucleotide • 1.4k views
6.1 years ago
GenoMax 141k

Extending the query you were trying:

esearch -db gene -query "nifH AND Sinorhizobium meliloti 1021 [orgn]" | efetch -format docsum | xtract -pattern GenomicInfoType -element ChrAccVer -element ChrStart -element ChrStop | xargs -n 3 sh -c 'efetch -db nuccore -id "$0" -seq_start "$1" -seq_stop "$2" -format fasta'

This should give you:

>NC_003037.1:453555-454448 Sinorhizobium meliloti 1021 plasmid pSymA, complete sequence
GATGGCAGCTCTGCGTCAGATCGCGTTCTACGGTAAGGGGGGTATCGGCAAGTCCACGACCTCCCAAAAT
ACACTCGCCGCGCTTGTCGACCTGGGGCAAAAGATCCTTATTGTCGGCTGCGATCCGAAAGCGGACTCCA
CGCGCCTCATCCTGAACGCAAAGGCACAGGACACCGTACTGCATCTTGCGGCAACCGAAGGTTCGGTCGA
AGACCTCGAGCTCGAGGACGTGCTCAAAGTGGGTTACAGAGGCATCAAGTGCGTGGAGTCCGGTGGCCCA
GAGCCGGGCGTCGGCTGCGCCGGACGCGGCGTTATCACCTCGATCAACTTCCTGGAAGAGAACGGCGCTT
ACAACGATGTCGATTACGTCTCATACGACGTGCTAGGGGACGTAGTATGCGGCGGCTTTGCGATGCCTAT
TCGCGAAAACAAGGCTCAGGAAATCTACATCGTCATGTCCGGTGAGATGATGGCGCTCTATGCCGCCAAC
AACATCGCGAAGGGTATCCTGAAGTACGCCCATGCGGGCGGCGTGCGGCTGGGGGGGTTGATTTGCAACG
AGCGCCAGACCGATCGGGAGCTCGACCTCGCCGAGGCACTTGCCGCCCGCCTCAATTCCAAGCTCATCCA
CTTCGTGCCGCGCGACAATATCGTTCAGCACGCAGAGCTCAGAAAGATGACAGTGATCCAATATGCGCCG
AACTCTAAGCAAGCCGGGGAATATCGCGCCCTGGCTGAAAAGATCCATGCAAATTCCGGCCGAGGCACCG
TCCCTACACCGATCACTATGGAGGAACTGGAGGACATGCTGCTCGACTTTGGAATCATGAAGAGCGACGA
GCAGATGCTTGCCGAACTCCACGCCAAGGAAGCCAAGGTAATAGCCCCCCACTG
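
One thing that may be worth checking: as far as I know, ChrStart and ChrStop in the Gene docsum are 0-based, while -seq_start and -seq_stop expect 1-based positions. The record above begins with GATG and ends in CTG, which would be consistent with the range sitting one base too far to the left. Below is a minimal sketch of shifting the coordinates before they reach efetch. The helper name docsum_to_range is hypothetical, and the strand column relies on the assumption that minus-strand genes are reported with ChrStart greater than ChrStop.

```shell
# Hypothetical helper: reads tab-separated rows of
#   accession, ChrStart, ChrStop   (as printed by the xtract step above)
# and prints accession, 1-based start, 1-based stop, strand.
# Assumption: the docsum coordinates are 0-based, and ChrStart > ChrStop
# marks a gene on the minus strand (strand 2).
docsum_to_range() {
  awk -v OFS='\t' '{
    start = $2 + 1
    stop  = $3 + 1
    if (start <= stop) print $1, start, stop, 1
    else               print $1, stop, start, 2
  }'
}

# Example with the coordinates behind the FASTA header above (no network needed)
printf 'NC_003037.1\t453555\t454448\n' | docsum_to_range
```

The first three columns could then be passed to the same xargs step in place of the raw docsum values, so that -seq_start and -seq_stop receive the shifted positions.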