How to get nuclear and mitochondrial gene sequences?
4.8 years ago
kA • 0

Hi,

As part of a script that I'm writing, I need to get the sequence of any desired mitochondrial or nuclear gene of a species from GenBank. I'm not sure how to optimise my search, or which NCBI database (genome, gene, nucleotide) would be best for making sure that I download the correct sequences. I have seen mitochondrial filters while searching, but I'm still not sure whether they work well for all mitochondrial gene sequences. For example, I'd like to download the COX1 gene of C. elegans. Searching for this in the nucleotide database gives me mitochondrial genomes containing additional genes, rather than the single gene. There's a link to the gene sequence, but it goes to the gene database, so I'm not sure how my script would download the right sequence.

mitochondrial genes sequence ncbi • 1.4k views

This question is phrased too vaguely to know what your problem actually is. The vast majority of genes are nuclear so why would you have trouble with those? Are you talking about NUMTs? How about you rephrase your question to fetching a specific gene and we'll show you how to do it?

Hi, I've added a specific example if that helps.

Have you tried the C. elegans genome in Ensembl?

No, I haven't. I should specify, though, that I am trying to write a bash script that works for eukaryotes in general, not just C. elegans. If I can do that for other organisms as well, would my bash script be able to download the sequences from that database?

4.8 years ago
GenoMax 141k

Using Entrez Direct (example output truncated for space):

Genes in the mitochondrion:

$ esearch -db nuccore -query "Caenorhabditis elegans [ORGN] AND gene_in_mitochondrion[PROP]" | esummary | xtract -pattern DocumentSummary -element Caption,Title
CP038193    Caenorhabditis elegans strain CB4856 mitochondrion, complete genome
MF167645    Caenorhabditis elegans cytochrome oxidase subunit I (COI) gene, partial cds; mitochondrial
JF896456    Caenorhabditis elegans strain N2 mitochondrion, partial genome
JF896455    Caenorhabditis elegans strain CB4856 mitochondrion, partial genome
AY268112    Caenorhabditis elegans cytochrome oxidase subunit I (COI) gene, partial cds; mitochondrial gene for mitochondrial product

And for those in the nuclear genome:

$ esearch -db nuccore -query "Caenorhabditis elegans [ORGN] AND gene_in_genomic [PROP]" | esummary | xtract -pattern DocumentSummary -element Caption,Title
MF767409    Caenorhabditis elegans strain Bristol N2 acetylcholine receptor 8 mRNA, complete cds
MF282010    Caenorhabditis elegans NMDA receptor auxiliary protein (nrap-1) mRNA, complete cds
AY532255    Caenorhabditis elegans clone 2-M13R mec-1 mRNA, sequence
AY532254    Caenorhabditis elegans clone 4-M13F mec-1 mRNA, sequence
AY532262    Caenorhabditis elegans clone 3-M13F mec-1 mRNA, sequence
AY532261    Caenorhabditis elegans clone 5-M13R mec-1 mRNA, sequence

As for the example gene you posted, it appears to be on the mitochondrial genome:

$ esearch -db nuccore -query "Caenorhabditis elegans [ORGN] AND COX1 [GENE]" | esummary | xtract -pattern DocumentSummary -element Caption,Title
JF896456    Caenorhabditis elegans strain N2 mitochondrion, partial genome
JF896455    Caenorhabditis elegans strain CB4856 mitochondrion, partial genome
EU407805    Caenorhabditis elegans strain PS2025 mitochondrion, partial genome
EU407804    Caenorhabditis elegans strain JU258 mitochondrion, partial genome
NC_001328   Caenorhabditis elegans mitochondrion, complete genome

Even in the gene database:

$ esearch -db gene -query "Caenorhabditis elegans [ORGN] AND COX1 [GENE]" | esummary | xtract -pattern DocumentSummary -element Id,Name,Chromosome,ScientificName
2565700 COX1    MT  Caenorhabditis elegans
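
Since the gene record knows where COX1 sits on the mitochondrial genome, one way to get only that gene (rather than the whole genome record) is to read its coordinates from the Gene document summary and ask efetch for just that range. A minimal sketch, assuming Entrez Direct is installed and that the docsum exposes ChrAccVer, ChrStart and ChrStop under GenomicInfoType as 0-based coordinates, with start greater than stop for minus-strand genes. The helper names range_to_efetch_opts and fetch_gene_fasta are made up for illustration:

```shell
# Turn "accession start stop" (0-based, as assumed for the Gene docsum)
# into efetch range options (1-based, plus strand: 1 = forward, 2 = reverse).
range_to_efetch_opts() (
    acc="$1"; start="$2"; stop="$3"
    if [ "$start" -le "$stop" ]; then
        echo "-id $acc -seq_start $((start + 1)) -seq_stop $((stop + 1)) -strand 1"
    else
        # start > stop is taken to mean the gene is on the minus strand
        echo "-id $acc -seq_start $((stop + 1)) -seq_stop $((start + 1)) -strand 2"
    fi
)

# Look up a gene, read its genomic placement, fetch only that slice as FASTA.
# Needs network access and the EDirect tools (esearch, efetch, xtract).
fetch_gene_fasta() {
    esearch -db gene -query "$1" |
        efetch -format docsum |
        xtract -pattern GenomicInfoType -element ChrAccVer ChrStart ChrStop |
        while read -r acc start stop; do
            # Unquoted on purpose so the options split into separate words;
            # stdin is redirected so efetch does not swallow the loop input.
            efetch -db nuccore -format fasta \
                $(range_to_efetch_opts "$acc" "$start" "$stop") < /dev/null
        done
}

# Example (not run automatically):
# fetch_gene_fasta "Caenorhabditis elegans [ORGN] AND COX1 [GENE]" > cox1.fasta
```

Check the first few results by hand against the gene page before trusting the coordinates in a pipeline, since the coordinate convention above is an assumption.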

To get all genes and their chromosomal locations:

$ esearch -db gene -query "Caenorhabditis elegans [ORGN]" | esummary | xtract -pattern DocumentSummary -element Id,Name,Chromosome,ScientificName | head -10
172981  daf-16  I   Caenorhabditis elegans
175410  daf-2   III Caenorhabditis elegans
177343  skn-1   IV  Caenorhabditis elegans
266860  lin-4   II  Caenorhabditis elegans
266952  let-7   X   Caenorhabditis elegans
178272  ced-3   IV  Caenorhabditis elegans
181263  daf-12  X   Caenorhabditis elegans
172616  cep-1   I   Caenorhabditis elegans
180359  hif-1   V   Caenorhabditis elegans
175643  ced-4   III Caenorhabditis elegans
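
The Chromosome column in that table can then be used to separate mitochondrial from nuclear genes in a script. A small sketch, assuming the tab-separated layout shown above (Id, Name, Chromosome, ScientificName) and that mitochondrial genes carry the label MT as in the COX1 row. Other organisms may use different labels, so treat the label as an assumption; mito_genes and nuclear_genes are illustrative names:

```shell
# Keep only rows whose Chromosome column (field 3) is "MT".
mito_genes() {
    awk -F '\t' '$3 == "MT"'
}

# Keep rows with any other non-empty chromosome label.
nuclear_genes() {
    awk -F '\t' '$3 != "" && $3 != "MT"'
}

# Example (not run automatically; needs EDirect and network access):
# esearch -db gene -query "Caenorhabditis elegans [ORGN]" | esummary |
#     xtract -pattern DocumentSummary -element Id Name Chromosome ScientificName |
#     mito_genes
```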

Thanks for the suggestion. Would I be able to do this without installing Entrez Direct? I would like to use the esearch/efetch utilities as they are, with the wget command in bash, if possible. Currently I have something like this:

wget "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&.....

Likely. Figure out the exact URL you will need to make using the filter above.
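
As a rough illustration of what building those URLs could look like with wget alone: esearch.fcgi takes db, term and retmax and returns XML with the matching UIDs, and efetch.fcgi takes db, id, rettype and retmode. The function names below are invented for this sketch, and urlencode only handles the characters that usually show up in Entrez queries, so it is not a general-purpose encoder:

```shell
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Percent-encode the characters commonly found in Entrez query terms.
# (Deliberately limited; "%" is handled first to avoid double encoding.)
urlencode() {
    printf '%s\n' "$1" | sed \
        -e 's/%/%25/g' -e 's/ /%20/g' -e 's/"/%22/g' \
        -e 's/\[/%5B/g' -e 's/]/%5D/g' \
        -e 's/&/%26/g' -e 's/#/%23/g' -e 's/+/%2B/g'
}

# esearch URL: db, query term, optional maximum number of UIDs (default 20).
esearch_url() {
    printf '%s/esearch.fcgi?db=%s&term=%s&retmax=%s\n' \
        "$EUTILS" "$1" "$(urlencode "$2")" "${3:-20}"
}

# efetch URL returning FASTA text for a comma-separated UID list.
efetch_fasta_url() {
    printf '%s/efetch.fcgi?db=%s&id=%s&rettype=fasta&retmode=text\n' \
        "$EUTILS" "$1" "$2"
}

# Pull the UIDs out of esearch XML on stdin and join them with commas.
extract_ids() {
    grep -o '<Id>[0-9]*</Id>' | sed -e 's/<[^>]*>//g' | paste -sd, -
}

# Search, then fetch the hits as FASTA on stdout (needs network access).
fetch_fasta() (
    ids="$(wget -qO- "$(esearch_url "$1" "$2" "${3:-20}")" | extract_ids)"
    [ -n "$ids" ] || { echo "no records found" >&2; exit 1; }
    wget -qO- "$(efetch_fasta_url "$1" "$ids")"
)

# Example (not run automatically):
# fetch_fasta nuccore "Caenorhabditis elegans [ORGN] AND gene_in_mitochondrion[PROP]" 5
```

Note that this downloads whole records, so for a query like COX1 [GENE] you would still get mitochondrial genomes. NCBI also limits unauthenticated clients to a few requests per second, so a loop over many genes should pause between calls.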
