5.1 years ago
medamato
Dear group, I am relatively new to these resources. I would like to download a batch of sequences from the mitochondrial reference sequences in NCBI. For instance, I would like to get all 16S rRNA sequences from Felidae (https://www.ncbi.nlm.nih.gov/genome/browse#!/organelles/Felidae) in FASTA format. How do I extract and download this information? Thanks. Eugenia.
thank you genomax! it worked. slowly, got there. :)
Please upvote and accept the answer if it was helpful. Thanks!
Dear genomax, I am asking for help again. I tried myself, but my understanding of the script is not too good and I could not find a solution in the Entrez programming manual.
I have to keep analyzing different taxa and different genes. Which terms of the code do I need to modify for this? (Sorry, I don't want to keep bothering you.)
e.g. I have now tried 16S and Carnivora (encompassing Felidae, dogs, ferrets, seals etc.), proceeding as above and modifying the output name, but it does not work. How do I change the taxa of interest and the gene of interest from full mitochondrial genomes? Thanks a million. Eugenia
There are 198 results for 16S/Carnivora, and 66 of those seem to have 16S annotations in 2 separate fields. The following should get you those entries.
Dear genomax, thanks for your reply. I ran the script and only get a list of this sort on the screen:
NC_011124.1 1103 2677
NC_035814.1 1102 2669
NC_008417.1 1101 2677
NC_008420.1 1102 2679
... etc.
But I need a FASTA file like the one that worked for Felidae.
I appreciate your help. best regards. eugenia.
You need to replace the awk part before the first | in the command line in my answer with this code. Try this:
I have a related query. I am now trying to extract nuclear genes.
e.g. in the Gene database I searched using the terms: (18S ribosomal RNA) AND Aves
I recover a list of 71 entries. I use code similar to the above:
awk -F '\t' '{if ($12 ~/NC/ && $8 ~/(18S ribosomal RNA)/) print $12,$13,$14}' info18Aves.txt | xargs -n 3 sh -c 'efetch -db nuccore -id $0 -seq_start $1 -seq_stop $2 -format fasta' > Aves18.fa
but I can recover only the sequences with a chromosomal location, although all the others are annotated too. How can I recover all 71 sequences? Thanks. Eugenia
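As a rough illustration of which parts of the awk filter above can be changed, here is a minimal sketch. It assumes the saved table has the description in column 8 and the accession, start and end in columns 12 to 14, as the command above implies. The sample rows, accessions and the `gene` variable are made up for the example. The taxon is not set in awk at all: it comes from the search used to produce the file. Whether the entries without a chromosomal location carry a different accession prefix in column 12 is an assumption to check in your own file.

```shell
# Made-up sample standing in for a saved Gene table (tab-separated).
# Assumed layout: column 8 = description, 12 = accession, 13 = start, 14 = end.
tsv=$(mktemp)
printf '0\tExample sp.\t1\t0\tlive\tSYM1\t-\t18S ribosomal RNA\t-\t-\t1\tNC_000001.1\t100\t1900\n' > "$tsv"
printf '0\tExample sp.\t2\t0\tlive\tSYM2\t-\t18S ribosomal RNA\t-\t-\t\tNW_000002.1\t5\t1800\n' >> "$tsv"
printf '0\tExample sp.\t3\t0\tlive\tSYM3\t-\t28S ribosomal RNA\t-\t-\t1\tNC_000003.1\t10\t4000\n' >> "$tsv"

gene='18S ribosomal RNA'   # change this to switch genes

# Chromosome-level records only (accession starts with NC_).
nc_only=$(awk -F '\t' -v gene="$gene" \
  '$12 ~ /^NC_/ && $8 ~ gene {print $12, $13, $14}' "$tsv")

# Any record with an accession and both coordinates, whatever the prefix.
any_acc=$(awk -F '\t' -v gene="$gene" \
  '$12 != "" && $13 != "" && $14 != "" && $8 ~ gene {print $12, $13, $14}' "$tsv")

echo "$nc_only"
echo "$any_acc"
rm -f "$tsv"
```

Either result can be piped into the same xargs and efetch step used above. Rows with an empty accession or empty coordinates cannot be fetched by range at all, so they would need a different approach.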
Only 23 of the entries that have 18S have chromosomal locations assigned. You can get those entries by
Thanks genomax. I tried your code, followed by the part that writes the FASTA:
awk -F '\t' '{if ($0 ~/NC/ && $0 ~/18S/) print $12,$13,$14}' gene_result.txt | xargs -n 3 sh -c 'efetch -db genome -id $0 -seq_start $1 -seq_stop $2 -format fasta' > Aves18b.fa
I get a very large message starting with
400 Bad Request No do_post output returned from 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=genome&id=NC_044276.1&rettype=fasta&retmode=text&seq_start=23750946&seq_stop=23752773&edirect_os=MSWin32&edirect=13.6&tool=edirect&email=NBDLN587A3839+Admin@NBDLN587A3839.ad.uwc.ac.za'
... it is much longer and continues further below.
I appreciate your help
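One detail worth checking in the error above: the URL shows db=genome, because the command was run with -db genome, whereas the other commands in this thread use -db nuccore. That difference is a likely, though unconfirmed, cause of the 400 response. A minimal dry-run sketch, using the accession and coordinates from the error URL, prints the efetch call that would be made with -db nuccore without contacting NCBI:

```shell
# Dry run: echo the efetch command instead of executing it.
# Accession and coordinates are copied from the error URL quoted above.
cmd=$(printf 'NC_044276.1 23750946 23752773\n' |
  xargs -n 3 sh -c 'echo efetch -db nuccore -id "$0" -seq_start "$1" -seq_stop "$2" -format fasta')

echo "$cmd"
```

Removing the echo inside the sh -c string turns the dry run into a real request. Printing the commands first is a cheap way to confirm that the database name and the three values per record are what you expect.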
You need to use your own file name in place of gene_result.txt. That is the name I had saved my search for 18S rRNA and Aves with.
Thanks genomax, I did so and it did not work. To prevent typing mistakes I saved the file under your name and tried again. I still get the same error messages.
Not sure what the problem is on your end. Works for me. I am only showing some fasta headers.
In the error line that I transcribed there is something very strange:
*+Admin@NBDLN587A3839.ad.uwc.ac.za'*
uwc.ac.za is the end of my university address, but I logged in to NCBI with my personal Gmail account.
Is it possible that this is some kind of issue generated by the university server? Thanks. Eugenia
My command lines work on Linux. Are you now using Windows? Windows Subsystem for Linux on Windows? Was there some change to the local firewall? Is it preventing your downloads?
I am using Cygwin. I tried to install Ubuntu but something prevented me from doing so; Windows updates only when we are on campus. We are still on lockdown, working from home. Maybe something is missing from the Windows updates? I will consult with the university. Thanks a lot for all your help. I will let you know.
Dear genomax, I have no idea what happened. I restarted the computer and the code is working now, but it only recovers 6 sequences, not 23 (out of the 71 Aves 18S):
awk -F '\t' '{if ($0 ~/NC/ && $0 ~/18S/) print $12,$13,$14}' gene_result.txt | xargs -n 3 sh -c 'efetch -db nuccore -id $0 -seq_start $1 -seq_stop $2 -format fasta' > Aves18cc.fa
thnx. eugenia
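One way to narrow down a shortfall like this is to count how many rows pass the filter and how many of those actually have an accession and both coordinates. The sketch below is only an illustration on made-up rows; it assumes the same column layout as the command above (accession, start and end in columns 12 to 14) and does not claim to explain the 6 versus 23 difference.

```shell
# Made-up sample standing in for gene_result.txt (tab-separated, 14 columns).
tsv=$(mktemp)
printf '0\tExample sp.\t1\t0\tlive\tSYM1\t-\t18S ribosomal RNA\t-\t-\t1\tNC_000001.1\t100\t1900\n' > "$tsv"
# This row contains "NC" and "18S" but has no accession or coordinates.
printf '0\tExample sp.\t2\t0\tlive\tSYM2\tNCALIAS\t18S ribosomal RNA\t-\t-\t\t\t\t\n' >> "$tsv"

# Rows passing the loose whole-line filter used in the command above.
loose=$(awk -F '\t' '$0 ~ /NC/ && $0 ~ /18S/ {n++} END {print n+0}' "$tsv")

# Rows that can actually feed efetch: NC_ accession plus both coordinates.
usable=$(awk -F '\t' \
  '$12 ~ /^NC_/ && $13 != "" && $14 != "" && $0 ~ /18S/ {n++} END {print n+0}' "$tsv")

echo "loose=$loose usable=$usable"
rm -f "$tsv"
```

If the two counts differ on the real file, some rows are printing empty fields. Because xargs splits on whitespace, an empty field shifts the grouping done by xargs -n 3 and can misalign every record after it. The number of sequences actually written can be compared with grep -c '^>' Aves18cc.fa.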