Download nucleotide sequence with locus_tag
2.7 years ago
DJ_MB • 0

I have a list of locus_tags. My idea was to download them with esearch, but the downloaded file does not contain the desired gene; instead, it contains the nucleotide sequence of the entire contig.

In this example, the gene I want to download is 830 nt long.

esearch -db nucleotide -query "JG64_RS07240" | efetch -format fasta > gen.fasta

Is there a way to get only my sequence of interest with esearch, rather than the whole contig?

I know I could do this manually, but I have more than 400 locus_tags that do not have a GI number.

Thanks for reading; I'll be watching for any responses.

2.7 years ago
GenoMax 141k

You can do this:

$ esearch -db nucleotide -query "JG64_RS07240" | efetch -format gene_fasta | awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' | grep JG64_RS07240 | tr "\t" "\n"
>lcl|NZ_JQQM01000039.1_gene_17 [locus_tag=JG64_RS07240] [location=19567..20385] [gbkey=Gene]
ATGAAAAAACTTTCGATTTTGGCTATCTCCGTTGCACTCTTTGCAAGCATTACCGCTTGTGGTGCTTTCGGTGGTCTGCCAAGCCTAAAAAGCTCTTTTGTTCTGAGCGAGGACACAATCCCAGGGACAAACGAAACCGTAAAAACGTTACTTCCCTACGGATCTGTGATCAACTATTACGGATACGTAAAGCCAGGACAAGCGCCGGACGGTTTAGTCGATGGAAACAAAAAAGCATACTATCTCTATGTTTGGATTCCTGCCGTAATCGCTGAAATGGGAGTTCGTATGATTTCCCCAACAGGCGAAATCGGTGAGCCAGGCGACGGAGACTTAGTAAGCGACGCTTTCAAAGCGGCTACCCCAGAAGAAAAATCAATGCCACATTGGTTTGATACTTGGATCCGTGTAGAAAGAATGTCGGCGATTATGCCTGACCAAATCGCCAAAGCTGCGAAAGCAAAACCAGTTCAAAAATTGGACGATGATGATGATGGTGACGATACTTATAAAGAAGAGAGACACAACAAGTACAACTCTCTTACTAGAATCAAGATCCCTAATCCTCCAAAATCTTTTGACGATCTGAAAAACATCGACACTAAAAAACTTTTAGTAAGAGGTCTTTACAGAATTTCTTTCACTACCTATAAACCAGGTGAAGTGAAAGGATCTTTCGTTGCATCTGTTGGTCTGCTTTTCCCACCAGGTATTCCAGGTGTGAGCCCGCTGATCCACTCAAATCCTGAAGAATTGCAAAAACAAGCTATCGCTGCTGAAGAGTCTTTGAAAAAAGCTGCTTCTGACGCGACTAAGTAA
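To see what each stage of that pipeline does without querying NCBI, here is a minimal offline sketch on a hypothetical two-record FASTA (the TOY_ headers, locus_tags and sequences are made up and only imitate the gene_fasta header layout shown above). The awk command joins each header and its wrapped sequence lines into one tab-separated line, grep keeps the wanted record, and tr splits it back into header and sequence.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy input imitating "efetch -format gene_fasta" output
# (one header per gene, sequence wrapped over several lines).
cat > toy_genes.fa <<'EOF'
>lcl|TOY_CONTIG_gene_1 [locus_tag=TOY_0001] [location=1..12] [gbkey=Gene]
ATGAAA
CCCTAA
>lcl|TOY_CONTIG_gene_2 [locus_tag=TOY_0002] [location=20..31] [gbkey=Gene]
ATGGGG
TTTTGA
EOF

# Linearize: one line per record, "header<TAB>concatenated sequence".
awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' < toy_genes.fa > toy_genes.tab

# Keep only the wanted record, then put header and sequence back on separate lines.
grep -F "[locus_tag=TOY_0002]" toy_genes.tab | tr "\t" "\n" > toy_one.fa

cat toy_one.fa
```

Because the awk step concatenates the wrapped lines, the extracted sequence comes out on a single line, as in the output above.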

If you have a list of these IDs, use a loop.

First, fetch all the gene sequences from the record returned for your locus_tag:

$ esearch -db nucleotide -query "JG64_RS07240" | efetch -format gene_fasta > all_genes.fa
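The query above searches for a single locus_tag, so all_genes.fa will typically hold only the genes of the contig(s) matching it; IDs from the list that sit on other contigs would then be missing. A quick check, sketched here on hypothetical toy data (the toy_ files stand in for ids.txt and all_genes.fa), pulls the locus_tags out of the headers with sed and compares them with the ID list using comm:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy data standing in for ids.txt and all_genes.fa.
printf '%s\n' TOY_0001 TOY_0002 TOY_0099 > toy_ids.txt
cat > toy_genes.fa <<'EOF'
>lcl|TOY_CONTIG_gene_1 [locus_tag=TOY_0001] [gbkey=Gene]
ATGAAACCCTAA
>lcl|TOY_CONTIG_gene_2 [locus_tag=TOY_0002] [gbkey=Gene]
ATGGGGTTTTGA
EOF

# Extract the value of [locus_tag=...] from every header line.
sed -n 's/.*\[locus_tag=\([^]]*\)].*/\1/p' toy_genes.fa | sort -u > toy_found.txt

# IDs in the list but absent from the FASTA (comm requires sorted input).
sort -u toy_ids.txt | comm -23 - toy_found.txt > toy_missing.txt

cat toy_missing.txt
```

Any IDs that end up in the missing list would presumably need their own esearch/efetch run (or a query that also matches their contigs) before extraction.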

$ awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' < all_genes.fa > all_genes.tab

$ while read -r i; do grep -F "[locus_tag=${i}]" all_genes.tab | tr "\t" "\n"; done < ids.txt > needed.fa

needed.fa will then contain the sequences you want.
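If running grep once per ID is slow for 400+ locus_tags, a single awk pass can load all IDs first and then print every matching record, keeping the original line wrapping. This is a sketch assuming ids.txt holds one locus_tag per line and each header carries a `[locus_tag=...]` field as in the output above; the function name extract_by_locus_tag and the toy_ files are hypothetical. Records come out in all_genes.fa order, not ids.txt order.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print every FASTA record whose header contains
# [locus_tag=ID] for an ID listed in the first file (one ID per line).
extract_by_locus_tag() {
  # First file: build a set of "[locus_tag=ID]" strings (stripping any CR).
  # NR == FNR only holds while reading the first file, so it must be non-empty.
  # Second file: a header turns printing on if any of its fields is a wanted
  # tag; sequence lines are printed while the flag stays on.
  awk 'NR == FNR { sub(/\r$/, ""); if (NF) want["[locus_tag=" $1 "]"] = 1; next }
       /^>/ { keep = 0; for (i = 2; i <= NF; i++) if ($i in want) keep = 1 }
       keep' "$1" "$2"
}

# Hypothetical toy data standing in for ids.txt and all_genes.fa.
printf '%s\n' TOY_0002 TOY_0003 > toy_ids.txt
cat > toy_genes.fa <<'EOF'
>lcl|TOY_CONTIG_gene_1 [locus_tag=TOY_0001] [gbkey=Gene]
ATGAAA
CCCTAA
>lcl|TOY_CONTIG_gene_2 [locus_tag=TOY_0002] [gbkey=Gene]
ATGGGG
TTTTGA
EOF

extract_by_locus_tag toy_ids.txt toy_genes.fa > toy_needed.fa
cat toy_needed.fa
```

On real data the call would look like `extract_by_locus_tag ids.txt all_genes.fa > needed.fa`. Reading the ID list into an awk array makes each lookup a hash test, so the FASTA is scanned only once regardless of how many IDs there are.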


It worked, thank you very much. Greetings from Colombia.
