Question

How To Retrieve FASTA Sequences For Multiple Gene Symbols In One Query?

11.2 years ago

giorgiocasaburi85 • 0

Hi all,

I have a set of 500 ORFs:

NMB0001 NMB0002 NMB0010 ...

I need to obtain the corresponding FASTA nucleotide sequences for those ORFs. Doing it manually would take too much time and energy. Is there a fast, automatic way to obtain the FASTA sequences given the symbols as IDs?

Thanks a lot in advance for your help

Giorgio

data • 12k views

Do you want to do this programmatically (e.g. BioPerl/BioMart) or with a user interface (e.g. UCSC Table Browser/BioMart)?

Reply by Irsan ★ 7.8k • 11.2 years ago

I know it can be done programmatically. That is exactly why I wrote the post: I don't know how to do it! :/

Reply by giorgiocasaburi85 • 0 • 11.2 years ago

score 8 · Answer 1 · 2013-01-25

This can be done in two steps:

At the NCBI, open the complete chromosome sequence of Neisseria meningitidis: http://www.ncbi.nlm.nih.gov/nuccore/NC_003112.2
Then choose Send -> File -> Coding Sequences

You will get a multi-FASTA file containing the coding sequences of all Neisseria genes (NMB0001, NMB0002, etc.). See below:

>lcl|NC_003112.2_cdsid_NP_273067.1 [gene=NMB0001] [protein=acetyltransferase] [protein_id=NP_273067.1] [location=complement(7..498)]
ATGAATTCCCTCTTTGTGGACAATACTGTTTTCATTACACGGCTGAAAGCCGGGCATATCGGCAGGTTGG
TTCAGGCGTTGTTTGAGGAGTGGCACGGATTTGAACCGTGGTCTTCTGTGGATAAGATTCATGCCTATTA
CGGCAGGTGTTTGAAGGATGACGAACTGCCGCTGGCATTTGCGGCTGTGGATGATTCCGGAATCCTGTTG
GGTTCGGCTGCGGTCAAGCGGCATGATATGGAAAGTTTTCCACGGTATGAATATTGGTTGGGGGATGTCT
TTGTTTTACCTGAATATCGCGGAAAAGGCATTGGCAGGAGGCTGGTCGCCCACTGCATAGGCGCAGCGCG
TTCGCTGGGGATAAAGTTCTTGTATCTTTATACGCCTGATGTGCAAATATTTTATGAATCATTCGGCTGG
GTGGTTGTCGGGCGACATTTCCATAACGGTGAATGGGTTACGGTTATGCGTTTGGATGTGGATAAGGTTT
AA
>lcl|NC_003112.2_cdsid_NP_273068.1 [gene=NMB0002] [protein=hypothetical protein] [protein_id=NP_273068.1] [location=complement(502..897)]
ATGCCGTCTGATGTCGGAATACGGCTTCAGACAGCATTTAAATGGAAGTTAAAAATGAAAAAAATATTTT
ATTTTCTGATGGTTGTTTTTTCTACAAGCGTATGGGCAGGGGATGCTGAAGACAATCTGCTCAGCATCCA
ATCCGGTTACCGCGCCTTATTGCAAAAGCAAAACAATCTGGACGGAAAAATCATCGGGATGCAGTCGGAT
TTGGAAGATGCGCGCCGGCGTTTGCAGACGGCTCAGGCAGACATCGCCCGTTTGGAAGCGGAAATTCCTG
CAGCAATGGCGCAAAAAGCCCGGCAGGCTGAAGATTTGAGGCAAATCGGAGTGCGTTTGGACCATGCTTG
GAATGCGGTTTACGGCGCAGGGGGAACGAAGGCGTCGGGGAATTGA
[...]
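
Once the multi-FASTA file is downloaded, the 500 ORFs still have to be picked out of it. A minimal Python sketch of that filtering step follows, assuming the headers carry a `[gene=...]` tag as in the records above. The file names (`sequence.txt`, `orf_ids.txt`, `selected.fasta`) and all function names are hypothetical.

```python
import re

# Matches the "[gene=NMB0001]" tag seen in the NCBI coding-sequence headers.
GENE_TAG = re.compile(r"\[gene=([^\]]+)\]")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_by_gene(lines, wanted):
    """Return {gene_id: sequence} for records whose gene tag is in `wanted`."""
    wanted = set(wanted)
    selected = {}
    for header, seq in read_fasta(lines):
        match = GENE_TAG.search(header)
        if match and match.group(1) in wanted:
            selected[match.group(1)] = seq
    return selected


def to_fasta(records, width=70):
    """Format {name: sequence} as FASTA text with wrapped sequence lines."""
    out = []
    for name, seq in records.items():
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


def main(cds_path="sequence.txt", ids_path="orf_ids.txt", out_path="selected.fasta"):
    """Hypothetical driver: the paths are placeholders for your own files."""
    with open(ids_path) as handle:
        wanted = handle.read().split()  # IDs separated by spaces or newlines
    with open(cds_path) as handle:
        selected = select_by_gene(handle, wanted)
    missing = [gene for gene in wanted if gene not in selected]
    if missing:
        print("Not found:", " ".join(missing))
    with open(out_path, "w") as handle:
        handle.write(to_fasta(selected))
```

Calling `main()` with your own paths writes one record per requested ORF and reports any IDs that have no matching `[gene=...]` tag, which is worth checking before relying on the output.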

score 4 · Answer 2 · 2013-01-25

There are many ways; one of them is:

Go to the Ensembl BioMart interface
Choose database --> Ensembl genes
Choose dataset --> Homo sapiens genes
Click filters
Click genes
At ID list limit, choose your preferred identifier and paste a list of IDs, one ID per line (I think you can copy and paste from an Excel column)
Click attributes
Choose sequences
Click the + sign at sequences
Choose unspliced gene
Click results

Done ... :-)
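
The same selection can also be expressed as a BioMart XML query instead of clicks. The Python sketch below only builds the XML text and does not contact any server. The dataset, filter, and attribute names (`hsapiens_gene_ensembl`, `ensembl_gene_id`, `gene_exon_intron`) are assumptions to verify against the XML that the BioMart web interface shows for the query configured above, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(ids,
                        dataset="hsapiens_gene_ensembl",      # assumed dataset name
                        id_filter="ensembl_gene_id",          # assumed filter name
                        sequence_attribute="gene_exon_intron",  # assumed: unspliced gene
                        header_attribute="ensembl_gene_id"):  # assumed attribute name
    """Return BioMart query XML requesting FASTA output for the given IDs."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "FASTA",
        "header": "0",
        "uniqueRows": "0",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # BioMart takes an ID list as one comma-separated filter value.
    ET.SubElement(ds, "Filter", {"name": id_filter, "value": ",".join(ids)})
    ET.SubElement(ds, "Attribute", {"name": sequence_attribute})
    ET.SubElement(ds, "Attribute", {"name": header_attribute})
    return ET.tostring(query, encoding="unicode")
```

The returned string is what would be submitted as the `query` parameter to a BioMart service endpoint. Comparing it with the XML exported by the web interface is the safest way to confirm the names before sending anything.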

score 0 · Answer 3 · 2013-01-25

11.2 years ago

giorgiocasaburi85 • 0

Thank you for your answer; unfortunately these genes are not human. They are from a bacterium (Neisseria meningitidis), so I don't think the Ensembl approach is applicable.

Then again, there are many ways, but you need the genome sequence of Neisseria meningitidis and the locations of the genes in that genome in order to extract the DNA sequences. I suggest you make a BED file with the gene locations, download the Neisseria meningitidis genome sequence, and have a look at "Batch Fetching Fasta Sequences From Bed File" to extract DNA sequences based on your BED file.
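
The BED-based extraction can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the method from the linked post: the genome is a single-record FASTA already loaded as one string, the BED lines use the standard columns (chrom, start, end, name, score, strand), and all function names are hypothetical. BED starts are 0-based and ends are exclusive, so a GenBank location such as complement(7..498) for NMB0001 becomes start 6, end 498, strand "-".

```python
# Translation table for complementing DNA; N is left unchanged.
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_single_fasta(lines):
    """Return the sequence of a single-record FASTA as one string."""
    return "".join(
        line.strip() for line in lines
        if line.strip() and not line.startswith(">")
    )


def extract_bed_regions(genome, bed_lines):
    """Yield (name, sequence) for each BED line; minus-strand genes are
    reverse-complemented so the result reads 5' to 3' on the coding strand."""
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        start, end = int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else f"{fields[0]}:{start}-{end}"
        strand = fields[5] if len(fields) > 5 else "+"
        seq = genome[start:end]  # 0-based start, exclusive end
        if strand == "-":
            seq = reverse_complement(seq)
        yield name, seq
```

With the chromosome loaded via `read_single_fasta`, a BED line such as `NC_003112.2 6 498 NMB0001 0 -` would select the 492 bases of that gene on the reverse strand.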

Reply by Irsan ★ 7.8k • 11.2 years ago

score 0 · Answer 4 · 2013-01-25

11.2 years ago

giorgiocasaburi85 • 0

I also found this: http://biit.cs.ut.ee/gprofiler/gconvert.cgi. It is very good for ID conversion and is not limited to just two or three genomes.

score 0 · Answer 5 · 2017-03-02

7.1 years ago

Maximilian Haeussler ★ 1.6k

Use the UCSC Table Browser http://genome.ucsc.edu/cgi-bin/hgTables

Select a genome and gene track

Under output, select sequence.

Click "get output". Select "genomic sequence". Select "CDS"
