How To Retrieve Fasta Sequences By Gene Symbols In Multiple Query?
11.2 years ago

Hi all,

I have a set of 500 ORFs:

NMB0001 NMB0002 NMB0010 ... ..

I need to obtain the corresponding nucleotide sequences, in FASTA format, for those ORFs. Doing it manually would take too much time and effort. Is there a fast, automated way to retrieve the FASTA sequences using the gene symbols as IDs?

Thanks a lot in advance for your help

Giorgio

data • 12k views

Programmatically (e.g. BioPerl/BioMart) or with a user interface (e.g. UCSC Table Browser/BioMart)?


I know it can be done programmatically. That is exactly why I wrote this post: I don't know how to do it! :/

11.2 years ago

This can be done in two steps:

  1. At the NCBI, see the complete chromosome sequence of Neisseria meningitidis: http://www.ncbi.nlm.nih.gov/nuccore/NC_003112.2
  2. Send -> File -> Coding Sequences
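
The same download can probably be scripted instead of clicked through. The following is a minimal sketch, assuming the NCBI E-utilities `efetch` endpoint and its `fasta_cds_na` return type for nucleotide records; the function names `cds_fasta_url` and `download_cds_fasta` and the output file name are hypothetical, not part of the original answer.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public NCBI E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def cds_fasta_url(accession):
    """Build an efetch URL asking for the CDS nucleotide FASTA of one record."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta_cds_na",  # coding sequences, nucleotide FASTA
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def download_cds_fasta(accession, out_path):
    """Fetch the CDS FASTA for `accession` and save it to `out_path`."""
    with urlopen(cds_fasta_url(accession)) as response:
        data = response.read()
    with open(out_path, "wb") as out:
        out.write(data)
```

Calling `download_cds_fasta("NC_003112.2", "NC_003112.2_cds.fasta")` should then save the coding sequences of that accession to a local file.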

You will get a multi-FASTA file containing the coding sequences of all Neisseria meningitidis genes (NMB0001, NMB0002, etc.). See below:

>lcl|NC_003112.2_cdsid_NP_273067.1 [gene=NMB0001] [protein=acetyltransferase] [protein_id=NP_273067.1] [location=complement(7..498)]
ATGAATTCCCTCTTTGTGGACAATACTGTTTTCATTACACGGCTGAAAGCCGGGCATATCGGCAGGTTGG
TTCAGGCGTTGTTTGAGGAGTGGCACGGATTTGAACCGTGGTCTTCTGTGGATAAGATTCATGCCTATTA
CGGCAGGTGTTTGAAGGATGACGAACTGCCGCTGGCATTTGCGGCTGTGGATGATTCCGGAATCCTGTTG
GGTTCGGCTGCGGTCAAGCGGCATGATATGGAAAGTTTTCCACGGTATGAATATTGGTTGGGGGATGTCT
TTGTTTTACCTGAATATCGCGGAAAAGGCATTGGCAGGAGGCTGGTCGCCCACTGCATAGGCGCAGCGCG
TTCGCTGGGGATAAAGTTCTTGTATCTTTATACGCCTGATGTGCAAATATTTTATGAATCATTCGGCTGG
GTGGTTGTCGGGCGACATTTCCATAACGGTGAATGGGTTACGGTTATGCGTTTGGATGTGGATAAGGTTT
AA
>lcl|NC_003112.2_cdsid_NP_273068.1 [gene=NMB0002] [protein=hypothetical protein] [protein_id=NP_273068.1] [location=complement(502..897)]
ATGCCGTCTGATGTCGGAATACGGCTTCAGACAGCATTTAAATGGAAGTTAAAAATGAAAAAAATATTTT
ATTTTCTGATGGTTGTTTTTTCTACAAGCGTATGGGCAGGGGATGCTGAAGACAATCTGCTCAGCATCCA
ATCCGGTTACCGCGCCTTATTGCAAAAGCAAAACAATCTGGACGGAAAAATCATCGGGATGCAGTCGGAT
TTGGAAGATGCGCGCCGGCGTTTGCAGACGGCTCAGGCAGACATCGCCCGTTTGGAAGCGGAAATTCCTG
CAGCAATGGCGCAAAAAGCCCGGCAGGCTGAAGATTTGAGGCAAATCGGAGTGCGTTTGGACCATGCTTG
GAATGCGGTTTACGGCGCAGGGGGAACGAAGGCGTCGGGGAATTGA
[...]
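
Once the multi-FASTA file is on disk, the 500 ORFs of interest still have to be pulled out of it. A minimal sketch of that filtering step, assuming the headers carry the identifier in a `[gene=...]` tag as in the excerpt above (if your download labels it differently, for example with `locus_tag`, pass that tag name instead). The function names and the file names in the usage comment are hypothetical.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_by_tag(lines, wanted, tag="gene"):
    """Return (id, header, sequence) for records whose [tag=...] value is wanted."""
    pattern = re.compile(r"\[" + re.escape(tag) + r"=([^\]]+)\]")
    wanted = set(wanted)
    selected = []
    for header, seq in read_fasta(lines):
        match = pattern.search(header)
        if match and match.group(1) in wanted:
            selected.append((match.group(1), header, seq))
    return selected


def format_fasta(records, width=70):
    """Format (id, header, sequence) records as FASTA text wrapped at `width`."""
    out = []
    for _, header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)


# Usage sketch (file names are placeholders):
#   ids = open("ids.txt").read().split()
#   with open("NC_003112.2_cds.fasta") as fh:
#       hits = select_by_tag(fh, ids)
#   open("selected.fasta", "w").write(format_fasta(hits))
#   missing = set(ids) - {gene for gene, _, _ in hits}
```

Comparing the requested IDs against the IDs actually found, as in the last usage line, is a cheap way to spot ORFs that are absent from the download or spelled differently.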
11.2 years ago
Irsan ★ 7.8k

There are many ways; one of them is:

  • Go to the Ensembl BioMart interface
  • Choose database --> Ensembl genes
  • Choose dataset --> Homo sapiens genes
  • Click filters
  • Click genes
  • At "ID list limit", choose your preferred identifier and paste a list of IDs, one ID per line (I think you can copy and paste from an Excel column)
  • Click attributes
  • Choose sequences
  • Click the + sign at sequences
  • Choose unspliced gene
  • Click results

Done ... :-)

11.2 years ago

Thank you for your answer; unfortunately these genes are not human. They are from a bacterium (Neisseria meningitidis), so I don't think the Ensembl approach is applicable.


Then again, there are many ways, but you need the genome sequence of Neisseria meningitidis and the locations of the genes in that genome in order to extract the DNA sequence. I suggest you make a BED file with the gene locations, download the Neisseria meningitidis genome sequence, and have a look at "Batch Fetching Fasta Sequences From Bed File" to extract the DNA sequences based on your BED file.
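
To illustrate what such a BED-based extraction involves, here is a minimal pure-Python sketch; it is not the tool discussed in the linked thread, and the function names are hypothetical. It assumes the genome has already been loaded into a dictionary of sequence strings. BED intervals are 0-based and end-exclusive, so a GenBank-style location such as `complement(7..498)` would become start 6, end 498, strand `-`.

```python
# Translation table for complementing DNA (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_bed(lines):
    """Yield (chrom, start, end, name, strand) from BED lines."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Name (column 4) and strand (column 6) are optional in BED.
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        strand = fields[5] if len(fields) > 5 else "+"
        yield chrom, start, end, name, strand


def extract(genome, bed_lines):
    """Return (name, sequence) for each BED interval.

    `genome` maps sequence names to sequence strings; minus-strand
    features are reverse-complemented so they read 5' to 3'.
    """
    results = []
    for chrom, start, end, name, strand in parse_bed(bed_lines):
        seq = genome[chrom][start:end]
        if strand == "-":
            seq = reverse_complement(seq)
        results.append((name, seq))
    return results
```

For a genome of this size, holding the whole chromosome in memory as one string is unproblematic; for large genomes an indexed FASTA reader would be the more usual choice.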

11.2 years ago

I also found this: http://biit.cs.ut.ee/gprofiler/gconvert.cgi. It is very good for ID conversion, and it is not limited to only 2-3 genomes.

7.1 years ago

Use the UCSC Table Browser http://genome.ucsc.edu/cgi-bin/hgTables

Select a genome and gene track

Under output, select sequence.

Click "get output". Select "genomic sequence". Select "CDS"
