Question: Is There A Single Database Where I Can Find All The Human Gene Accessions And Sequences?
1
8.5 years ago by
Firoz10
Firoz10 wrote:

I am looking for all human gene sequences, specifically the coding regions. Is there a database from which I can download all of them? I know I can go to NCBI or EMBL and search for individual genes, but what I need is either a single flat file that contains all human RefSeq genes, or a single query that can retrieve all human genes from some database.

gene human • 2.5k views
written 8.5 years ago by Firoz

Thanks a lot, guys. I found both NCBI and Ensembl, but I don't quite understand why these two databases report different numbers of RefSeq sequences. NCBI gives 4423 sequences, whereas Ensembl gives more than 19000. I also found another archive, http://www.genenames.org/cgi-bin/hgnc_stats.pl, which I guess is the repository of annotated genes. That number is consistent with Ensembl. Any suggestions on which one to use?

modified 5 months ago by RamRS • written 8.5 years ago by Firoz
3
8.5 years ago by
User 3869100
User 3869100 wrote:

You can use the NCBI FTP site for RefSeq. A FASTA file of all genes is available there.
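
Once a RefSeq FASTA file has been downloaded, a quick sanity check is to count its records by accession prefix (NM_ for curated mRNAs, XM_ for predicted mRNAs, NR_ for non-coding RNAs, and so on). The following is only a minimal sketch: the function names are made up for illustration, and it assumes each header line starts with the accession as its first token.

```python
import gzip
from collections import Counter


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed FASTA file as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_refseq_prefixes(handle):
    """Count FASTA records by RefSeq accession prefix (e.g. NM, XM, NR).

    Assumption: the accession is the first whitespace-separated token
    after '>' on each header line.
    """
    counts = Counter()
    for line in handle:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        tokens = line[1:].split()
        if not tokens or "_" not in tokens[0]:
            counts["other"] += 1  # empty header or not a RefSeq-style accession
            continue
        counts[tokens[0].split("_")[0]] += 1
    return counts
```

With a downloaded file (the file name here is a placeholder), usage would look like `with open_maybe_gzip("human_rna.fna.gz") as fh: print(count_refseq_prefixes(fh))`. Note that this counts transcripts, not genes, so one gene with several transcript variants contributes several records.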

modified 8.5 years ago by Michael Kuhn • written 8.5 years ago by User 3869

Several large groups, e.g., Ensembl, NCBI, and UCSC, maintain their own gene annotations. Each has its pros and cons.

The RefSeq gene annotation is created and maintained by NCBI. If you would like to use RefSeq genes, as you mentioned in your question, you should use the NCBI FTP site. If you want a more comprehensive (and noisier) annotation, try Ensembl.

written 8.5 years ago by User 3869
3
8.5 years ago by
Bert Overduin3.7k
Edinburgh Genomics, The University of Edinburgh
Bert Overduin3.7k wrote:

See the Ensembl FTP site.
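
For the coding-region FASTA files on the Ensembl FTP site, it can help to count distinct genes rather than records, since each record is a transcript. The sketch below is hedged: it assumes the header lines carry `gene:` and `gene_biotype:` fields (as in recent Ensembl CDS FASTA releases, to the best of my knowledge), and the function name is purely illustrative.

```python
import re
from collections import defaultdict

# Assumed header fields, e.g.:
# >ENST... cds chromosome:GRCh38:... gene:ENSG....N gene_biotype:protein_coding ...
GENE_FIELD = re.compile(r"\bgene:(\S+)")
BIOTYPE_FIELD = re.compile(r"\bgene_biotype:(\S+)")


def genes_by_biotype(handle):
    """Map each gene_biotype to the set of gene IDs seen in FASTA headers."""
    genes = defaultdict(set)
    for line in handle:
        if not line.startswith(">"):
            continue  # skip sequence lines
        gene = GENE_FIELD.search(line)
        if gene is None:
            continue  # header without a gene field
        biotype = BIOTYPE_FIELD.search(line)
        gene_id = gene.group(1).split(".")[0]  # drop the version suffix
        genes[biotype.group(1) if biotype else "unknown"].add(gene_id)
    return genes
```

`len(genes_by_biotype(fh)["protein_coding"])` then gives the number of distinct protein-coding genes in the file, which is the figure most comparable to the gene counts reported by HGNC.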

written 8.5 years ago by Bert Overduin

I don't know where you got the number 4423, but it simply cannot be right. I am pretty sure that there are RefSeqs for the majority of human protein-coding genes, so I would expect a number of at least around 20,000.

written 8.5 years ago by Bert Overduin