Create a database of RefSeq genes
3.7 years ago

Hi all, I am working on a microarray data analysis pipeline. I want to create a database of protein-coding genes, i.e. all human RefSeq genes. Where can I get these genes, or is there a way to build a database of them? Please help me.

R microarray • 1.0k views

I believe you should be able to find what you want here: https://www.ncbi.nlm.nih.gov/genome/guide/human/

You can download the file you need for either the hg38 (GRCh38) or the hg19 (GRCh37) human genome assembly.

That won't help. OP wants all protein coding sequences. Really cool resource though - I did not know that web page existed.

But I need genes, not sequences. How can I find those?

Thank you all. The third one is working well, husensofteng.

smrutimayipanda: Please use ADD REPLY/ADD COMMENT when responding, to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.

If an answer was helpful, you should upvote it; if an answer resolved your question, you should mark it as accepted. You can accept more than one answer if several work.

smrutimayipanda, you have multiple questions that you have not accepted answers for. Please also go back and provide feedback on them.

3.7 years ago
husensofteng ▴ 410

The easiest way to get symbols and other information on protein-coding genes is through the NCBI Gene resource page. Here is a link for RefSeq protein-coding genes, which you can regenerate with this query:

"Homo sapiens"[Organism] AND ("genetype protein coding"[Properties] AND "srcdb refseq"[Properties] AND alive[prop])

To download the output table as a file, click Send to: at the top of the page and select File.
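
Since the original question was about building a database, here is a minimal sketch of one way to load the downloaded table into SQLite with the Python standard library. It assumes the file is tab-delimited with a single header line; the function name `load_gene_table`, the table name `refseq_genes`, and the file name in the usage line are illustrative choices, not anything NCBI prescribes. Column names are taken from whatever header the file contains rather than hard-coded.

```python
import csv
import sqlite3


def load_gene_table(tsv_path, db_path, table="refseq_genes"):
    """Load a tab-delimited gene table into SQLite and return the row count.

    Assumption: the first line of the file is a header naming the columns.
    """
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps any quote characters in gene descriptions literal.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        # Use the file's own column names; name any blank header cell col<i>.
        columns = [
            name.strip().lstrip("#").strip() or f"col{i}"
            for i, name in enumerate(header)
        ]
        quoted = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
        placeholders = ", ".join("?" for _ in columns)

        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commit on success, roll back on error
                conn.execute(f'DROP TABLE IF EXISTS "{table}"')
                conn.execute(f'CREATE TABLE "{table}" ({quoted})')
                # Pad or trim each row so it matches the header width.
                rows = (
                    (row + [""] * len(columns))[: len(columns)]
                    for row in reader
                    if row
                )
                conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', rows
                )
            count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        finally:
            conn.close()
    return count
```

For example, `load_gene_table("gene_result.txt", "refseq_genes.sqlite")` would create the database file and return the number of genes loaded, assuming the download was saved under that name. All values are stored as text, so cast them in your queries if you need numeric comparisons.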

3.7 years ago
vkkodali_ncbi ★ 3.7k

If you are trying to download the sequences of all protein-coding transcripts, go to this page, click the 'Download Assembly' button, choose 'RefSeq' as the source, and download the 'RNA FASTA (.fna)' file. This file contains both non-coding and coding transcript sequences. You can then use seqkit to extract all protein-coding transcripts as follows:

seqkit grep -r -p '[NX]M_\d+\.\d+' GCF_000001405.39_GRCh38.p13_rna.fna.gz -o protein_coding_tx.fna
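
If seqkit is not available, roughly the same filter can be sketched in plain Python. This is an illustrative equivalent, not part of seqkit: it keeps records whose ID starts with an NM_ or XM_ accession (the RefSeq prefixes for curated and predicted mRNAs). The function name `extract_coding_transcripts` is made up for this example, and unlike the command above the pattern here is anchored to the start of the ID.

```python
import gzip
import re

# NM_ = curated mRNA, XM_ = predicted mRNA; anchored at the start of the ID.
CODING_ACCESSION = re.compile(r"^[NX]M_\d+\.\d+")


def _open_text(path, mode):
    """Open a plain or gzip-compressed text file based on its extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode + "t", encoding="utf-8")
    return open(path, mode, encoding="utf-8")


def extract_coding_transcripts(in_path, out_path):
    """Copy NM_/XM_ records from a FASTA file; return how many were kept."""
    kept = 0
    keep = False
    with _open_text(in_path, "r") as src, _open_text(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # Decide once per record, using the text right after '>'.
                keep = CODING_ACCESSION.match(line[1:]) is not None
                kept += keep
            if keep:
                dst.write(line)
    return kept
```

Calling `extract_coding_transcripts("GCF_000001405.39_GRCh38.p13_rna.fna.gz", "protein_coding_tx.fna")` would read the same input and write the same output file name as the seqkit command. It streams line by line, so memory use stays small even for the full transcript set.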