Question: Create a database of RefSeq genes
smrutimayipanda wrote (6 weeks ago):

Hi all, I am working on a microarray data analysis pipeline. I want the protein-coding genes, i.e. all RefSeq genes of human, to create a database. Where can I get these genes, or is there any way to create a database of them? Please help me.

Tags: microarray, R

I believe you should be able to find what you want here: https://www.ncbi.nlm.nih.gov/genome/guide/human/

You can download the file you need for either the hg38 (GRCh38) or the hg19 (GRCh37) human genome assembly.
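If you download the RefSeq annotation (GFF3) file from that page, you can pull the protein-coding gene symbols out of it yourself. A minimal sketch in Python, assuming NCBI-style GFF3 attributes in which gene features carry `gene_biotype=protein_coding` and the gene symbol in `Name`; the function name `protein_coding_genes` is hypothetical:

```python
from typing import Iterable, List
from urllib.parse import unquote


def protein_coding_genes(lines: Iterable[str]) -> List[str]:
    """Return sorted, unique symbols of protein-coding genes from GFF3 lines.

    Assumes NCBI-style attributes: gene_biotype=protein_coding and Name=<symbol>.
    """
    symbols = set()
    for line in lines:
        if line.startswith("#"):  # skip directives and comments
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":  # keep only gene features
            continue
        # Column 9 holds key=value pairs separated by ';'
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if attrs.get("gene_biotype") == "protein_coding" and "Name" in attrs:
            symbols.add(unquote(attrs["Name"]))  # GFF3 values are URL-escaped
    return sorted(symbols)
```

To use it on the compressed download, open the file with `gzip.open(path, "rt")` and pass the handle in; the exact file name depends on the assembly you pick. The result is deduplicated because a symbol may appear more than once (for example on alternate loci or patch scaffolds).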

(reply by helzerk, 6 weeks ago)

That won't help; OP wants all protein-coding sequences. Really cool resource, though. I did not know that web page existed.

(reply by RamRS, 6 weeks ago)

But I need genes, not sequences. How can I find them?

(reply by smrutimayipanda, 6 weeks ago)

Thank you all. The third one, from husensofteng, is working well.

(reply by smrutimayipanda, 6 weeks ago)

smrutimayipanda: Please use ADD REPLY/ADD COMMENT when responding, to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

(reply by genomax, 6 weeks ago)

smrutimayipanda, you have multiple questions that you have not accepted answers for. Please also go back and provide feedback on them.

(reply by RamRS, 6 weeks ago)
husensofteng (Sweden) wrote (6 weeks ago):

The easiest way to get symbols and other information on protein-coding genes is through the NCBI Gene resource. You can list all RefSeq protein-coding genes with this query:

"Homo sapiens"[Organism] AND ("genetype protein coding"[Properties] AND "srcdb refseq"[Properties] AND alive[prop])
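The same query can also be run without the web interface through NCBI E-utilities. A minimal Python sketch, assuming the standard ESearch endpoint with JSON output and paging via `retstart`/`retmax` (ESearch returns at most 10,000 IDs per request). The helper names are hypothetical, and NCBI asks for no more than three requests per second without an API key:

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
QUERY = ('"Homo sapiens"[Organism] AND ("genetype protein coding"[Properties] '
         'AND "srcdb refseq"[Properties] AND alive[prop])')


def esearch_url(term: str, retstart: int = 0, retmax: int = 10000) -> str:
    """Build an ESearch URL against the Gene database with JSON output."""
    params = {"db": "gene", "term": term, "retmode": "json",
              "retstart": retstart, "retmax": retmax}
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def parse_esearch(payload: str) -> Tuple[int, List[str]]:
    """Return (total hit count, GeneIDs in this page) from an ESearch JSON reply."""
    result = json.loads(payload)["esearchresult"]
    return int(result["count"]), result["idlist"]


def fetch_all_gene_ids(term: str = QUERY, page: int = 10000) -> List[str]:
    """Page through all hits; needs network access to eutils.ncbi.nlm.nih.gov."""
    ids: List[str] = []
    start = 0
    while True:
        with urllib.request.urlopen(esearch_url(term, start, page)) as resp:
            count, batch = parse_esearch(resp.read().decode("utf-8"))
        ids.extend(batch)
        start += page
        if start >= count or not batch:
            return ids
```

The returned GeneIDs are strings, as ESearch reports them.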

To download the output table as a file, click "Send to:" at the top of the page and select "File".
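To turn the downloaded table into a small local database, as the question asks, one option is SQLite from the Python standard library. A minimal sketch, assuming you chose the "Tabular (text)" format in the "Send to: File" dialog, so the file is tab-delimited with a header row that includes `GeneID`, `Symbol` and `description` columns; the table layout and the function name `load_gene_table` are hypothetical:

```python
import csv
import sqlite3
from typing import Iterable


def load_gene_table(lines: Iterable[str], db_path: str = ":memory:") -> sqlite3.Connection:
    """Load a tab-delimited NCBI Gene export into an SQLite table named 'genes'.

    Assumes a header row with 'GeneID', 'Symbol' and 'description' columns.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS genes ("
        "gene_id INTEGER PRIMARY KEY, symbol TEXT NOT NULL, description TEXT)"
    )
    reader = csv.DictReader(lines, delimiter="\t")
    rows = (
        (int(r["GeneID"]), r["Symbol"], r.get("description"))
        for r in reader
        if r.get("GeneID")  # skip rows without a GeneID
    )
    # INSERT OR REPLACE keeps one row per GeneID if the file repeats an entry
    conn.executemany("INSERT OR REPLACE INTO genes VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn
```

For example, `with open("gene_result.txt", newline="") as fh: load_gene_table(fh, "refseq_genes.sqlite")` writes the table to disk; `gene_result.txt` is an assumed file name, so check what your download is called.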

vkkodali (United States) wrote (6 weeks ago):

If you are trying to download the sequences of all protein-coding transcripts, go to this page, use the 'Download Assembly' button, choose 'RefSeq' as the source and download the 'RNA FASTA (.fna)' file. This file has both non-coding and protein-coding transcript sequences. RefSeq mRNA accessions start with NM_ (curated) or XM_ (predicted), while non-coding RNAs use NR_ or XR_, so you can use seqkit to extract all protein-coding transcripts as follows:

seqkit grep -r -p '[NX]M_\d+\.\d+' GCF_000001405.39_GRCh38.p13_rna.fna.gz -o protein_coding_tx.fna
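If seqkit is not available, the same filtering can be done in plain Python. A minimal sketch, assuming the gzipped RefSeq RNA FASTA named above and that the accession is the first word of each header; unlike the seqkit pattern, this regex is anchored to the whole ID, and sequences are written unwrapped on one line. The function names are hypothetical:

```python
import gzip
import re
from typing import Iterator, TextIO, Tuple

# Protein-coding RefSeq mRNAs: NM_ (curated) or XM_ (predicted), with a version
CODING_ID = re.compile(r"^[NX]M_\d+\.\d+$")


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_coding(in_path: str, out_path: str) -> int:
    """Copy NM_/XM_ records from a gzipped FASTA to a plain FASTA; return the count."""
    kept = 0
    with gzip.open(in_path, "rt") as src, open(out_path, "w") as dst:
        for header, seq in read_fasta(src):
            parts = header.split(maxsplit=1)
            seq_id = parts[0] if parts else ""
            if CODING_ID.match(seq_id):
                dst.write(f">{header}\n{seq}\n")
                kept += 1
    return kept
```

Usage would be `filter_coding("GCF_000001405.39_GRCh38.p13_rna.fna.gz", "protein_coding_tx.fna")`, mirroring the seqkit command.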