Question: gene list for RNA seq
pshubhamoy (20) wrote, 4.0 years ago:

Hi, I need a list of all the protein-coding genes from mouse, with gene name and corresponding Entrez Gene ID. Can somebody help?

Thanks

rna-seq • 1.2k views
modified 4.0 years ago by ebrahimiet (40) • written 4.0 years ago by pshubhamoy (20)
EagleEye (6.6k), Sweden, wrote, 4.0 years ago:

1) Simple way:

Example for Mus musculus:

http://rest.kegg.jp/list/mmu

For other organisms/species, use the codes from the following link:

http://rest.kegg.jp/list/organism

2) With some effort, using UCSC (select Mus musculus as the reference in your case):

A: I need to download a list of all human genes with their respective Esemble gene
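
For option 1, a minimal Python sketch of turning the KEGG list output into (gene ID, description) pairs. It assumes each line starts with an organism-prefixed ID such as mmu:<number> followed by a tab, and that for mouse this number corresponds to the Entrez Gene ID (worth checking against a few genes you know). The function name parse_kegg_list and the sample text are hypothetical, and the sketch does no filtering by gene type.

```python
def parse_kegg_list(text, prefix="mmu:"):
    """Split KEGG `list` output into (gene_id, remainder) pairs.

    Assumption: every non-empty line is '<prefix><id><TAB><rest of line>'.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        kegg_id, _, rest = line.partition("\t")
        # Drop the organism prefix if present, keep the bare ID.
        gene_id = kegg_id[len(prefix):] if kegg_id.startswith(prefix) else kegg_id
        entries.append((gene_id, rest))
    return entries

# Made-up sample in the assumed layout (not real KEGG records).
sample = "mmu:1\tGeneA; hypothetical gene A\nmmu:2\tGeneB; hypothetical gene B\n"
for gene_id, description in parse_kegg_list(sample):
    print(gene_id, description, sep="\t")
```

To use it on real data, save the response from http://rest.kegg.jp/list/mmu to a file and pass the file's contents to parse_kegg_list.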

modified 4.0 years ago • written 4.0 years ago by EagleEye (6.6k)

This is a dumb question, but I just wanted to double-check: this is independent of the reference genome (mm9, mm10), correct?

written 4.0 years ago by steve (2.6k)

Hi Steve,

Good point. Yes, I am using mm10.

written 4.0 years ago by pshubhamoy (20)
unksci (160) wrote, 4.0 years ago:

The most up-to-date and comprehensive reference for Entrez Gene IDs, which also contains the status of the genes, is:

http://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz (which you have to filter by taxon ID 10090 for Mus musculus)

or, more conveniently: http://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Mus_musculus.gene_info.gz

The table also contains the gene name (or, more correctly, gene names separated by naming institution) and the gene type (the type_of_gene column, which you have to filter for "protein-coding").
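
The filtering described above could be sketched in Python roughly as follows. This is a minimal example using only the standard library. It assumes the file is tab-separated with a header line containing the columns #tax_id, GeneID, Symbol and type_of_gene. The function name protein_coding_genes and the sample rows are made up for illustration.

```python
import csv
import io

def protein_coding_genes(lines, tax_id="10090"):
    """Return (GeneID, Symbol) pairs for protein-coding genes of one taxon.

    `lines` is any iterable of text lines in gene_info layout, header first.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    genes = []
    for row in reader:
        # Keep only rows of the requested taxon that are typed protein-coding.
        if row["#tax_id"] == tax_id and row["type_of_gene"] == "protein-coding":
            genes.append((row["GeneID"], row["Symbol"]))
    return genes

# Tiny made-up sample with only the columns used here (not real records).
sample = io.StringIO(
    "#tax_id\tGeneID\tSymbol\ttype_of_gene\n"
    "10090\t1\tGeneA\tprotein-coding\n"
    "10090\t2\tGeneB\tncRNA\n"
    "9606\t3\tGENEC\tprotein-coding\n"
)
print(protein_coding_genes(sample))
```

For the downloaded file, the same function can be fed a text-mode handle from gzip.open("Mus_musculus.gene_info.gz", "rt"), so the archive does not need to be unpacked first.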

modified 4.0 years ago • written 4.0 years ago by unksci (160)
pshubhamoy (20) wrote, 4.0 years ago:

Thanks all,

Actually, I have an RNA-seq data set and I would like to remove all the non-coding genes. Is there any available tool for that?


This is something you could accomplish fairly easily in any programming language, e.g. R or Python. However, the specific implementation would depend heavily on the structure of your data.

written 4.0 years ago by steve (2.6k)

Strategy for R:

Assuming you have a count table with genes in rows and samples in columns, you can read this table in as a data.frame called counts. Analogously, you can read the list of genes you want to keep (or remove, it doesn't matter) into a vector called keepgenes.

Then you can slice the counts data.frame with something like: newdata <- counts[counts$gene %in% keepgenes, ] (negate the condition with ! if the vector lists the genes to remove).

If this strategy/pseudocode is unclear to you, I can expand on it, but this is really basic R, and you'll make your RNA-seq analysis far less painful if you dive into an R tutorial and teach yourself how to perform these tasks. Knowledge of at least one programming language is an enormous advantage.
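
For readers who prefer Python, the same keep-list idea might look like the sketch below. It assumes a tab-separated count table whose gene identifier column is named gene, mirroring counts$gene in the R line. The function name keep_rows and the sample data are hypothetical.

```python
import csv
import io

def keep_rows(count_lines, keepgenes, gene_column="gene"):
    """Return (header, rows) with only the rows whose gene is in keepgenes."""
    keep = set(keepgenes)  # set membership is fast for long gene lists
    reader = csv.DictReader(count_lines, delimiter="\t")
    kept = [row for row in reader if row[gene_column] in keep]
    return reader.fieldnames, kept

# Made-up count table: genes in rows, samples in columns.
counts = io.StringIO(
    "gene\tsample1\tsample2\n"
    "GeneA\t10\t12\n"
    "GeneB\t0\t3\n"
    "GeneC\t7\t5\n"
)
header, rows = keep_rows(counts, ["GeneA", "GeneC"])
print("\t".join(header))
for row in rows:
    print("\t".join(row[name] for name in header))
```

The keep list could come from any of the gene lists suggested in this thread, as long as it uses the same identifier type as the count table.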

written 4.0 years ago by WouterDeCoster (44k)
ebrahimiet (40) wrote, 4.0 years ago:

Hi,

You can easily use Ensembl BioMart:

http://www.ensembl.org/biomart/martview/3e6bde1e77a85c663750a6367619f66f

Regards,

Esmaeil
