What kind of annotation is used in these NCBI datasets?
5.4 years ago
lvitale • 0

I'm a Computer Science student and I'd like to build a bioinformatics application. I'm looking for gene expression datasets and found two interesting ones on NCBI, but I don't understand what kind of annotation they use.

The first one is:

I read that the experiment covers 12,023 genes, but I don't understand which annotation is used. The first "genes" are: "10000_at" "10001_at" "10002_at" "10003_at" "10004_at" "10005_at" "10006_at" "10007_at" "10009_at" "1000_at". My question: are these identifiers gene IDs? If so, why is there an "_at" at the end? How can I convert this kind of annotation to gene symbols?

The second one is:

In this case the annotation looks like that of the first dataset: "1007_s_at" "1053_at" "117_at" "121_at" "1255_g_at" "1294_at" "1316_at" "1320_at". However, the number of "genes" available is 54,675, whereas I thought the estimated number of human protein-coding genes was 19,000-20,000. Can I convert this kind of data to gene symbols?

Thank you so much

disease dataset • 1.4k views
5.4 years ago
bharata1803 ▴ 560

The annotation comes from the microarray platform. Check the platform information every time you download public data, because datasets may come from different platforms. I believe there is an R library that can decode the gene annotation for each microarray platform, but you can also download the annotation table manually from the platform's page on the NCBI website.
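If you go the manual route, the mapping step might look roughly like the Python sketch below. It is only an illustration under assumptions: the column names ("ID", "Gene Symbol"), the gene symbols (GENE_A and so on) and the expression values are made-up placeholders, not real platform content, so check the header of the table you actually download. Several probes can map to the same gene, which is one reason an array can have far more probe IDs than there are genes. Here such probes are averaged per sample, which is just one of several possible ways to collapse them.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical excerpt of a tab-separated platform annotation table.
# Column names and gene symbols are placeholders, not real annotations.
ANNOTATION_TSV = (
    "ID\tGene Symbol\n"
    "1007_s_at\tGENE_A\n"
    "1053_at\tGENE_B\n"
    "117_at\tGENE_C\n"
    "121_at\tGENE_B\n"
    "1255_g_at\t\n"  # probe without a gene symbol
)

# Toy expression matrix: probe ID -> one value per sample (made-up numbers).
EXPRESSION = {
    "1007_s_at": [1.0, 2.0],
    "1053_at": [2.0, 4.0],
    "117_at": [3.0, 3.0],
    "121_at": [4.0, 8.0],
    "1255_g_at": [5.0, 5.0],
}


def load_probe_map(handle, id_col="ID", symbol_col="Gene Symbol"):
    """Read a tab-separated annotation table into {probe ID: gene symbol}."""
    probe_map = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        symbol = (row.get(symbol_col) or "").strip()
        if symbol:  # skip probes that have no gene symbol
            probe_map[row[id_col]] = symbol
    return probe_map


def collapse_to_genes(expression, probe_map):
    """Average all probes that map to the same gene symbol, per sample."""
    by_gene = defaultdict(list)
    for probe_id, values in expression.items():
        symbol = probe_map.get(probe_id)
        if symbol is not None:  # unannotated probes are dropped
            by_gene[symbol].append(values)
    # zip(*rows) regroups the values sample by sample across probes.
    return {
        symbol: [mean(sample) for sample in zip(*rows)]
        for symbol, rows in by_gene.items()
    }


probe_map = load_probe_map(io.StringIO(ANNOTATION_TSV))
genes = collapse_to_genes(EXPRESSION, probe_map)
print(genes)
```

With a real download you would pass an open file instead of `io.StringIO(...)`, and you may need to skip comment lines at the top of the table first.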

For the first dataset, the platform is this:

For the second dataset:


Thank you so much! :)

