Question: extracting submatrix from microarray
0
gravatar for mannoulag1
19 months ago by
mannoulag160
mannoulag160 wrote:

I have a gene expression matrix , I would like to extract a submatrix of genes annotated by a GO term. I use the R code to have a list of genes annotated by this term:

mart <- biomaRt::useMart(biomart = "plants_mart", dataset = "athaliana_eg_gene", host = 'plants.ensembl.org')
GTOGO <- biomaRt::getBM(attributes = c( "ensembl_gene_id", "go_id"), mart = mart)
head (GTOGO)
geneList <- biomaRt::getBM(attributes = c( "ensembl_gene_id", "go_id"), filters = "go", values = "GO:......", mart = mart)

How to extract from my gene expression matrix (genes, conditions) the submatrix of these annotated genes, can I use merge as R function? Thank you

microarrays go thaliana R gene • 774 views
ADD COMMENTlink modified 19 months ago by Kevin Blighe41k • written 19 months ago by mannoulag160
1

Unless I misunderstood OP, I think you should do the other way round. First subset genes of interest from the expression matrix, then do GO analysis and then make a final matrix (with expression values and GO terms).

ADD REPLYlink written 19 months ago by cpad011211k

Hi, thank you. but how to subset genes of interest? my idea is to extract the genes list according to the GO term.

ADD REPLYlink written 19 months ago by mannoulag160

Please post a few lines (10) from your expression matrix. In general, genes of statistical significance (with lowest adjusted p-values) will be taken for analysis and many people also consider fold change for differential expression. To select for top represented GO terms, you may need to filter your list on some criteria that prevents noise.

ADD REPLYlink modified 19 months ago • written 19 months ago by cpad011211k

ok, my matrix is large, I poste here only 10 lignes (genes) and 4 conditions:

244901_at           3.823607         4.442251         4.284493         3.709218
244902_at           4.263668         4.519978         4.465266         4.434114
244903_at           9.989272        10.102369         9.956492         9.981083
244904_at           6.458268         6.407199         6.214501         6.290480
244906_at           5.961481         5.923713         5.591154         5.825557
244912_at          10.195152        10.272595        10.127589         9.715104
244920_s_at         8.941259         9.198353         9.049424         8.957291
244921_s_at         7.489384         7.889701         7.636804         7.565507
244926_s_at         3.515709         3.492847         3.492847         3.523809
244928_s_at         7.743774         7.855132         7.488903         7.568765
ADD REPLYlink written 19 months ago by mannoulag160
1

I have pasted an answer in my original answer (below).

ADD REPLYlink written 19 months ago by Kevin Blighe41k
1

They are not genes, they are Affymetrix probes. I guess you already extracted corresponding genes using either Affymetrix tools or BiomaRt (or any other tool). For probe definition look here: http://www.affymetrix.com/support/help/IVT_glossary/index.affx. If you have probes in a list, you can simply use bash solution or you can follow the R solution proposed below post by Kevin.

significant probes:

$ cat probes.txt 
244903_at
244904_at

Expression matrix:

$ cat test.txt 
244901_at           3.823607         4.442251         4.284493         3.709218
244902_at           4.263668         4.519978         4.465266         4.434114
244903_at           9.989272        10.102369         9.956492         9.981083
244904_at           6.458268         6.407199         6.214501         6.290480
244906_at           5.961481         5.923713         5.591154         5.825557
244912_at          10.195152        10.272595        10.127589         9.715104
244920_s_at         8.941259         9.198353         9.049424         8.957291
244921_s_at         7.489384         7.889701         7.636804         7.565507
244926_s_at         3.515709         3.492847         3.492847         3.523809
244928_s_at         7.743774         7.855132         7.488903         7.568765

command and output:

$ grep -f probes.txt test.txt 
244903_at           9.989272        10.102369         9.956492         9.981083
244904_at           6.458268         6.407199         6.214501         6.290480
ADD REPLYlink written 19 months ago by cpad011211k
1
gravatar for Kevin Blighe
19 months ago by
Kevin Blighe41k
Guy's Hospital, London
Kevin Blighe41k wrote:

Hi mannoulag1, I think that this is what you want to do:

If you have an expression matrix (MyExpressionMatrix) with Ensembl gene IDs as rownames, then the following code will find the corresponding GO term for each gene (each gene is likely to have many associated GO terms):

geneList <- biomaRt::getBM(mart=mart, attributes=c("ensembl_gene_id", "go_id"), filter="ensembl_gene_id", values=rownames(MyExpressionMatrix), uniqueRows=TRUE)

If you then want to subset your expression matrix, it would be something like this:

MyExpressionMatrix[ which( rownames(MyExpressionMatrix) %in% geneList$ensembl_gene_id ) , ]
ADD COMMENTlink modified 19 months ago • written 19 months ago by Kevin Blighe41k

Hi Kevin, Thank you very much, my expression matrix has 'array element names' (....._at) as row names, how to convert it to ensembl gene id? any idea please?

ADD REPLYlink modified 19 months ago • written 19 months ago by mannoulag160

I see, I believe that those are Affymetrix IDs. You may want to take a look here: Affymetrix Gene Id Format - What Does "At" Stand For?

ADD REPLYlink modified 19 months ago • written 19 months ago by Kevin Blighe41k
1

Thanks for pasting your data and providing information about the prove IDs. Here is the solution:

This will get the Ensembl ID, GO term, and GO term evidence code for all of your genes.

biomaRt::getBM(mart=mart, attributes=c("affy_ath1_121501", "ensembl_gene_id", "go_id", "go_linkage_type"), filter="affy_ath1_121501", values=rownames(MyExpressionMatrix), uniqueRows=TRUE)

You can then filter your genes based on the GO terms. Pay close attention to the evidence codes, as these will help you to decide which GO term is more reliable. Take a further look here to understand these: Go annotation reliability ? also here: http://www.geneontology.org/page/guide-go-evidence-codes

You may also wish to consider adding external_gene_name to your output. A full list of possible values can be obtained with listAttributes(mart)

ADD REPLYlink modified 19 months ago • written 19 months ago by Kevin Blighe41k

thank you very much, with the following code I get a submatrix but with probe-ID, this can seem true?

geneList<- biomaRt::getBM(attributes = c( "ensembl_gene_id", "go_id", "affy_ath1_121501"), filters = "go", values = "GO:.....", mart = mart)
subMatrix<- MyExpressionMatrix[which(rownames(MyExpressionMatrix) %in% geneList$affy_ath1_121501 ) , ]
ADD REPLYlink modified 19 months ago • written 19 months ago by mannoulag160
1

Hello,

If you want to replace your '_at' IDs to Ensembl IDs, then use this code:

mart <- biomaRt::useMart(biomart = "plants_mart", dataset = "athaliana_eg_gene", host = 'plants.ensembl.org')
geneList <- biomaRt::getBM(mart=mart, attributes=c("affy_ath1_121501", "ensembl_gene_id"), filter="affy_ath1_121501", values=rownames(MyExpressionMatrix), uniqueRows=TRUE)

rownames(MyExpressionMatrix)
 [1] "244901_at"   "244902_at"   "244903_at"   "244904_at"   "244906_at"  
 [6] "244912_at"   "244920_s_at" "244921_s_at" "244926_s_at" "244928_s_at"
rownames(MyExpressionMatrix) <- geneList[match(rownames(MyExpressionMatrix), geneList$affy_ath1_121501), "ensembl_gene_id"]
rownames(MyExpressionMatrix)
 [1] "ATMG00640" "ATMG00650" "ATMG00660" "ATMG00670" "ATMG00690" "ATMG00830"
 [7] "ATMG00990" "ATMG01000" "ATMG00520" "ATMG00570"

The use of the GO terms is complicated by the fact that each gene has multiple GO terms assigned to it.

Does this help?

ADD REPLYlink written 19 months ago by Kevin Blighe41k
1

Hi Kevin, yes , thank you very much

ADD REPLYlink written 19 months ago by mannoulag160
1

Let me know if you still need help with the GO terms.

ADD REPLYlink written 19 months ago by Kevin Blighe41k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1825 users visited in the last hour