Question

extracting submatrix from microarray

0

Entering edit mode

6.6 years ago

mannoulag1 ▴ 120

I have a gene expression matrix , I would like to extract a submatrix of genes annotated by a GO term. I use the R code to have a list of genes annotated by this term:

mart <- biomaRt::useMart(biomart = "plants_mart", dataset = "athaliana_eg_gene", host = 'plants.ensembl.org')
GTOGO <- biomaRt::getBM(attributes = c( "ensembl_gene_id", "go_id"), mart = mart)
head (GTOGO)
geneList <- biomaRt::getBM(attributes = c( "ensembl_gene_id", "go_id"), filters = "go", values = "GO:......", mart = mart)

How to extract from my gene expression matrix (genes, conditions) the submatrix of these annotated genes, can I use merge as R function? Thank you

gene R thaliana microarrays GO • 2.0k views

ADD COMMENT • link updated 6.6 years ago by Kevin Blighe 87k • written 6.6 years ago by mannoulag1 ▴ 120

1

Entering edit mode

Unless I misunderstood OP, I think you should do the other way round. First subset genes of interest from the expression matrix, then do GO analysis and then make a final matrix (with expression values and GO terms).

ADD REPLY • link 6.6 years ago by cpad0112 21k

0

Entering edit mode

Hi, thank you. but how to subset genes of interest? my idea is to extract the genes list according to the GO term.

ADD REPLY • link 6.6 years ago by mannoulag1 ▴ 120

0

Entering edit mode

Please post a few lines (10) from your expression matrix. In general, genes of statistical significance (with lowest adjusted p-values) will be taken for analysis and many people also consider fold change for differential expression. To select for top represented GO terms, you may need to filter your list on some criteria that prevents noise.

ADD REPLY • link 6.6 years ago by cpad0112 21k

0

Entering edit mode

ok, my matrix is large, I poste here only 10 lignes (genes) and 4 conditions:

244901_at           3.823607         4.442251         4.284493         3.709218
244902_at           4.263668         4.519978         4.465266         4.434114
244903_at           9.989272        10.102369         9.956492         9.981083
244904_at           6.458268         6.407199         6.214501         6.290480
244906_at           5.961481         5.923713         5.591154         5.825557
244912_at          10.195152        10.272595        10.127589         9.715104
244920_s_at         8.941259         9.198353         9.049424         8.957291
244921_s_at         7.489384         7.889701         7.636804         7.565507
244926_s_at         3.515709         3.492847         3.492847         3.523809
244928_s_at         7.743774         7.855132         7.488903         7.568765

ADD REPLY • link 6.6 years ago by mannoulag1 ▴ 120

1

Entering edit mode

I have pasted an answer in my original answer (below).

ADD REPLY • link 6.6 years ago by Kevin Blighe 87k

1

Entering edit mode

They are not genes, they are Affymetrix probes. I guess you already extracted corresponding genes using either Affymetrix tools or BiomaRt (or any other tool). For probe definition look here: http://www.affymetrix.com/support/help/IVT_glossary/index.affx. If you have probes in a list, you can simply use bash solution or you can follow the R solution proposed below post by Kevin.

significant probes:

$ cat probes.txt 
244903_at
244904_at

Expression matrix:

$ cat test.txt 
244901_at           3.823607         4.442251         4.284493         3.709218
244902_at           4.263668         4.519978         4.465266         4.434114
244903_at           9.989272        10.102369         9.956492         9.981083
244904_at           6.458268         6.407199         6.214501         6.290480
244906_at           5.961481         5.923713         5.591154         5.825557
244912_at          10.195152        10.272595        10.127589         9.715104
244920_s_at         8.941259         9.198353         9.049424         8.957291
244921_s_at         7.489384         7.889701         7.636804         7.565507
244926_s_at         3.515709         3.492847         3.492847         3.523809
244928_s_at         7.743774         7.855132         7.488903         7.568765

command and output:

$ grep -f probes.txt test.txt 
244903_at           9.989272        10.102369         9.956492         9.981083
244904_at           6.458268         6.407199         6.214501         6.290480

ADD REPLY • link 6.6 years ago by cpad0112 21k

score 1 · Answer 1 · 2017-09-29

1

Entering edit mode

6.6 years ago

Kevin Blighe 87k

Hi mannoulag1, I think that this is what you want to do:

If you have an expression matrix (MyExpressionMatrix) with Ensembl gene IDs as rownames, then the following code will find the corresponding GO term for each gene (each gene is likely to have many associated GO terms):

geneList <- biomaRt::getBM(mart=mart, attributes=c("ensembl_gene_id", "go_id"), filter="ensembl_gene_id", values=rownames(MyExpressionMatrix), uniqueRows=TRUE)

If you then want to subset your expression matrix, it would be something like this:

MyExpressionMatrix[ which( rownames(MyExpressionMatrix) %in% geneList$ensembl_gene_id ) , ]

ADD COMMENT • link 6.6 years ago by Kevin Blighe 87k

0

Entering edit mode

Hi Kevin, Thank you very much, my expression matrix has 'array element names' (....._at) as row names, how to convert it to ensembl gene id? any idea please?

ADD REPLY • link 6.6 years ago by mannoulag1 ▴ 120

0

Entering edit mode

I see, I believe that those are Affymetrix IDs. You may want to take a look here: Affymetrix Gene Id Format - What Does "At" Stand For?

ADD REPLY • link 6.6 years ago by Kevin Blighe 87k

1

Entering edit mode

Thanks for pasting your data and providing information about the prove IDs. Here is the solution:

This will get the Ensembl ID, GO term, and GO term evidence code for all of your genes.

biomaRt::getBM(mart=mart, attributes=c("affy_ath1_121501", "ensembl_gene_id", "go_id", "go_linkage_type"), filter="affy_ath1_121501", values=rownames(MyExpressionMatrix), uniqueRows=TRUE)

You can then filter your genes based on the GO terms. Pay close attention to the evidence codes, as these will help you to decide which GO term is more reliable. Take a further look here to understand these: Go annotation reliability ? also here: http://www.geneontology.org/page/guide-go-evidence-codes

You may also wish to consider adding external_gene_name to your output. A full list of possible values can be obtained with listAttributes(mart)

ADD REPLY • link 6.6 years ago by Kevin Blighe 87k

0

Entering edit mode

thank you very much, with the following code I get a submatrix but with probe-ID, this can seem true?

geneList<- biomaRt::getBM(attributes = c( "ensembl_gene_id", "go_id", "affy_ath1_121501"), filters = "go", values = "GO:.....", mart = mart)
subMatrix<- MyExpressionMatrix[which(rownames(MyExpressionMatrix) %in% geneList$affy_ath1_121501 ) , ]

ADD REPLY • link 6.6 years ago by mannoulag1 ▴ 120

1

Entering edit mode

Hello,

If you want to replace your '_at' IDs to Ensembl IDs, then use this code:

mart <- biomaRt::useMart(biomart = "plants_mart", dataset = "athaliana_eg_gene", host = 'plants.ensembl.org')
geneList <- biomaRt::getBM(mart=mart, attributes=c("affy_ath1_121501", "ensembl_gene_id"), filter="affy_ath1_121501", values=rownames(MyExpressionMatrix), uniqueRows=TRUE)

rownames(MyExpressionMatrix)
 [1] "244901_at"   "244902_at"   "244903_at"   "244904_at"   "244906_at"  
 [6] "244912_at"   "244920_s_at" "244921_s_at" "244926_s_at" "244928_s_at"
rownames(MyExpressionMatrix) <- geneList[match(rownames(MyExpressionMatrix), geneList$affy_ath1_121501), "ensembl_gene_id"]
rownames(MyExpressionMatrix)
 [1] "ATMG00640" "ATMG00650" "ATMG00660" "ATMG00670" "ATMG00690" "ATMG00830"
 [7] "ATMG00990" "ATMG01000" "ATMG00520" "ATMG00570"

The use of the GO terms is complicated by the fact that each gene has multiple GO terms assigned to it.

Does this help?

ADD REPLY • link 6.6 years ago by Kevin Blighe 87k

1

Entering edit mode

Hi Kevin, yes , thank you very much

ADD REPLY • link 6.6 years ago by mannoulag1 ▴ 120

1

Entering edit mode

Let me know if you still need help with the GO terms.

ADD REPLY • link 6.6 years ago by Kevin Blighe 87k