Question

Affymetrix miRNA 4.0 Normalization

5.1 years ago

gracie ▴ 20

I am analyzing a GeneChip miRNA 4.0 Array dataset. Here is my code:

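library(oligo)
library(pd.mirna.4.0)
# Find the CEL files in the working directory
celFiles <- list.celfiles(full.names = TRUE)
# Read the raw intensities using the miRNA 4.0 platform design package
rawData <- read.celfiles(celFiles, pkgname = "pd.mirna.4.0")
# Background-correct, quantile-normalise, and summarise with RMA
eset <- rma(rawData)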

This gives results for all organisms. Is there any way to do the analysis only for human miRNAs?

I also tried the Affymetrix transcription analysis software restricted to human (RMA + DABG), but at the very end it gives me an error:

Analysis Failed: An error occurred while reading limma-output.h5: The type INTEGER for the dataSetName group_expressed_10 is not supported.

I could not solve that either.

miRNA microarray • 2.6k views

ADD COMMENT • link 5.1 years ago by gracie ▴ 20

Thank you, I will go with that most probably.

ADD REPLY • link 5.1 years ago by gracie ▴ 20

score 1 · Accepted Answer · 2019-03-29

5.1 years ago

Kevin Blighe 87k

Edit: scroll down for answer

----------------------------

Can you provide the full name of the microarray that you used? It should only target one species. So, what do you mean by this:

Is there any way to do the analysis only for human miRNAs?

ADD COMMENT • link 5.1 years ago by Kevin Blighe 87k

Sure, it is the GeneChip miRNA 4.0. There are probes for human, mouse, and rat. Since using all probes would change the RMA and limma results, I thought I should use only the human probes.

ADD REPLY • link 5.1 years ago by gracie ▴ 20

Hey, thanks for the link. In fact, it has micro RNAs (miRNAs) for all of these:

awk '!/^#/{print}' miRNA-4_0-st-v1.annotations.20160922.csv | cut -f6 -d, | sort | uniq -c
    131 "---"
      7 "Acacia auriculiformis"
      3 "Acacia mangium"
    103 "Acyrthosiphon pisum"
    124 "Aedes aegypti"
      2 "Aegilops tauschii"
     16 "Amphimedon queenslandica"
    416 "Anolis carolinensis"
     65 "Anopheles gambiae"
    222 "Apis mellifera"
     45 "Aquilegia caerulea"
    384 "Arabidopsis lyrata"
    337 "Arabidopsis thaliana"
     32 "Arachis hypogaea"
     19 "Artibeus jamaicensis"
    189 "Ascaris suum"
     54 "Ateles geoffroyi"
      3 "Avicennia marina"
      1 "Bandicoot papillomatosis carcinomatosis virus type 1"
      1 "Bandicoot papillomatosis carcinomatosis virus type 2"
      2 "BK polyomavirus"
    567 "Bombyx mori"
    783 "Bos taurus"
     12 "Bovine herpesvirus 1"
     10 "Bovine leukemia virus"
    464 "Brachypodium distachyon"
    173 "Branchiostoma belcheri"
    187 "Branchiostoma floridae"
     92 "Brassica napus"
      7 "Brassica oleracea"
     43 "Brassica rapa"
    108 "Brugia malayi"
      4 "Bruguiera cylindrica"
      4 "Bruguiera gymnorhiza"
    152 "Caenorhabditis brenneri"
    165 "Caenorhabditis briggsae"
    368 "Caenorhabditis elegans"
    182 "Caenorhabditis remanei"
    291 "Canis familiaris"
    134 "Capitella teleta"
     81 "Carica papaya"
      2 "Cerebratulus lacteus"
     85 "Chlamydomonas reinhardtii"
    550 "Ciona intestinalis"
     25 "Ciona savignyi"
      5 "Citrus clementine"
      4 "Citrus reticulata"
     64 "Citrus sinensis"
      6 "Citrus trifoliata"
    307 "Cricetulus griseus"
    120 "Cucumis melo"
     93 "Culex quinquefasciatus"
      5 "Cunninghamia lanceolata"
     57 "Cynara cardunculus"
    146 "Cyprinus carpio"
    255 "Danio rerio"
     45 "Daphnia pulex"
     20 "Dictyostelium discoideum"
     13 "Digitalis purpurea"
     75 "Drosophila ananassae"
     78 "Drosophila erecta"
     72 "Drosophila grimshawi"
    426 "Drosophila melanogaster"
     71 "Drosophila mojavensis"
     69 "Drosophila persimilis"
    273 "Drosophila pseudoobscura"
     76 "Drosophila sechellia"
    178 "Drosophila simulans"
     74 "Drosophila virilis"
     72 "Drosophila willistoni"
     75 "Drosophila yakuba"
     33 "Duck enteritis virus"
     26 "Echinococcus granulosus"
     22 "Echinococcus multilocularis"
     52 "Ectocarpus siliculosus"
      6 "Elaeis guineensis"
     44 "Epstein Barr virus"
    360 "Equus caballus"
     15 "Festuca arundinacea"
    108 "Fugu rubripes"
    996 "Gallus gallus"
      1 "Glottidia pyramidata"
    554 "Glycine max"
     13 "Glycine soja"
    317 "Gorilla gorilla"
      1 "Gossypium arboreum"
      1 "Gossypium herbaceum"
     80 "Gossypium hirsutum"
      4 "Gossypium raimondii"
    194 "Haemonchus contortus"
      5 "Haliotis rufescens"
      8 "Helianthus annuus"
      3 "Helianthus argophyllus"
      3 "Helianthus ciliaris"
      2 "Helianthus exilis"
      3 "Helianthus paradoxus"
      3 "Helianthus petiolaris"
     16 "Helianthus tuberosus"
     97 "Heliconius melpomene"
     15 "Herpes B virus"
     27 "Herpes Simplex Virus 1"
     24 "Herpes Simplex Virus 2"
     28 "Herpesvirus of turkeys"
      6 "Herpesvirus saimiri strain A11"
     28 "Hevea brasiliensis"
     37 "Hippoglossus hippoglossus"
   6631 "Homo sapiens"
    ...

I got this from the file labeled 'Current NetAffx Annotation Files: MiRNA-4_0 Annotations, CSV format' on the page to which you linked.

I find it odd that Affymetrix / Thermofisher would bundle all of these miRNAs on the same chip. You can use the NetAffx file to obtain the human-only miRNAs. They are in column 4, but you may also need column 1 to match to your CEL files:

awk '!/^#/{print}' miRNA-4_0-st-v1.annotations.20160922.csv | cut -f1,4,6 -d, | grep -e "Homo sapiens" | head -10
"20500112","hsa-let-7a-5p","Homo sapiens"
"20500113","hsa-let-7a-3p","Homo sapiens"
"20500114","hsa-let-7a-2-3p","Homo sapiens"
"20500115","hsa-let-7b-5p","Homo sapiens"
"20500116","hsa-let-7b-3p","Homo sapiens"
"20500117","hsa-let-7c-5p","Homo sapiens"
"20500118","hsa-let-7c-3p","Homo sapiens"
"20500119","hsa-let-7d-5p","Homo sapiens"
"20500120","hsa-let-7d-3p","Homo sapiens"
"20500121","hsa-let-7e-5p","Homo sapiens"
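
The same human-only selection could be done in R, which may be handier if the next step is matching against the oligo objects. This is only a minimal sketch on a toy stand-in for the NetAffx CSV: the header names (id, c2, c3, name, c5, species) and the non-human row are placeholders that I made up, not the real file's contents, and the columns are picked by position (1, 4, 6) exactly as in the awk command above.

```r
# Toy stand-in for the NetAffx CSV: a '#' comment line, then quoted fields.
# Header names and the "99999999" row are placeholders, NOT real file contents.
csv_text <- '#%placeholder comment line
"id","c2","c3","name","c5","species"
"20500112","x","x","hsa-let-7a-5p","x","Homo sapiens"
"20500113","x","x","hsa-let-7a-3p","x","Homo sapiens"
"99999999","x","x","mmu-example","x","Mus musculus"
'

# comment.char = "#" skips the comment lines, like the awk filter !/^#/
ann <- read.csv(text = csv_text, comment.char = "#",
                colClasses = "character", check.names = FALSE)

# Column 6 holds the species; keep columns 1 (ID) and 4 (miRNA name)
human <- ann[ann[[6]] == "Homo sapiens", c(1, 4)]
print(human)
```

For the real annotation file, swapping text = csv_text for the path to miRNA-4_0-st-v1.annotations.20160922.csv should be the only change, assuming its layout is what the awk output above suggests. One reason to prefer read.csv over cut -d, here is that read.csv respects quoted fields, so a comma inside a quoted value does not shift the columns.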

ADD REPLY • link 5.1 years ago by Kevin Blighe 87k

Thank you for your answer, but I could not find a way to filter the raw data by probe ID. When I extract the names with probeNames(rawData), I get 346085 probes; however, the array expression data (rawData@assayData$exprs) has 292681 rows.

ADD REPLY • link 5.1 years ago by gracie ▴ 20

Yes, because multiple probes will be summarised into probe-sets during normalisation. That is, several probes can target the same feature, for example the same exon of a target gene.

A further summarisation is controlled by the target parameter that is passed to rma(); take a look at my previous answer, here: C: Human Exon array probeset to gene-level expression
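
To give a feel for what 'summarised into probe-sets' means, here is a simplified sketch in base R. RMA summarises the log2 probe intensities of each probe-set with a median polish; the toy below does that for one hypothetical probe-set with made-up intensities, using medpolish() from the stats package. It is an illustration only, and oligo's internal implementation may differ in its details.

```r
# Toy PM intensities for ONE hypothetical probe-set:
# 3 probes (rows) x 4 chips (columns). All values are made up.
pm <- matrix(c(120, 150,  90, 200,
               240, 310, 170, 390,
                60,  80,  50, 100),
             nrow = 3, byrow = TRUE,
             dimnames = list(paste0("probe", 1:3), paste0("chip", 1:4)))

# Median polish on the log2 scale: log2(pm) = overall + probe effect
# + chip effect + residual
fit <- medpolish(log2(pm), trace.iter = FALSE)

# One summarised value per chip for this probe-set
probeset_expr <- fit$overall + fit$col
print(round(probeset_expr, 2))
```

So three probe-level rows collapse into a single probe-set row with one value per chip, which is why the number of rows changes between the raw and the normalised object.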

What are the rownames of both the raw and then the normalised data?

rownames(eset) and rownames(rawData) should access the row names.

ADD REPLY • link 5.1 years ago by Kevin Blighe 87k

Thank you. The reason I want to extract the human-only probes is that I thought the RMA results would differ between running rma() on all probe-sets and running it on the human ones only.

> rownames(rawData)
   [1] "1"    "2"    "3"    "4"    "5"    "6"    "7"    "8"    "9"    "10"   "11"   "12"   "13"   "14"   "15"   "16"   "17"   "18"   "19"   "20"   "21"   "22"   "23"  
  [24] "24"   "25"   "26"   "27"   "28"   "29"   "30"   "31"   "32"   "33"   "34"   "35"   "36"   "37"   "38"   "39"   "40"   "41"   "42"   "43"   "44"   "45"   "46"  
  [47] "47"   "48"   "49"   "50"   "51"   "52"   "53"   "54"   "55"   "56"   "57"   "58"   "59"   "60"   "61"   "62"   "63"   "64"   "65"   "66"   "67"   "68"   "69"  
  [70] "70"   "71"   "72"   "73"   "74"   "75"   "76"   "77"   "78"   "79"   "80"   "81"   "82"   "83"   "84"   "85"   "86"   "87"   "88"   "89"   "90"   "91"   "92"  
  [93] "93"   "94"   "95"   "96"   "97"   "98"   "99"   "100"  "101"  "102"  "103"  "104"  "105"  "106"  "107"  "108"  "109"  "110"  "111"  "112"  "113"  "114"  "115" 



> rownames(eset)
       [1] "14q0_st"                 "14qI-1_st"               "14qI-1_x_st"             "14qI-2_st"               "14qI-3_x_st"             "14qI-4_st"              
       [7] "14qI-4_x_st"             "14qI-5_st"               "14qI-6_st"               "14qI-7_st"               "14qI-8_st"               "14qI-8_x_st"            
      [13] "14qI-9_x_st"             "14qII-1_st"              "14qII-1_x_st"            "14qII-10_st"             "14qII-11_st"             "14qII-12_st"            
      [19] "14qII-12_x_st"           "14qII-13_st"             "14qII-14_st"             "14qII-14_x_st"           "14qII-15_x_st"           "14qII-16_st"
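
The concern that results would differ can be made concrete on toy numbers. The sketch below uses a hand-written quantile normalisation (quantile_normalise is a hypothetical helper, not an oligo function) on a made-up 4 x 3 matrix, and compares 'normalise everything, then keep the human rows' against 'keep the human rows, then normalise'. It works on rows of an arbitrary matrix, whereas RMA normalises at the probe level, so it only illustrates the principle.

```r
# Hypothetical helper: plain quantile normalisation across columns (chips)
quantile_normalise <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each chip
  target <- rowMeans(apply(x, 2, sort))               # mean of each quantile
  out <- apply(ranks, 2, function(r) target[r])       # map ranks to targets
  dimnames(out) <- dimnames(x)
  out
}

# Made-up intensities: 2 'human' rows and 2 'mouse' rows on 3 chips
x <- matrix(c(5, 4, 3,
              2, 1, 4,
              3, 6, 8,
              9, 2, 7),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("hsa_a", "hsa_b", "mmu_a", "mmu_b"),
                            paste0("chip", 1:3)))

is_human <- grepl("^hsa_", rownames(x))

all_then_filter  <- quantile_normalise(x)[is_human, , drop = FALSE]
filter_then_norm <- quantile_normalise(x[is_human, , drop = FALSE])

print(all_then_filter)
print(filter_then_norm)
```

The two results are not the same, because the reference distribution is built from whichever rows go into the normalisation. Which of the two is preferable for this array is a separate question.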

ADD REPLY • link 5.1 years ago by gracie ▴ 20

If you look at the output of str(rawData), can you see any variable that may contain the probe IDs?

ADD REPLY • link 5.1 years ago by Kevin Blighe 87k

As far as I know, it should be in the featureData slot, but that is empty; it only says AnnotatedDataFrame.

> str(rawData)
Formal class 'ExpressionFeatureSet' [package "oligoClasses"] with 9 slots
  ..@ manufacturer     : chr "Affymetrix"
  ..@ intensityFile    : chr NA
  ..@ assayData        :<environment: 0x000000003016cf68> 
  ..@ phenoData        :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 1 obs. of  2 variables:
  .. .. .. ..$ labelDescription: chr "Index"
  .. .. .. ..$ channel         : Factor w/ 2 levels "exprs","_ALL_": 2
  .. .. ..@ data             :'data.frame': 4 obs. of  1 variable:
  .. .. .. ..$ index: int [1:4] 1 2 3 4
  .. .. ..@ dimLabels        : chr [1:2] "rowNames" "columnNames"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ featureData      :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 0 obs. of  1 variable:
  .. .. .. ..$ labelDescription: chr(0) 
  .. .. ..@ data             :'data.frame': 292681 obs. of  0 variables
  .. .. ..@ dimLabels        : chr [1:2] "featureNames" "featureColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0

ADD REPLY • link 5.1 years ago by gracie ▴ 20

I have done this previously, but using a probe annotation file that was available on the Affymetrix / Thermofisher website; there does not appear to be such a file available for miRNA 4.0.

I cannot confirm this 100%, but I believe you can 'safely' filter out probes after you have normalised. In RMA, the background correction is performed per chip. The quantile normalisation step is then applied across all chips (chip = sample), and the final summarisation is fit per probe-set.
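
As a rough sketch of filtering after normalisation: the toy below uses a plain matrix as a stand-in for exprs(eset) and a small data frame as a stand-in for the annotation. All row names, values, and the column names probeset and species are placeholders that I made up; which column of the real NetAffx file actually matches rownames(eset) would need to be checked first.

```r
# Toy stand-in for exprs(eset): probe-sets in rows, chips in columns.
# Row names and values are placeholders.
norm_expr <- matrix(c(7.1, 7.3, 6.9,
                      5.2, 5.0, 5.4,
                      8.8, 8.6, 8.7,
                      3.1, 3.0, 3.3),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(c("ps_hsa_1", "ps_hsa_2",
                                      "ps_mmu_1", "ps_ctrl_1"),
                                    paste0("chip", 1:3)))

# Toy stand-in for the annotation: identifier plus species
ann <- data.frame(
  probeset = c("ps_hsa_1", "ps_hsa_2", "ps_mmu_1", "ps_ctrl_1"),
  species  = c("Homo sapiens", "Homo sapiens", "Mus musculus", "---"),
  stringsAsFactors = FALSE
)

# Keep only the rows annotated as human
human_ids  <- ann$probeset[ann$species == "Homo sapiens"]
keep       <- rownames(norm_expr) %in% human_ids
human_expr <- norm_expr[keep, , drop = FALSE]
print(human_expr)

# With a real ExpressionSet, the analogous subsetting would be eset[keep, ],
# with keep built from featureNames(eset) instead of rownames(norm_expr).
```

The subset can then go into limma in place of the full object.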

ADD REPLY • link 5.1 years ago by Kevin Blighe 87k