Question: Affymetrix miRNA 4.0 Normalization
19 months ago, gracie wrote:

I am analyzing a GeneChip miRNA 4.0 Array dataset. Here is my code:

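library(oligo)
library(pd.mirna.4.0)
# Locate the CEL files in the working directory
celFiles <- list.celfiles(full.names = TRUE)
# Read the raw intensities using the miRNA 4.0 platform design package
rawData <- read.celfiles(celFiles, pkgname = "pd.mirna.4.0")
# Run RMA (background correction, normalisation, summarisation)
eset <- rma(rawData)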

This gives results for all organisms. Is there any way to do the analysis only for human miRNAs?

I also tried running the analysis for human only (RMA + DABG) in the Affymetrix transcription analysis software, but at the very end it gives me an error:

Analysis Failed: An error occurred while reading limma-output.h5: The type INTEGER for the dataSetName group_expressed_10 is not supported.

I could not solve that either.


Thank you, I will go with that most probably.

written 18 months ago by gracie
19 months ago, regmkbl wrote:

Edit: scroll down for answer

----------------------------

Can you provide the full name of the microarray that you used? It should only target one species. So, what do you mean by this:

Is there any way to do the analysis only for human miRNAs?


Sure, it is the GeneChip miRNA 4.0. There are probes for human, mouse and rat. Since using all probes would change the RMA and limma results, I thought I should use only the human probes.

written 18 months ago by gracie

Hey, thanks for the link. In fact, it has micro RNAs (miRNAs) for all of these:

awk '!/^#/{print}' miRNA-4_0-st-v1.annotations.20160922.csv | cut -f6 -d, | sort | uniq -c
    131 "---"
      7 "Acacia auriculiformis"
      3 "Acacia mangium"
    103 "Acyrthosiphon pisum"
    124 "Aedes aegypti"
      2 "Aegilops tauschii"
     16 "Amphimedon queenslandica"
    416 "Anolis carolinensis"
     65 "Anopheles gambiae"
    222 "Apis mellifera"
     45 "Aquilegia caerulea"
    384 "Arabidopsis lyrata"
    337 "Arabidopsis thaliana"
     32 "Arachis hypogaea"
     19 "Artibeus jamaicensis"
    189 "Ascaris suum"
     54 "Ateles geoffroyi"
      3 "Avicennia marina"
      1 "Bandicoot papillomatosis carcinomatosis virus type 1"
      1 "Bandicoot papillomatosis carcinomatosis virus type 2"
      2 "BK polyomavirus"
    567 "Bombyx mori"
    783 "Bos taurus"
     12 "Bovine herpesvirus 1"
     10 "Bovine leukemia virus"
    464 "Brachypodium distachyon"
    173 "Branchiostoma belcheri"
    187 "Branchiostoma floridae"
     92 "Brassica napus"
      7 "Brassica oleracea"
     43 "Brassica rapa"
    108 "Brugia malayi"
      4 "Bruguiera cylindrica"
      4 "Bruguiera gymnorhiza"
    152 "Caenorhabditis brenneri"
    165 "Caenorhabditis briggsae"
    368 "Caenorhabditis elegans"
    182 "Caenorhabditis remanei"
    291 "Canis familiaris"
    134 "Capitella teleta"
     81 "Carica papaya"
      2 "Cerebratulus lacteus"
     85 "Chlamydomonas reinhardtii"
    550 "Ciona intestinalis"
     25 "Ciona savignyi"
      5 "Citrus clementine"
      4 "Citrus reticulata"
     64 "Citrus sinensis"
      6 "Citrus trifoliata"
    307 "Cricetulus griseus"
    120 "Cucumis melo"
     93 "Culex quinquefasciatus"
      5 "Cunninghamia lanceolata"
     57 "Cynara cardunculus"
    146 "Cyprinus carpio"
    255 "Danio rerio"
     45 "Daphnia pulex"
     20 "Dictyostelium discoideum"
     13 "Digitalis purpurea"
     75 "Drosophila ananassae"
     78 "Drosophila erecta"
     72 "Drosophila grimshawi"
    426 "Drosophila melanogaster"
     71 "Drosophila mojavensis"
     69 "Drosophila persimilis"
    273 "Drosophila pseudoobscura"
     76 "Drosophila sechellia"
    178 "Drosophila simulans"
     74 "Drosophila virilis"
     72 "Drosophila willistoni"
     75 "Drosophila yakuba"
     33 "Duck enteritis virus"
     26 "Echinococcus granulosus"
     22 "Echinococcus multilocularis"
     52 "Ectocarpus siliculosus"
      6 "Elaeis guineensis"
     44 "Epstein Barr virus"
    360 "Equus caballus"
     15 "Festuca arundinacea"
    108 "Fugu rubripes"
    996 "Gallus gallus"
      1 "Glottidia pyramidata"
    554 "Glycine max"
     13 "Glycine soja"
    317 "Gorilla gorilla"
      1 "Gossypium arboreum"
      1 "Gossypium herbaceum"
     80 "Gossypium hirsutum"
      4 "Gossypium raimondii"
    194 "Haemonchus contortus"
      5 "Haliotis rufescens"
      8 "Helianthus annuus"
      3 "Helianthus argophyllus"
      3 "Helianthus ciliaris"
      2 "Helianthus exilis"
      3 "Helianthus paradoxus"
      3 "Helianthus petiolaris"
     16 "Helianthus tuberosus"
     97 "Heliconius melpomene"
     15 "Herpes B virus"
     27 "Herpes Simplex Virus 1"
     24 "Herpes Simplex Virus 2"
     28 "Herpesvirus of turkeys"
      6 "Herpesvirus saimiri strain A11"
     28 "Hevea brasiliensis"
     37 "Hippoglossus hippoglossus"
   6631 "Homo sapiens"
    ...

I got this from the file labeled 'Current NetAffx Annotation Files: MiRNA-4_0 Annotations, CSV format' on the page to which you linked.

I find it odd that Affymetrix / Thermofisher would bundle all of these miRNAs on the same chip. You can use the NetAffx file to obtain the human-only miRNAs. They are in column 4, but you may also need column 1 to match to your CEL files:

awk '!/^#/{print}' miRNA-4_0-st-v1.annotations.20160922.csv | cut -f1,4,6 -d, | grep -e "Homo sapiens" | head -10
"20500112","hsa-let-7a-5p","Homo sapiens"
"20500113","hsa-let-7a-3p","Homo sapiens"
"20500114","hsa-let-7a-2-3p","Homo sapiens"
"20500115","hsa-let-7b-5p","Homo sapiens"
"20500116","hsa-let-7b-3p","Homo sapiens"
"20500117","hsa-let-7c-5p","Homo sapiens"
"20500118","hsa-let-7c-3p","Homo sapiens"
"20500119","hsa-let-7d-5p","Homo sapiens"
"20500120","hsa-let-7d-3p","Homo sapiens"
"20500121","hsa-let-7e-5p","Homo sapiens"
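
If you want the human entries saved to a file rather than just previewed, something like the following might work. It is only a sketch: the function name species_probesets is made up for illustration, and it assumes, as in the commands above, that column 1 holds the ID, column 4 the miRNA name and column 6 the species, with no embedded commas in the first six fields.

```shell
# Hypothetical helper (not part of any Affymetrix tool): print the ID (column 1)
# and miRNA name (column 4) for every annotation row whose species (column 6)
# matches exactly. Assumes no commas inside the first six quoted fields.
species_probesets() {
    annot_file=$1
    species=$2
    awk -F',' -v sp="$species" '
        /^#/ { next }                      # skip the commented header lines
        {
            id = $1; name = $4; org = $6
            gsub(/"/, "", id)              # strip the surrounding double quotes
            gsub(/"/, "", name)
            gsub(/"/, "", org)
            if (org == sp) print id "\t" name
        }
    ' "$annot_file"
}

# Example call (file name taken from the commands above):
# species_probesets miRNA-4_0-st-v1.annotations.20160922.csv "Homo sapiens" > human_ids.tsv
```

Unlike the grep above, this compares the species column exactly, so a row is only kept when column 6 itself is the requested species.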
written 18 months ago by regmkbl

Thank you for your answer, but I could not find a way to filter the raw data by probe ID. When I extract the names with the probeNames function (probeNames(rawData)), I get 346085 probes, whereas the array expression data (rawData@assayData$exprs) has 292681 rows.

written 18 months ago by gracie

Yes, because multiple probes will be summarised into probe-sets during normalisation. That is, multiple probes will target the same feature, for example an exon of a target gene.

A further summarisation is given by the target parameter that is passed to rma() - take a look at my previous answer, here: C: Human Exon array probeset to gene-level expression

What are the rownames of both the raw and then the normalised data?

rownames(eset) and rownames(rawData) should access the row names.

written 18 months ago by regmkbl

Thank you. The reason I want to extract the human-only probes is that I thought the RMA results would differ between running rma on all probe-sets and running it on the human ones only.

    > rownames(rawData)
   [1] "1"    "2"    "3"    "4"    "5"    "6"    "7"    "8"    "9"    "10"   "11"   "12"   "13"   "14"   "15"   "16"   "17"   "18"   "19"   "20"   "21"   "22"   "23"  
  [24] "24"   "25"   "26"   "27"   "28"   "29"   "30"   "31"   "32"   "33"   "34"   "35"   "36"   "37"   "38"   "39"   "40"   "41"   "42"   "43"   "44"   "45"   "46"  
  [47] "47"   "48"   "49"   "50"   "51"   "52"   "53"   "54"   "55"   "56"   "57"   "58"   "59"   "60"   "61"   "62"   "63"   "64"   "65"   "66"   "67"   "68"   "69"  
  [70] "70"   "71"   "72"   "73"   "74"   "75"   "76"   "77"   "78"   "79"   "80"   "81"   "82"   "83"   "84"   "85"   "86"   "87"   "88"   "89"   "90"   "91"   "92"  
  [93] "93"   "94"   "95"   "96"   "97"   "98"   "99"   "100"  "101"  "102"  "103"  "104"  "105"  "106"  "107"  "108"  "109"  "110"  "111"  "112"  "113"  "114"  "115" 



 > rownames(eset)
       [1] "14q0_st"                 "14qI-1_st"               "14qI-1_x_st"             "14qI-2_st"               "14qI-3_x_st"             "14qI-4_st"              
       [7] "14qI-4_x_st"             "14qI-5_st"               "14qI-6_st"               "14qI-7_st"               "14qI-8_st"               "14qI-8_x_st"            
      [13] "14qI-9_x_st"             "14qII-1_st"              "14qII-1_x_st"            "14qII-10_st"             "14qII-11_st"             "14qII-12_st"            
      [19] "14qII-12_x_st"           "14qII-13_st"             "14qII-14_st"             "14qII-14_x_st"           "14qII-15_x_st"           "14qII-16_st"
written 18 months ago by gracie

If you look at the output of str(rawData), can you see any variable that may contain the probe IDs?

written 18 months ago by regmkbl

As far as I know, it should be in the featureData slot, but that is empty; it only shows an AnnotatedDataFrame.

> str(rawData)
Formal class 'ExpressionFeatureSet' [package "oligoClasses"] with 9 slots
  ..@ manufacturer     : chr "Affymetrix"
  ..@ intensityFile    : chr NA
  ..@ assayData        :<environment: 0x000000003016cf68> 
  ..@ phenoData        :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 1 obs. of  2 variables:
  .. .. .. ..$ labelDescription: chr "Index"
  .. .. .. ..$ channel         : Factor w/ 2 levels "exprs","_ALL_": 2
  .. .. ..@ data             :'data.frame': 4 obs. of  1 variable:
  .. .. .. ..$ index: int [1:4] 1 2 3 4
  .. .. ..@ dimLabels        : chr [1:2] "rowNames" "columnNames"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
  ..@ featureData      :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  .. .. ..@ varMetadata      :'data.frame': 0 obs. of  1 variable:
  .. .. .. ..$ labelDescription: chr(0) 
  .. .. ..@ data             :'data.frame': 292681 obs. of  0 variables
  .. .. ..@ dimLabels        : chr [1:2] "featureNames" "featureColumns"
  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
  .. .. .. .. ..@ .Data:List of 1
  .. .. .. .. .. ..$ : int [1:3] 1 1 0
written 18 months ago by gracie

I have done this previously but using a probe annotation file that was available at the Affymetrix / Thermofisher website - there does not appear to be such a file available for MiRNA 4.0.

I cannot confirm this 100%, but I believe you can 'safely' filter out probes after you have normalised. The background correction is performed per chip. The quantile normalisation step is then applied to the probe-level intensities across all chips (chip = sample), and the final summarisation is fit per probe-set across all chips.
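
To make the 'filter after normalising' idea concrete, here is a rough sketch. It assumes you have already exported the normalised matrix from R to a tab-delimited file with a header line and the probe-set identifier in the first column, and that you have a second file listing the identifiers to keep, one per line. Both file layouts and the function name keep_rows are assumptions for illustration. The identifiers must also be of the same kind as the row names of your normalised data, which has not been confirmed here for the NetAffx columns.

```shell
# Hypothetical helper: keep the header line of a tab-delimited expression table
# plus every row whose first column appears in a list of identifiers.
keep_rows() {
    ids_file=$1      # one identifier per line (text after the first tab is ignored)
    expr_file=$2     # header line, then identifier<TAB>value<TAB>value...
    awk -F'\t' '
        FILENAME == ARGV[1] { keep[$1] = 1; next }   # first file: remember the IDs
        FNR == 1 || ($1 in keep)                     # second file: header or wanted row
    ' "$ids_file" "$expr_file"
}

# Example call (both file names are placeholders):
# keep_rows human_ids.tsv normalised_expression.tsv > human_expression.tsv
```

It is worth comparing the number of rows that come out with the number of identifiers that went in; a large mismatch would suggest the two files use different identifier types.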

written 18 months ago by regmkbl