Question

Analysis of Illumina HumanHT-12_V4

3.9 years ago

j_jamal96 ▴ 20

Hi all, I want to analyze data from the Illumina HumanHT-12_V4 platform: https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-4901/

Is background correction necessary for this data, and is it an essential step in microarray analysis in general? Some articles do not perform it.

Thanks in advance

Illumina HumanHT-12_V4 Background correction • 2.0k views

ADD COMMENT • link updated 3.9 years ago by Kevin Blighe 87k • written 3.9 years ago by j_jamal96 ▴ 20

Yes, background correction is required; however, I need to point out that processing the Illumina HT expression array data from the public domain can be fraught with problems. I checked the data available for your study and it contains:

a single table of raw signal intensities for all samples combined into the same file.
a single table of processed signal intensities for all samples combined into the same file.

Information about processing Illumina arrays is contained in the limma manual. However, public domain data is typically not in a format that makes following the manual an easy task. For example, look at this use-case:

A: illumina Arrays Illumina HumanHT-12 V3.0 expression beadchip reading data

If you do not feel comfortable processing the data from the raw stage, then you could just use the processed data. You will have other questions, though.

ADD REPLY • link 3.9 years ago by Kevin Blighe 87k

Thanks a lot, Kevin ...

What is the difference between raw and processed data? Has the processed data already been background-corrected and normalized?

ADD REPLY • link 3.9 years ago by j_jamal96 ▴ 20

According to the information in the ArrayExpress records, the 'processed' data is already normalised by quantile normalisation, and the authors appear to have eliminated probes with high detection p-values (i.e., low quality probes), which serves as a pseudo background correction, I suppose. I see no information about background correction, specifically, which would normally be performed as part of the neqc method. The authors also removed outlier samples via the Median Absolute Deviation (MAD) method.
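For readers unfamiliar with the term, here is a minimal base R sketch of what quantile normalisation does, on made-up numbers. The function name `quantile_normalise` and the toy matrix are illustrative assumptions, not the authors' code; in practice one would more likely rely on an established implementation such as the one in limma, which also treats ties differently.

```r
set.seed(1)

# Toy raw-scale intensities: 10 probes x 4 samples, each sample on a
# different overall scale (all values here are invented for illustration)
x <- matrix(rexp(40, rate = 0.01), nrow = 10,
  dimnames = list(paste0("probe", 1:10), paste0("S", 1:4)))
x <- sweep(x, 2, c(1, 1.5, 2, 3), "*")

# Simplified quantile normalisation (hypothetical helper):
# every sample is forced onto the same reference distribution
quantile_normalise <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first") # rank within each sample
  sorted <- apply(x, 2, sort)                        # sort each sample
  target <- rowMeans(sorted)                         # mean of each quantile
  out <- apply(ranks, 2, function(r) target[r])      # map ranks back to target
  dimnames(out) <- dimnames(x)
  out
}

qn <- quantile_normalise(x)

# After normalisation every sample has the same quantiles
apply(qn, 2, quantile)
```

Within each sample the probe ordering is unchanged; only the values are replaced so that all samples share one distribution, which is why boxplots of quantile-normalised data look nearly identical across samples.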

So, you could make a start with the processed data... the detection p-value columns will likely still be in that data; so, please remove them and then check the data distribution via boxplot() and hist()
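As a rough sketch of that suggestion, the following uses a small simulated table rather than the real file. The column names (`S1`, `S1.p`, and so on) and the `.p` suffix for detection p-values are assumptions made for illustration; check `colnames()` on the actual data before adapting the pattern.

```r
set.seed(42)

# Simulated stand-in for the processed table: 3 samples, each with an
# intensity column and a detection p-value column (names are HYPOTHETICAL)
dat <- data.frame(
  S1 = rlnorm(200, 7), S1.p = runif(200),
  S2 = rlnorm(200, 7), S2.p = runif(200),
  S3 = rlnorm(200, 7), S3.p = runif(200))

# Drop the detection p-value columns with a logical index; unlike
# negative integer indexing, this is safe even when nothing matches
is_pval <- grepl("\\.p$", colnames(dat))
expr <- data.matrix(dat[, !is_pval, drop = FALSE])

# The simulated values are on the raw scale, so log2-transform them
expr <- log2(expr)

# Use a null device when run non-interactively (e.g., via Rscript)
if (!interactive()) pdf(NULL)

# Overall distribution and per-sample distributions side by side
par(mfrow = c(1, 2))
hist(expr, main = "All samples", xlab = "log2 intensity")
boxplot(expr, main = "Per sample", ylab = "log2 intensity")
```

If the data is normalised, the per-sample boxes should line up closely; boxes sitting at clearly different levels would suggest otherwise.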

ADD REPLY • link 3.9 years ago by Kevin Blighe 87k

Kevin, thank you for your suggestions. I tried to import the processed data into R with several functions, but each one returned an error.

> file <- 'mRNA_microarray_processed.txt'
> file.lumi <- lumiR(file)

Error in gregexpr("\t", dataLine1)[[1]] : subscript out of bounds

> file.lumi <- lumiR("mRNA_microarray_processed.txt", sep="\t")

Error in strsplit(info[nMetaDataLines + 2], sep)[[1]] : subscript out of bounds

> x <- read.ilmn("mRNA_microarray_processed.txt")

Reading file mRNA_microarray_processed.txt ... ...
Error in readGenericHeader(fname, columns = expr, sep = sep) : Specified column headings not found in file

> rawset <- ArrayExpress("E-MTAB-4901")

Unpacking data files
ArrayExpress: Reading pheno data from SDRF
Error in which(sapply(seq_len(nrow(pData(ph))), function(i) all(pData(ph)[i, : argument to 'which' is not logical
In addition: Warning message:
In readPhenoData(sdrf, path) : ArrayExpress: Cannot find 'Array Data File' column in SDRF. Object might not be created correctly.

> Data <- getAE("E-MTAB-4901", type = "full")
> Data1 <- getAE("E-MTAB-4901", type = "processed")

Unpacking data files

But no data is stored in it.

Would you please help me work out how to import the data? Also, which part of this data should be passed to lmFit as the object? I have not seen any examples of this command:

fit <- lmFit(Object, design)

I have already analyzed the raw data; if you need additional explanation or the code I used, I can send it to you via email.

Thank you in advance

ADD REPLY • link 3.9 years ago by j_jamal96 ▴ 20

score 0 · Answer 1 · 2020-05-10

Hey, no, those programs / packages will not recognise this data because the authors decided to put all feature data and expression data in the same file. You would have to read it into R manually, and then do your own final filtering.

I have done it for you this time so that you can learn the general process, and also keeping in mind that this array (Illumina) can prove very frustrating. Keep in mind that the data in the 'processed' file is already normalised and that, here (below), we are merely preparing the data for our own downstream analyses.

# read in the data
  mat <- read.csv('mRNA_microarray_processed.txt', sep = '\t',
    header = TRUE, row.names = 1, stringsAsFactors = FALSE)

# remove QC probes
  table(mat$QC)
  mat <- mat[mat$QC == '',]
  mat <- mat[,-1]

# set rownames:
# format will be: 'IlluminaProbeID_GeneSymbol'
  rownames(mat) <- paste0(mat$Probe_Id, '_', mat$Symbol)

# extract out detection p-value data
  pvalues <- mat[,grep('p$', colnames(mat))]

# remove detection p-values, standard deviation, and nbeads from main data
  mat <- mat[,-grep('p$|sd$|nbeads$', colnames(mat))]

# only keep probes where the detection p-value is p < 0.05 in greater than or equal to 3 out of the 6 samples
  keep <- (rowSums(pvalues < 0.05) >= 3)
  table(keep)
  # keep
  # FALSE  TRUE
  # 25653 21569

  mat <- mat[keep,]
  pvalues <- pvalues[keep,]

# divide up the feature data and the expression data
# log [base2] transform the expression data
  featuredata <- mat[,7:ncol(mat)]
  mat <- data.matrix(log2(mat[,1:6]))

# verify that data is normalised
  par(mfrow = c(1,2))
  hist(mat)
  boxplot(mat)

Kevin
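On the earlier question of what to pass to lmFit: it accepts a numeric matrix of log-expression values with probes in rows and samples in columns, which is the shape of `mat` after the steps above. The sketch below uses a simulated matrix, and the 3-versus-3 grouping is purely hypothetical; the real group labels have to come from the study's sample annotation. The limma calls are only run if the package is installed.

```r
set.seed(1)

# Simulated log2 expression matrix: 100 probes x 6 samples
# (a stand-in for `mat`; values are invented)
expr <- matrix(rnorm(600, mean = 8, sd = 1), nrow = 100,
  dimnames = list(paste0("probe", 1:100), paste0("S", 1:6)))

# HYPOTHETICAL grouping: 3 controls followed by 3 cases.
# Replace with the real phenotype, in the same order as the columns.
group <- factor(c("control", "control", "control", "case", "case", "case"),
  levels = c("control", "case"))

# Design matrix: intercept (control mean) plus case-vs-control difference
design <- model.matrix(~ group)

# Fit the linear model and moderate the statistics, if limma is available
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(expr, design)
  fit <- limma::eBayes(fit)
  print(limma::topTable(fit, coef = "groupcase", number = 5))
}
```

The one requirement to check is that the rows of `design` correspond to the columns of the expression matrix, in the same order.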