How to normalize microarray data using Python?
2.1 years ago

I have an Affymetrix microarray dataset and want to analyse the time-dependent effect of a treatment on gene expression in different patients.

I have three types of controls:

  • positive control
  • negative control
  • probeset control
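The three control types are usually identified from the array's annotation rather than from the expression values themselves. Below is a minimal sketch of separating them. It assumes you already have a Series (hypothetically called `probe_type`) that maps each probe ID to a label such as "positive", "negative", "probeset" or "gene". The probe IDs, labels and numbers are made up for illustration:

```python
import pandas as pd


def split_by_probe_type(expr: pd.DataFrame, probe_type: pd.Series) -> dict:
    """Split a probes x samples matrix into one DataFrame per probe type.

    `probe_type` (hypothetical) maps each row label of `expr` to a type label;
    probes without a label are dropped.
    """
    # Align the labels to the rows of expr (unlabelled probes become NaN)
    labels = probe_type.reindex(expr.index)
    return {label: expr[labels == label] for label in labels.dropna().unique()}


# Made-up example: two control probes and two genes, two samples
expr = pd.DataFrame(
    {"S1": [12.1, 3.2, 8.4, 7.6], "S2": [11.9, 3.0, 8.3, 7.2]},
    index=["ctl_pos_1", "ctl_neg_1", "GENE_A", "GENE_B"],
)
probe_type = pd.Series(
    {"ctl_pos_1": "positive", "ctl_neg_1": "negative", "GENE_A": "gene", "GENE_B": "gene"}
)

parts = split_by_probe_type(expr, probe_type)
neg_ctl_df, pos_ctl_df, coding_gene_df = parts["negative"], parts["positive"], parts["gene"]
```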

My attempt:

import pandas as pd
import qnorm

# neg_ctl_df, pos_ctl_df and coding_gene_df are loaded elsewhere;
# the last 29 columns of each hold the per-sample values (stored as strings)
samples = coding_gene_df.iloc[:, -29:].astype(float)
neg_ctl = neg_ctl_df.iloc[:, -29:].astype(float)
pos_ctl = pos_ctl_df.iloc[:, -29:].astype(float)

# Negative normalization: subtract the median of the negative controls within each patient sample
neg_ctl_median = neg_ctl.median()

# Positive normalization: divide by the median of the positive controls within each patient sample
pos_ctl_median = pos_ctl.median()

# Apply both steps to all genes at once (the per-sample medians align on the sample columns)
norm_val = (samples - neg_ctl_median) / pos_ctl_median

# Probeset normalization: quantile-normalize the corrected values (samples are columns)
probeset_norm = qnorm.quantile_normalize(norm_val, axis=1)
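To see what each step does, here is a small self-contained sketch of the same pipeline on made-up numbers. It mirrors the attempt: subtract the negative-control median, divide by the positive-control median, then quantile-normalize. However, it implements quantile normalization in plain NumPy instead of calling `qnorm`. Ties are broken by position rather than averaged, which is a simplification that full implementations may handle differently. The helper names `control_normalize` and `quantile_normalize` are hypothetical.

```python
import numpy as np
import pandas as pd


def control_normalize(expr, neg_ctl, pos_ctl):
    """Subtract the per-sample median of the negative controls, then divide by
    the per-sample median of the positive controls (as in the attempt above).
    All inputs are probes x samples frames sharing the same sample columns."""
    neg_median = neg_ctl.median()  # one value per sample column
    pos_median = pos_ctl.median()
    return (expr - neg_median) / pos_median


def quantile_normalize(df):
    """Give every sample (column) the same distribution: the mean of the sorted columns.
    Ties are broken by position (stable sort), a simplification."""
    values = df.to_numpy(dtype=float)
    # Reference distribution: average value at each rank across samples
    reference = np.sort(values, axis=0).mean(axis=1)
    # Rank of each value within its own column (0 = smallest)
    ranks = values.argsort(axis=0, kind="stable").argsort(axis=0, kind="stable")
    return pd.DataFrame(reference[ranks], index=df.index, columns=df.columns)


# Made-up control-normalization example: 2 genes x 2 samples
genes = pd.DataFrame({"S1": [3.0, 5.0], "S2": [6.0, 10.0]}, index=["g1", "g2"])
neg = pd.DataFrame({"S1": [1.0, 1.0], "S2": [2.0, 2.0]})
pos = pd.DataFrame({"S1": [2.0, 2.0], "S2": [4.0, 4.0]})
corrected = control_normalize(genes, neg, pos)

# Made-up quantile-normalization example: 4 probes x 3 samples
toy = pd.DataFrame({"A": [5.0, 2.0, 3.0, 4.0],
                    "B": [4.0, 1.0, 4.0, 2.0],
                    "C": [3.0, 4.0, 6.0, 8.0]})
qn = quantile_normalize(toy)
```

After quantile normalization every column contains the same set of values (the reference distribution), only in a different order. The result is therefore used directly as the normalized matrix rather than as something to divide by.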

Is my code on the right track, especially the probeset normalization? Should I divide the samples by the probeset mean?
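Before choosing between subtracting and dividing, it may help to check which scale the values are on. The values in the dictionary below range roughly from 6.7 to 9.9, which looks like log2-transformed intensities such as those produced by RMA-style preprocessing. This is an assumption; the post does not say so. On a log2 scale, subtracting a per-sample mean is the same as dividing the linear-scale intensities by their geometric mean. Dividing already-logged values is therefore a different operation from dividing raw intensities. A small numerical check of that identity, using made-up intensities:

```python
import numpy as np

# Made-up linear-scale intensities for one sample
linear = np.array([400.0, 250.0, 900.0, 160.0])
log2_values = np.log2(linear)

# Centering on the log2 scale: subtract the sample's mean log2 value
centered_log = log2_values - log2_values.mean()

# The linear-scale equivalent: divide by the geometric mean
geometric_mean = np.exp(np.log(linear).mean())
ratio = linear / geometric_mean

# Back-transforming the log-centred values gives the same ratios
assert np.allclose(2 ** centered_log, ratio)
```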

The data (as a dictionary):

treatment_df.head().to_dict()

{'0h_P1_T1_TimeC1_PIDC4_Non-Survivor': {'DNAJB6 /// TMEM135': '9.25',
  'DNAJC14': '8.44',
  'DNAJC15': '8.66',
  'DNAJC30': '7.34',
  'DNAJC9': '7.62'},
 '0h_P2_T1_TimeC2_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '9.26',
  'DNAJC14': '8.34',
  'DNAJC15': '8.63',
  'DNAJC30': '7.42',
  'DNAJC9': '7.19'},
 '0h_P3_T1_TimeC1_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '9.07',
  'DNAJC14': '8.67',
  'DNAJC15': '8.91',
  'DNAJC30': '7.15',
  'DNAJC9': '7.22'},
 '0h_P4_T1_TimeC1_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '8.56',
  'DNAJC14': '8.63',
  'DNAJC15': '9.29',
  'DNAJC30': '7.46',
  'DNAJC9': '7.16'},
 '0h_P5_T1_TimeC4_PIDC3_Survivor': {'DNAJB6 /// TMEM135': '9.42',
  'DNAJC14': '8.32',
  'DNAJC15': '8.63',
  'DNAJC30': '7.29',
  'DNAJC9': '7.29'},
 '12h_P1_T4_TimeC2_PIDC4_Non-Survivor': {'DNAJB6 /// TMEM135': '8.58',
  'DNAJC14': '8.64',
  'DNAJC15': '8.98',
  'DNAJC30': '7.2',
  'DNAJC9': '7.23'},
 '12h_P2_T4_TimeC3_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '8.94',
  'DNAJC14': '8.38',
  'DNAJC15': '8.03',
  'DNAJC30': '7.32',
  'DNAJC9': '7.26'},
 '12h_P3_T4_TimeC2_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '9.27',
  'DNAJC14': '8.57',
  'DNAJC15': '8.27',
  'DNAJC30': '7.32',
  'DNAJC9': '7.39'},
 '12h_P4_T4_TimeC2_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '8.5',
  'DNAJC14': '8.25',
  'DNAJC15': '8.47',
  'DNAJC30': '6.98',
  'DNAJC9': '7.12'},
 '12h_P5_T4_TimeC2_PIDC3_Survivor': {'DNAJB6 /// TMEM135': '9.18',
  'DNAJC14': '8.25',
  'DNAJC15': '8.66',
  'DNAJC30': '7.38',
  'DNAJC9': '7.6'},
 '24h_P1_T5_TimeC4_PIDC4_Non-Survivor': {'DNAJB6 /// TMEM135': '8.13',
  'DNAJC14': '8.43',
  'DNAJC15': '8.33',
  'DNAJC30': '7.41',
  'DNAJC9': '6.68'},
 '24h_P2_T5_TimeC3_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '8.85',
  'DNAJC14': '8.54',
  'DNAJC15': '8.73',
  'DNAJC30': '7.26',
  'DNAJC9': '7.49'},
 '24h_P3_T5_TimeC3_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '8.99',
  'DNAJC14': '8.42',
  'DNAJC15': '8.92',
  'DNAJC30': '7.25',
  'DNAJC9': '7.27'},
 '24h_P4_T5_TimeC3_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '9.26',
  'DNAJC14': '8.6',
  'DNAJC15': '7.62',
  'DNAJC30': '6.93',
  'DNAJC9': '7.3'},
 '24h_P5_T5_TimeC3_PIDC3_Survivor': {'DNAJB6 /// TMEM135': '9.31',
  'DNAJC14': '8.14',
  'DNAJC15': '8.11',
  'DNAJC30': '7.52',
  'DNAJC9': '7.64'},
 '48h_P1_T6_TimeC3_PIDC1_Non-Survivor': {'DNAJB6 /// TMEM135': '9.05',
  'DNAJC14': '8.27',
  'DNAJC15': '7.63',
  'DNAJC30': '7.46',
  'DNAJC9': '7.42'},
 '48h_P2_T6_TimeC3_PIDC3_Survivor': {'DNAJB6 /// TMEM135': '8.57',
  'DNAJC14': '8.23',
  'DNAJC15': '8.6',
  'DNAJC30': '7.45',
  'DNAJC9': '7.62'},
 '48h_P3_T6_TimeC3_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '9.23',
  'DNAJC14': '8.65',
  'DNAJC15': '7.71',
  'DNAJC30': '6.77',
  'DNAJC9': '7.26'},
 '48h_P4_T6_TimeC3_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '8.99',
  'DNAJC14': '8.7',
  'DNAJC15': '7.99',
  'DNAJC30': '7.28',
  'DNAJC9': '7.5'},
 '48h_P5_T6_TimeC3_PIDC3_Survivor': {'DNAJB6 /// TMEM135': '8.78',
  'DNAJC14': '8.21',
  'DNAJC15': '8.02',
  'DNAJC30': '7.56',
  'DNAJC9': '7.69'},
 '4h_P1_T2_TimeC1_PIDC4_Non-Survivor': {'DNAJB6 /// TMEM135': '8.25',
  'DNAJC14': '8.69',
  'DNAJC15': '8.78',
  'DNAJC30': '7.53',
  'DNAJC9': '7.29'},
 '4h_P2_T2_TimeC2_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '9.49',
  'DNAJC14': '8.47',
  'DNAJC15': '8.7',
  'DNAJC30': '7.41',
  'DNAJC9': '7.23'},
 '4h_P3_T2_TimeC2_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '9.87',
  'DNAJC14': '8.38',
  'DNAJC15': '8.03',
  'DNAJC30': '7.29',
  'DNAJC9': '7.57'},
 '4h_P5_T2_TimeC2_PIDC3_Survivor': {'DNAJB6 /// TMEM135': '9.2',
  'DNAJC14': '8.1',
  'DNAJC15': '7.9',
  'DNAJC30': '7.47',
  'DNAJC9': '7.62'},
 '8h_P1_T3_TimeC4_PIDC4_Non-Survivor': {'DNAJB6 /// TMEM135': '8.49',
  'DNAJC14': '8.58',
  'DNAJC15': '8.22',
  'DNAJC30': '7.29',
  'DNAJC9': '7.13'},
 '8h_P2_T3_TimeC2_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '9.16',
  'DNAJC14': '8.13',
  'DNAJC15': '7.88',
  'DNAJC30': '7.58',
  'DNAJC9': '7.18'},
 '8h_P3_T3_TimeC4_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '8.52',
  'DNAJC14': '8.44',
  'DNAJC15': '9.23',
  'DNAJC30': '7.38',
  'DNAJC9': '7.2'},
 '8h_P4_T3_TimeC4_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '9.28',
  'DNAJC14': '8.53',
  'DNAJC15': '8.22',
  'DNAJC30': '7.32',
  'DNAJC9': '6.84'},
 '8h_P5_T3_TimeC4_PIDC3_Survivor': {'DNAJB6 /// TMEM135': '9.29',
  'DNAJC14': '8.41',
  'DNAJC15': '9.08',
  'DNAJC30': '7.28',
  'DNAJC9': '7.69'}}
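The values in the dictionary are strings, and the sample names sort lexically (0h, 12h, 24h, 48h, 4h, 8h). Below is a minimal sketch of turning the dictionary into a numeric genes x samples DataFrame ordered by time point, using a few entries copied from the dictionary. The meaning of the parsed fields (hours, patient, outcome) is inferred from the sample names and is an assumption. The T1/TimeC1/PIDC4 fields are left unparsed because their meaning is not stated.

```python
import pandas as pd

# A few entries copied from the dictionary above (values are stored as strings)
data = {
    "0h_P1_T1_TimeC1_PIDC4_Non-Survivor": {"DNAJC14": "8.44", "DNAJC9": "7.62"},
    "12h_P1_T4_TimeC2_PIDC4_Non-Survivor": {"DNAJC14": "8.64", "DNAJC9": "7.23"},
    "4h_P2_T2_TimeC2_PIDC1_Survivor": {"DNAJC14": "8.47", "DNAJC9": "7.23"},
}

# Genes as rows, samples as columns, converted to floats
expr = pd.DataFrame(data).astype(float)

# Split each sample name on "_"; field meanings are inferred from the names
fields = expr.columns.to_series().str.split("_", expand=True)
meta = pd.DataFrame(
    {
        "hours": fields[0].str.rstrip("h").astype(int),
        "patient": fields[1],
        "outcome": fields[5],
    },
    index=expr.columns,
)

# Order samples chronologically (plain string sorting would put 12h before 4h)
expr = expr[meta.sort_values(["hours", "patient"]).index]
```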

The link you shared did not have any information about normalization.
