2.1 years ago
melissachua90
I have an Affymetrix microarray dataset for analysing the time-dependent effect of a treatment on gene expression in different patients.
I have three types of controls:
- positive control
- negative control
- probeset control
My attempt:
```python
import pandas as pd
import qnorm

# Sample columns (the last 29 columns); values are read in as strings, so cast to float
samples = coding_gene_df.iloc[:, -29:].astype(float)
neg_ctl = neg_ctl_df.iloc[:, -29:].astype(float)
pos_ctl = pos_ctl_df.iloc[:, -29:].astype(float)

# Negative normalization: subtract the median of the negative controls within each patient sample
neg_ctl_median = neg_ctl.median()
# Positive normalization: divide by the median of the positive controls within each patient sample
pos_ctl_median = pos_ctl.median()
# Probeset normalization (quantile normalization)
probeset_norm = qnorm.quantile_normalize(pos_ctl, axis=1)

# Vectorized over all genes; each per-sample median is aligned on the sample columns
norm_val = samples.sub(neg_ctl_median, axis=1)   # Negative normalization
norm_val = norm_val.div(pos_ctl_median, axis=1)  # Positive normalization (applied to the negative-normalized values)
norm_val = norm_val / probeset_norm              # Probeset normalization: the step I am unsure about
```
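A vectorized version of just the negative- and positive-control steps could look like the sketch below. It assumes, as in the code above, that samples are columns and that each control table has the same sample columns as the gene matrix; the function name `control_normalize`, the sample names and the toy values are hypothetical. `.sub(..., axis=1)` and `.div(..., axis=1)` align a per-sample Series on the column labels, so there is no loop over genes and the result is kept instead of being overwritten on every iteration.

```python
import pandas as pd


def control_normalize(genes: pd.DataFrame,
                      neg_ctl: pd.DataFrame,
                      pos_ctl: pd.DataFrame) -> pd.DataFrame:
    """Subtract the per-sample negative-control median, then divide by the
    per-sample positive-control median (samples are columns)."""
    genes = genes.astype(float)
    neg_median = neg_ctl.astype(float).median()  # one value per sample column
    pos_median = pos_ctl.astype(float).median()
    # Align each per-sample Series on the column labels and broadcast over genes
    return genes.sub(neg_median, axis=1).div(pos_median, axis=1)


# Hypothetical toy data: two samples, gene values stored as strings like the posted dict
genes = pd.DataFrame({"S1": ["9.0", "7.0"], "S2": ["8.0", "6.0"]},
                     index=["geneA", "geneB"])
neg_ctl = pd.DataFrame({"S1": [1.0, 3.0], "S2": [2.0, 2.0]}, index=["neg1", "neg2"])
pos_ctl = pd.DataFrame({"S1": [4.0, 4.0], "S2": [3.0, 3.0]}, index=["pos1", "pos2"])

normalized = control_normalize(genes, neg_ctl, pos_ctl)
print(normalized)
```

Here both negative-control medians are 2.0 and the positive-control medians are 4.0 and 3.0, so `geneA` in `S1` becomes (9 - 2) / 4 = 1.75.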
Is my code on the right track, especially the probeset normalization? Should I divide the samples by the probeset mean?
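For reference on the probeset step: quantile normalization, as usually defined, replaces each sample's values with a shared reference distribution (the mean of the sorted samples, rank by rank) rather than producing a divisor. The sketch below is a plain pandas/NumPy implementation of that textbook algorithm, assuming samples are columns; the function name and toy data are hypothetical. Ties are broken by order of appearance (`method="first"`), which may differ from how `qnorm` treats ties.

```python
import numpy as np
import pandas as pd


def quantile_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Give every column (sample) the same distribution: the mean of the
    sorted columns. Ties are broken by order of appearance."""
    values = df.astype(float)
    # Reference distribution: sort each column, then average across columns rank by rank
    sorted_cols = np.sort(values.to_numpy(), axis=0)
    reference = sorted_cols.mean(axis=1)
    # 0-based rank of each value within its own column
    ranks = values.rank(method="first").astype(int) - 1
    # Replace every value with the reference value at its rank
    return ranks.apply(lambda col: pd.Series(reference[col.to_numpy()], index=col.index))


toy = pd.DataFrame({"S1": [5.0, 2.0, 3.0], "S2": [4.0, 1.0, 4.5], "S3": [3.0, 4.0, 6.0]},
                   index=["A", "B", "C"])
qn = quantile_normalize(toy)
print(qn)
```

After normalization every column contains the same three values (2.0, 11/3 and 15.5/3), only in a different order per sample.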
The data (as a dictionary):
treatment_df.head().to_dict()
{'0h_P1_T1_TimeC1_PIDC4_Non-Survivor': {'DNAJB6 /// TMEM135': '9.25',
'DNAJC14': '8.44',
'DNAJC15': '8.66',
'DNAJC30': '7.34',
'DNAJC9': '7.62'},
'0h_P2_T1_TimeC2_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '9.26',
'DNAJC14': '8.34',
'DNAJC15': '8.63',
'DNAJC30': '7.42',
'DNAJC9': '7.19'},
'0h_P3_T1_TimeC1_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '9.07',
'DNAJC14': '8.67',
'DNAJC15': '8.91',
'DNAJC30': '7.15',
'DNAJC9': '7.22'},
'0h_P4_T1_TimeC1_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '8.56',
'DNAJC14': '8.63',
'DNAJC15': '9.29',
'DNAJC30': '7.46',
'DNAJC9': '7.16'},
'0h_P5_T1_TimeC4_PIDC3_Survivor': {'DNAJB6 /// TMEM135': '9.42',
'DNAJC14': '8.32',
'DNAJC15': '8.63',
'DNAJC30': '7.29',
'DNAJC9': '7.29'},
'12h_P1_T4_TimeC2_PIDC4_Non-Survivor': {'DNAJB6 /// TMEM135': '8.58',
'DNAJC14': '8.64',
'DNAJC15': '8.98',
'DNAJC30': '7.2',
'DNAJC9': '7.23'},
'12h_P2_T4_TimeC3_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '8.94',
'DNAJC14': '8.38',
'DNAJC15': '8.03',
'DNAJC30': '7.32',
'DNAJC9': '7.26'},
'12h_P3_T4_TimeC2_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '9.27',
'DNAJC14': '8.57',
'DNAJC15': '8.27',
'DNAJC30': '7.32',
'DNAJC9': '7.39'},
'12h_P4_T4_TimeC2_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '8.5',
'DNAJC14': '8.25',
'DNAJC15': '8.47',
'DNAJC30': '6.98',
'DNAJC9': '7.12'},
'12h_P5_T4_TimeC2_PIDC3_Survivor': {'DNAJB6 /// TMEM135': '9.18',
'DNAJC14': '8.25',
'DNAJC15': '8.66',
'DNAJC30': '7.38',
'DNAJC9': '7.6'},
'24h_P1_T5_TimeC4_PIDC4_Non-Survivor': {'DNAJB6 /// TMEM135': '8.13',
'DNAJC14': '8.43',
'DNAJC15': '8.33',
'DNAJC30': '7.41',
'DNAJC9': '6.68'},
'24h_P2_T5_TimeC3_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '8.85',
'DNAJC14': '8.54',
'DNAJC15': '8.73',
'DNAJC30': '7.26',
'DNAJC9': '7.49'},
'24h_P3_T5_TimeC3_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '8.99',
'DNAJC14': '8.42',
'DNAJC15': '8.92',
'DNAJC30': '7.25',
'DNAJC9': '7.27'},
'24h_P4_T5_TimeC3_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '9.26',
'DNAJC14': '8.6',
'DNAJC15': '7.62',
'DNAJC30': '6.93',
'DNAJC9': '7.3'},
'24h_P5_T5_TimeC3_PIDC3_Survivor': {'DNAJB6 /// TMEM135': '9.31',
'DNAJC14': '8.14',
'DNAJC15': '8.11',
'DNAJC30': '7.52',
'DNAJC9': '7.64'},
'48h_P1_T6_TimeC3_PIDC1_Non-Survivor': {'DNAJB6 /// TMEM135': '9.05',
'DNAJC14': '8.27',
'DNAJC15': '7.63',
'DNAJC30': '7.46',
'DNAJC9': '7.42'},
'48h_P2_T6_TimeC3_PIDC3_Survivor': {'DNAJB6 /// TMEM135': '8.57',
'DNAJC14': '8.23',
'DNAJC15': '8.6',
'DNAJC30': '7.45',
'DNAJC9': '7.62'},
'48h_P3_T6_TimeC3_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '9.23',
'DNAJC14': '8.65',
'DNAJC15': '7.71',
'DNAJC30': '6.77',
'DNAJC9': '7.26'},
'48h_P4_T6_TimeC3_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '8.99',
'DNAJC14': '8.7',
'DNAJC15': '7.99',
'DNAJC30': '7.28',
'DNAJC9': '7.5'},
'48h_P5_T6_TimeC3_PIDC3_Survivor': {'DNAJB6 /// TMEM135': '8.78',
'DNAJC14': '8.21',
'DNAJC15': '8.02',
'DNAJC30': '7.56',
'DNAJC9': '7.69'},
'4h_P1_T2_TimeC1_PIDC4_Non-Survivor': {'DNAJB6 /// TMEM135': '8.25',
'DNAJC14': '8.69',
'DNAJC15': '8.78',
'DNAJC30': '7.53',
'DNAJC9': '7.29'},
'4h_P2_T2_TimeC2_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '9.49',
'DNAJC14': '8.47',
'DNAJC15': '8.7',
'DNAJC30': '7.41',
'DNAJC9': '7.23'},
'4h_P3_T2_TimeC2_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '9.87',
'DNAJC14': '8.38',
'DNAJC15': '8.03',
'DNAJC30': '7.29',
'DNAJC9': '7.57'},
'4h_P5_T2_TimeC2_PIDC3_Survivor': {'DNAJB6 /// TMEM135': '9.2',
'DNAJC14': '8.1',
'DNAJC15': '7.9',
'DNAJC30': '7.47',
'DNAJC9': '7.62'},
'8h_P1_T3_TimeC4_PIDC4_Non-Survivor': {'DNAJB6 /// TMEM135': '8.49',
'DNAJC14': '8.58',
'DNAJC15': '8.22',
'DNAJC30': '7.29',
'DNAJC9': '7.13'},
'8h_P2_T3_TimeC2_PIDC2_Survivor': {'DNAJB6 /// TMEM135': '9.16',
'DNAJC14': '8.13',
'DNAJC15': '7.88',
'DNAJC30': '7.58',
'DNAJC9': '7.18'},
'8h_P3_T3_TimeC4_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '8.52',
'DNAJC14': '8.44',
'DNAJC15': '9.23',
'DNAJC30': '7.38',
'DNAJC9': '7.2'},
'8h_P4_T3_TimeC4_PIDC1_Survivor': {'DNAJB6 /// TMEM135': '9.28',
'DNAJC14': '8.53',
'DNAJC15': '8.22',
'DNAJC30': '7.32',
'DNAJC9': '6.84'},
'8h_P5_T3_TimeC4_PIDC3_Survivor': {'DNAJB6 /// TMEM135': '9.29',
'DNAJC14': '8.41',
'DNAJC15': '9.08',
'DNAJC30': '7.28',
'DNAJC9': '7.69'}}
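The posted values are strings (`'9.25'`), so they have to be converted to floats before any arithmetic, and the column labels appear to encode sample metadata separated by underscores. A minimal sketch for loading the dictionary and splitting the labels, assuming every label has exactly six underscore-separated fields; the field names (`time`, `patient`, `t_index`, `time_class`, `pid_class`, `outcome`) are hypothetical guesses, since the post does not define `T1`, `TimeC1` or `PIDC4`. Only two of the posted samples are reproduced to keep it short.

```python
import pandas as pd

# Two of the posted samples; paste the full dictionary in practice
data = {
    "0h_P1_T1_TimeC1_PIDC4_Non-Survivor": {"DNAJB6 /// TMEM135": "9.25", "DNAJC14": "8.44"},
    "12h_P2_T4_TimeC3_PIDC2_Survivor": {"DNAJB6 /// TMEM135": "8.94", "DNAJC14": "8.38"},
}

# Genes as rows, samples as columns; convert the string values to floats
treatment_df = pd.DataFrame(data).astype(float)

# Hypothetical field names for the underscore-separated parts of each label
fields = ["time", "patient", "t_index", "time_class", "pid_class", "outcome"]
meta = pd.DataFrame([label.split("_") for label in treatment_df.columns],
                    index=treatment_df.columns, columns=fields)
# Numeric hours (e.g. "12h" -> 12) so time points sort numerically, not as strings
meta["hours"] = meta["time"].str.rstrip("h").astype(int)

print(meta)
```

Sorting on `hours` matters here because the posted columns are in string order (0h, 12h, 24h, 48h, 4h, 8h).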
Affymetrix Microarray Analysis In Python
The link you shared did not have any information about normalization.