We came across a project in our lab that no one knows exactly how to approach. Since I know a little Python, the project was assigned to me.
There is data from a randomised controlled clinical trial with 60 participants. Half (30) are controls and the other half are actual patients (case). In each group, biopsy samples were taken before (pre) and after (post) treatment with drug A, which targets cell type B in the tissue.
In each biopsy sample, the B cells were detected and classified either as “normal” or “malignant”.
Associated metadata for each biopsy are displayed here (I have only included 12 records for this file):
name  patient_id  arm      treatment
111   0           control  pre
112   1           control  pre
113   2           control  pre
121   0           control  post
122   1           control  post
123   2           control  post
211   75          case     pre
212   76          case     pre
213   77          case     pre
221   75          case     post
222   76          case     post
223   77          case     post
name: spreadsheet file name
patient_id: patient identity number
arm: trial arm (‘case’ or ‘control’)
treatment: treatment condition (‘pre’ or ‘post’ treatment)
Other files (I have only included 20 records for each file: 10 normal and 10 malignant) contain cell detection results from a single biopsy, and each row in a spreadsheet represents an individual detected cell:
x: x coordinate
y: y coordinate
label: classified label (‘normal’ or ‘malignant’)
Just showing an example, file 111 looks like this:
x     y     label
730   724   normal
1962  450   normal
1511  817   normal
1244  455   normal
2529  397   normal
1878  262   normal
2248  369   normal
2007  273   normal
1531  878   normal
1729  834   normal
931   1270  malignant
1282  314   malignant
1630  839   malignant
1543  460   malignant
2493  237   malignant
1311  744   malignant
1999  366   malignant
737   1361  malignant
2252  448   malignant
2620  398   malignant
The rest can be found here, but probably they will not be necessary: https://www.mediafire.com/file/ka7r59kf0swnbnd/OtherFiles.rar/file
I am trying to answer 3 questions here (if you can think of other questions, please let me know):
- Identifying post-treatment morphological changes due to the effect of drug A.
- Proposing measures to quantify the changes.
- Using appropriate statistical analysis tools to assess whether the changes are due to chance or not.
The linchpin is what features you are interested in; can you describe them in plain English? Something like "how many of the cells are malignant" or "how interspersed malignant cells are with normal cells" or "how diffuse malignant cells are from each other"?
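To make those three plain-English features concrete, here is a minimal sketch using only the 20 rows of file 111 shown in the question. The function names and the particular definitions (nearest-neighbour distance as a measure of how diffuse the malignant cells are, and the share of malignant cells whose closest neighbour is normal as a measure of interspersion) are my own assumptions, not an established standard; other spatial statistics may suit the biology better.

```python
import math

# The 20 example rows of file 111, copied from the question.
cells_111 = [
    (730, 724, "normal"), (1962, 450, "normal"), (1511, 817, "normal"),
    (1244, 455, "normal"), (2529, 397, "normal"), (1878, 262, "normal"),
    (2248, 369, "normal"), (2007, 273, "normal"), (1531, 878, "normal"),
    (1729, 834, "normal"), (931, 1270, "malignant"), (1282, 314, "malignant"),
    (1630, 839, "malignant"), (1543, 460, "malignant"), (2493, 237, "malignant"),
    (1311, 744, "malignant"), (1999, 366, "malignant"), (737, 1361, "malignant"),
    (2252, 448, "malignant"), (2620, 398, "malignant"),
]


def malignant_fraction(cells):
    """How many of the cells are malignant, as a fraction of all cells."""
    return sum(1 for _, _, label in cells if label == "malignant") / len(cells)


def nearest_neighbour(i, cells):
    """Return (distance, label) of the cell closest to cells[i]."""
    xi, yi, _ = cells[i]
    return min(
        (math.hypot(xi - x, yi - y), label)
        for j, (x, y, label) in enumerate(cells)
        if j != i
    )


def mean_malignant_spacing(cells):
    """How diffuse malignant cells are: mean distance from each malignant
    cell to its nearest malignant neighbour (needs at least 2 such cells)."""
    malignant = [c for c in cells if c[2] == "malignant"]
    dists = [nearest_neighbour(i, malignant)[0] for i in range(len(malignant))]
    return sum(dists) / len(dists)


def interspersion(cells):
    """How interspersed the two types are: fraction of malignant cells
    whose nearest neighbour of any type is a normal cell."""
    idx = [i for i, c in enumerate(cells) if c[2] == "malignant"]
    hits = sum(1 for i in idx if nearest_neighbour(i, cells)[1] == "normal")
    return hits / len(idx)


print(malignant_fraction(cells_111))
print(mean_malignant_spacing(cells_111))
print(interspersion(cells_111))
```

Each function returns one number per biopsy, so the same pre/post comparison can be run on any of them. The brute-force neighbour search is quadratic in the number of cells, which is fine for a sketch but may be slow on full biopsies.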
Had not thought about these questions.
Are the x and y coordinates important for calculating, e.g., some general area or density of cells? Otherwise you have only a few data columns: patient_id, arm, treatment, cells_norm, cells_malignant, cells_total
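A rough sketch of how that per-biopsy table could be built, assuming the metadata and the cell files are available as comma-separated text with the column names given in the question. The helper names (`count_labels`, `build_table`, `open_cells`) are hypothetical, and the in-memory strings below stand in for real files: one metadata row and four of the rows of file 111.

```python
import csv
import io
from collections import Counter

# Stand-ins for real files (assumed CSV layout): one metadata row and
# four of the rows of file 111 from the question.
metadata_csv = "name,patient_id,arm,treatment\n111,0,control,pre\n"
example_files = {
    "111": "x,y,label\n730,724,normal\n1962,450,normal\n"
           "931,1270,malignant\n1282,314,malignant\n",
}


def open_cells(name):
    """Hypothetical loader; with real data this might be open(f"{name}.csv")."""
    return io.StringIO(example_files[name])


def count_labels(cell_file):
    """Count normal and malignant cells in one biopsy file."""
    counts = Counter(row["label"] for row in csv.DictReader(cell_file))
    return counts["normal"], counts["malignant"]


def build_table(metadata_file, open_cells):
    """One row per biopsy with the columns suggested above."""
    table = []
    for meta in csv.DictReader(metadata_file):
        with open_cells(meta["name"]) as fh:
            n_norm, n_mal = count_labels(fh)
        table.append({
            "patient_id": int(meta["patient_id"]),
            "arm": meta["arm"],
            "treatment": meta["treatment"],
            "cells_norm": n_norm,
            "cells_malignant": n_mal,
            "cells_total": n_norm + n_mal,
        })
    return table


table = build_table(io.StringIO(metadata_csv), open_cells)
print(table)
```

If the spreadsheets are in another format, only `open_cells` and the delimiter handling would need to change; the resulting table has the same shape either way.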
Assuming the control arm patients also have biopsies at two time points, you can estimate the variation due to the fact that the biopsies are taken from different spots and some time apart.
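One way to use the control arm like this is to compute the post minus pre change per patient in each arm and then ask whether the mean change in the case arm differs from the mean change in the control arm. The sketch below uses a simple permutation test for that; it is one possible choice, not the only appropriate analysis. The numbers in `toy_table` are made up purely so the code runs and are not trial data, and the `frac_malignant` column and function names are assumptions.

```python
import random
from statistics import mean

# Made-up malignant fractions (NOT trial data), only to exercise the code.
toy_table = [
    {"patient_id": 0, "arm": "control", "treatment": "pre", "frac_malignant": 0.50},
    {"patient_id": 1, "arm": "control", "treatment": "pre", "frac_malignant": 0.48},
    {"patient_id": 2, "arm": "control", "treatment": "pre", "frac_malignant": 0.52},
    {"patient_id": 0, "arm": "control", "treatment": "post", "frac_malignant": 0.51},
    {"patient_id": 1, "arm": "control", "treatment": "post", "frac_malignant": 0.46},
    {"patient_id": 2, "arm": "control", "treatment": "post", "frac_malignant": 0.52},
    {"patient_id": 75, "arm": "case", "treatment": "pre", "frac_malignant": 0.60},
    {"patient_id": 76, "arm": "case", "treatment": "pre", "frac_malignant": 0.55},
    {"patient_id": 77, "arm": "case", "treatment": "pre", "frac_malignant": 0.65},
    {"patient_id": 75, "arm": "case", "treatment": "post", "frac_malignant": 0.30},
    {"patient_id": 76, "arm": "case", "treatment": "post", "frac_malignant": 0.30},
    {"patient_id": 77, "arm": "case", "treatment": "post", "frac_malignant": 0.30},
]


def paired_changes(table, arm, column="frac_malignant"):
    """Post minus pre value for every patient in the given arm."""
    pre = {r["patient_id"]: r[column] for r in table
           if r["arm"] == arm and r["treatment"] == "pre"}
    post = {r["patient_id"]: r[column] for r in table
            if r["arm"] == arm and r["treatment"] == "post"}
    return [post[p] - pre[p] for p in sorted(pre) if p in post]


def permutation_test(case_changes, control_changes, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in mean change.

    Returns (observed difference, p-value). Arm labels are shuffled
    across patients; the control changes supply the baseline variation.
    """
    rng = random.Random(seed)
    observed = mean(case_changes) - mean(control_changes)
    pooled = list(case_changes) + list(control_changes)
    k = len(case_changes)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


case = paired_changes(toy_table, "case")
control = paired_changes(toy_table, "control")
observed, p_value = permutation_test(case, control)
print(observed, p_value)
```

With only three patients per arm the test cannot give a small p-value however large the effect, so this only becomes informative with the full 30 per arm. The same two functions work for any per-biopsy measure, not just the malignant fraction.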