Hi! I have an unconventional problem and would love some experts' guidance on this. I am part of a team of undergraduates, and we have been tasked with informing the market strategy for a patented drug (Drug A) entering a new country (Country X). The prompt is as follows: A comprehensive market analysis to quantify disease prevalence, patient profiles, and treatment patterns in Country X, providing data-driven insights to inform the commercial strategy for Drug A.
We were given no datasets to tackle this and were expected to find our own. As a bioinformatics enthusiast, I was interested in using genomics data to estimate the market size by calculating polygenic risk scores (PRSs) for a sample of that ethnicity. I wanted to create a composite risk score by combining the PRSs of all the diseases against which the drug is effective. Similarly, I created a composite "precaution score" by combining the PRSs of all the drug's contraindications. I built these composite panels by combining panels available in the PGS Catalog.
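For concreteness, here is a minimal sketch of one way such a composite could be assembled. It assumes each per-disease PRS is standardized (z-scored) within the sample and the traits are then averaged with equal weights. The weighting scheme, the function names, and the toy numbers are all illustrative assumptions, not necessarily what was done in the analysis described above.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize raw scores to mean 0 and SD 1 within the sample."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def composite_score(prs_by_trait, weights=None):
    """Combine per-trait PRS lists into one score per individual.

    prs_by_trait: dict of trait name -> list of raw PRS values,
                  all in the same sample order.
    weights:      optional dict of trait name -> weight
                  (hypothetical; equal weights are used if omitted).
    """
    traits = list(prs_by_trait)
    n = len(prs_by_trait[traits[0]])
    if any(len(prs_by_trait[t]) != n for t in traits):
        raise ValueError("all traits must cover the same individuals")
    if weights is None:
        weights = {t: 1.0 for t in traits}
    total_w = sum(weights[t] for t in traits)
    # Z-score each trait so panels on different scales are comparable.
    z = {t: zscores(prs_by_trait[t]) for t in traits}
    return [
        sum(weights[t] * z[t][i] for t in traits) / total_w
        for i in range(n)
    ]


# Toy data: two made-up traits on very different raw scales.
toy = {
    "disease_1": [0.1, 0.2, 0.3, 0.4],
    "disease_2": [10.0, 30.0, 20.0, 40.0],
}
scores = composite_score(toy)
```

Standardizing before combining keeps a panel with a larger raw scale from dominating the composite. The resulting number only ranks individuals relative to the sample it was standardized on; it is not an absolute risk.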
I found the 1000 Genomes Project (1KG) data, which contains individual-level genotypes. However, the 1KG dataset does not include phenotypes, so I can neither validate the risk scores nor estimate absolute risks. I need your expert help in figuring out how I can make this analysis useful! Note that it's okay for it not to be precise, as the bread and butter of our presentation will come from longitudinal epidemiological surveys done across Country X. This is just to add another layer to the presentation.
I am aware that my approach is very rough and has many limitations, such as the limited portability of PRS panels across ethnicities, sampling bias in the 1KG dataset, and the extremely low confidence of any assertions made without validation. I would just like some ideas on how I can use this analysis to spice up my group's work.
What I've done so far: I plotted the score distributions for my ethnicity of interest against those of the Caucasian individuals in the 1KG dataset. The plots show much higher genetic risk scores in my ethnicity of interest for all diseases the drug targets.
My dataset contains subgroups within my ethnicity of interest. I calculated subgroup-specific scores as well, to inform the company of key regions with high risk of developing the disease.
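A minimal sketch of how the subgroup comparison could be summarized numerically: it expresses each group's mean score as a shift from a reference group, in units of the reference group's standard deviation. The group labels (`REF`, `SUB_A`, `SUB_B`), the function name, and the toy scores are hypothetical placeholders, not real 1KG population codes or results.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_group(scores, labels, reference):
    """Summarize scores per group relative to a reference group.

    scores:    list of per-individual scores.
    labels:    list of group labels, same order as scores.
    reference: label of the group used as the baseline.
    """
    groups = defaultdict(list)
    for score, label in zip(scores, labels):
        groups[label].append(score)
    ref_mu = mean(groups[reference])
    ref_sd = stdev(groups[reference])  # sample SD of the reference group
    summary = {}
    for label, vals in groups.items():
        summary[label] = {
            "n": len(vals),
            "mean": mean(vals),
            # Mean shift from the reference, in reference-SD units.
            "shift_in_ref_sd": (mean(vals) - ref_mu) / ref_sd,
        }
    return summary


# Toy data: made-up scores for one reference group and two subgroups.
toy_scores = [0.0, 1.0, 2.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0]
toy_labels = ["REF"] * 3 + ["SUB_A"] * 3 + ["SUB_B"] * 3
summary = summarize_by_group(toy_scores, toy_labels, reference="REF")
```

A shift in mean score between groups is only a relative comparison. Given the portability and sampling caveats already noted, it should not be read as a difference in disease prevalence without phenotype data to validate it.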
I don't see a single question here. Saying
I need your expert help in figuring out how I can make this analysis useful!
is too open-ended. As an instructor, I am very much against students soliciting the wisdom of the crowd when it comes to their assignments. This goes double because you seem to have a team, so you already have a built-in source of feedback for your ideas. Unless your instructor has explicitly encouraged you to ask people for ideas online, what you are doing is probably outside the expected response to this task. You already have the benefit of being able to search online and of having other people around who will presumably also have testable ideas.
Thank you for the response! Unfortunately, this is completely independent and not part of a course. I was asking for some innovative ways to present the analysis I've already done, as I have taken a highly unconventional approach to the problem statement at hand. None of my groupmates have a background in biology or bioinformatics, and I have run out of ideas to test. I also do not have access to an expert, which is why I decided to post on the forum.
Have a good day!