Please bear with me, as I am relatively new to single-cell RNA-seq data and to downloading data from the SRA Run Selector.
I am interested in downloading the mutant EGFR and mutant KRAS patient data from this article (BioProject accession PRJNA591860). Ideally, I hope to download the data for these patients, convert them into Seurat objects, merge them by condition (in this case mEGFR vs. mKRAS), integrate the samples, and visualize differential expression (DE) between the two groups.
From my understanding, because I'm only interested in part of the dataset (the mutant EGFR/KRAS patients), I should download the read data for each sample individually and run it through the workflow to generate individual count matrices, and so on. I found the raw reads on SRA under accession SRP238929. Based on the patient demographics spreadsheet the authors uploaded, each sample name in the spreadsheet should correspond to an isolate on the SRA download page.
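As a rough sketch of the matching step I have in mind, this is how I imagine picking out the run accessions for the patients of interest from a metadata table exported from the SRA Run Selector. The column names (`Run`, `Isolate`), the sample names, and the accessions below are all made-up placeholders, not values from SRP238929, and the real export may be labelled differently.

```python
import csv
import io

# Hypothetical excerpt of a Run Selector metadata export (comma-separated).
# Column names and every value here are placeholders for illustration only.
RUN_TABLE = """Run,Isolate,LibraryLayout
SRR0000001,PATIENT_A,PAIRED
SRR0000002,PATIENT_A,PAIRED
SRR0000003,PATIENT_B,PAIRED
SRR0000004,PATIENT_C,PAIRED
"""

# Hypothetical mapping from sample name (patient spreadsheet) to condition.
CONDITION_BY_SAMPLE = {"PATIENT_A": "mEGFR", "PATIENT_C": "mKRAS"}


def select_runs(run_table_text, condition_by_sample):
    """Return {condition: [run accessions]} for the samples of interest."""
    selected = {}
    for row in csv.DictReader(io.StringIO(run_table_text)):
        condition = condition_by_sample.get(row["Isolate"])
        if condition is not None:  # skip isolates that are not of interest
            selected.setdefault(condition, []).append(row["Run"])
    return selected


runs = select_runs(RUN_TABLE, CONDITION_BY_SAMPLE)
print(runs)
```

The resulting lists of accessions per condition would then be what gets downloaded, rather than the whole project.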
Each isolate has around 50 files associated with it. What does this mean? Does it mean there are 50 reads associated with that patient? Do I need to download all of these files? If I'm demultiplexing with bcl2fastq2 and aligning with STAR, how would that work?
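To check how many files each isolate really has, I assume I could count the rows per isolate in the Run Selector metadata table, since each row there is one SRA run (one `SRR` accession). A minimal sketch, with placeholder column names and values rather than anything taken from the real table:

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of a Run Selector metadata export; the "Run" and
# "Isolate" column names and all values are assumptions for illustration.
RUN_TABLE = """Run,Isolate
SRR0000001,PATIENT_A
SRR0000002,PATIENT_A
SRR0000003,PATIENT_A
SRR0000004,PATIENT_B
"""


def runs_per_isolate(run_table_text):
    """Count how many runs (rows) belong to each isolate."""
    reader = csv.DictReader(io.StringIO(run_table_text))
    return Counter(row["Isolate"] for row in reader)


counts = runs_per_isolate(RUN_TABLE)
for isolate, n_runs in sorted(counts.items()):
    print(f"{isolate}\t{n_runs}")
```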
Is there also an easier way of getting this data into the workflow, i.e., downloading the count matrices themselves instead of regenerating them? I couldn't find them anywhere.