Hi,
My goal is to find scRNA-seq samples from lung cancer patients with known EGFR or KRAS mutations and visualize any differential expression (DE) between the two groups. My workflow is to download the individual patient samples, merge the Seurat objects, and then normalize, integrate, and cluster.
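For reference, if the normalization step uses Seurat's default `NormalizeData()` settings (`normalization.method = "LogNormalize"`, `scale.factor = 10000`), each cell's counts are divided by that cell's total UMIs, multiplied by 10,000, and transformed with `log1p`. The base-R sketch below reproduces that arithmetic on a sparse matrix. It shows that normalization only rescales the counts a cell already has, so it cannot add information to cells with very few UMIs. `log_normalize()` is a hypothetical helper for illustration, not a Seurat function.

```r
library(Matrix)

# Hypothetical helper reproducing the arithmetic of Seurat's default
# "LogNormalize": log1p(count / cell_total * scale_factor).
# Expects a genes x cells dgCMatrix, as returned by Read10X().
log_normalize <- function(counts, scale_factor = 1e4) {
  stopifnot(inherits(counts, "dgCMatrix"))
  totals <- Matrix::colSums(counts)   # UMIs per cell (what Seurat stores as nCount_RNA)
  # Column index of every stored non-zero value (CSC layout: @p marks column starts)
  col_of_x <- rep(seq_len(ncol(counts)), times = diff(counts@p))
  # Only non-zeros are transformed; log1p(0) = 0, so zeros can stay implicit
  counts@x <- log1p(counts@x / totals[col_of_x] * scale_factor)
  counts
}
```

Usage would be something like `norm <- log_normalize(seurat_data)`. In a real workflow you would call `NormalizeData()` instead.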
I downloaded data from this paper under accessions CRA001963 and CRA001477, expecting these to be good samples for the mutant EGFR tumor condition. I downloaded each pair of reads onto an HPC cluster and ran 10x Genomics `cellranger count` on them to generate count matrices. The web summaries produced by these runs indicate that the counts were fine, with a sufficient number of UMIs and cells captured in each sample.
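Because the web summaries and the Seurat objects seem to disagree, one quick check is whether they describe the same number of barcodes. Assuming the standard Cell Ranger output layout, the web summary's cell estimate is also written to `outs/metrics_summary.csv`, in a column named "Estimated Number of Cells" with thousands separators (e.g. "5,123"). The hypothetical helper `read_estimated_cells()` below reads that value so it can be compared with `ncol(seurat_data)` after `Read10X()`.

```r
# Hypothetical helper: read the "Estimated Number of Cells" value from a
# Cell Ranger outs/metrics_summary.csv (assumed standard layout).
read_estimated_cells <- function(metrics_csv) {
  m <- read.csv(metrics_csv, check.names = FALSE, stringsAsFactors = FALSE)
  val <- m[["Estimated Number of Cells"]][1]
  # Values are written with thousands separators, e.g. "5,123"
  as.numeric(gsub(",", "", as.character(val), fixed = TRUE))
}
```

A large mismatch between this number and the number of columns in the loaded matrix would suggest that the matrix in R is not the one the web summary describes.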
However, when I load the data into R and create the corresponding Seurat objects, I notice that many of the cells have low UMI and gene counts. I have attached an image to show y'all what I mean. Further QC also shows that there aren't many cells to begin with.
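To put numbers on "low UMI and gene counts", a per-barcode summary can be computed directly on the matrix returned by `Read10X()`, the same way `CreateSeuratObject()` fills `nCount_RNA` (total UMIs) and `nFeature_RNA` (genes detected). The sketch below is a hypothetical helper. The `min_umi = 500` cutoff is an arbitrary illustrative value, not a recommended threshold.

```r
library(Matrix)

# Hypothetical helper: per-barcode QC summary for a genes x cells count matrix.
# n_count matches Seurat's nCount_RNA; n_feature matches nFeature_RNA.
barcode_qc <- function(counts, min_umi = 500) {
  n_count   <- Matrix::colSums(counts)       # total UMIs per barcode
  n_feature <- Matrix::colSums(counts > 0)   # genes with at least one UMI
  c(barcodes     = ncol(counts),
    median_umi   = median(n_count),
    median_genes = median(n_feature),
    frac_below   = mean(n_count < min_umi))  # share of barcodes under min_umi
}
```

For example, `barcode_qc(seurat_data)` right after `Read10X()` gives the barcode count, the median UMIs and genes per barcode, and the fraction of barcodes under the cutoff for each sample.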
```r
library(Seurat)

## Loading in EGFR data (one 10x matrix directory per sample)
egfrlist <- c("CRR049227_EGFR_tumor_matrix", "CRR049228_EGFR_tumor_matrix", "CRR049229_EGFR_tumor_matrix",
              "CRR049230_EGFR_tumor_matrix", "CRR073022_EGFR_tumor_matrix", "CRR073023_EGFR_tumor_matrix",
              "CRR073024_EGFR_tumor_matrix", "CRR073025_EGFR_tumor_matrix", "CRR073026_EGFR_tumor_matrix")
for (i in seq_along(egfrlist)) {
  seurat_data <- Read10X(data.dir = paste0("/home/crx6xw/rstudio/", egfrlist[i]))
  seurat_obj <- CreateSeuratObject(counts = seurat_data, project = egfrlist[i])  # tag cells with sample name
  assign(paste0("egfrpt", i), seurat_obj)  # creates egfrpt1 ... egfrpt9
}
```
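To check the inputs to the loading loop without building Seurat objects, the hypothetical helper `count_barcodes()` below counts the lines in each directory's barcodes file. Cell Ranger writes both `outs/filtered_feature_bc_matrix/` (barcodes called as cells) and `outs/raw_feature_bc_matrix/` (all detected barcodes, including those from empty droplets). If a directory in `egfrlist` was copied from the raw folder, it will typically contain far more barcodes than the web summary reports, and most of them will have very low UMI counts. The sketch assumes the Cell Ranger v3+ file name `barcodes.tsv.gz` and falls back to the v2 name `barcodes.tsv`.

```r
# Hypothetical helper: count barcodes in each 10x matrix directory without
# loading the matrix. Tries barcodes.tsv.gz (Cell Ranger v3+) then barcodes.tsv (v2).
count_barcodes <- function(dirs) {
  vapply(dirs, function(d) {
    f <- file.path(d, "barcodes.tsv.gz")
    if (!file.exists(f)) f <- file.path(d, "barcodes.tsv")
    if (!file.exists(f)) return(NA_integer_)  # no barcodes file found
    con <- gzfile(f, "r")                       # gzfile() also reads uncompressed files
    on.exit(close(con))
    length(readLines(con))                      # one barcode per line
  }, integer(1))
}
```

With the paths used above, this would be `count_barcodes(paste0("/home/crx6xw/rstudio/", egfrlist))`.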
My question is: is this just low-quality data, or is something wrong with how I loaded the data or generated the count matrices? If it is just low-quality data, would further cell-level filtering and normalization help? I can upload images of the web summaries from the count matrices.
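For the cell-level filtering question, the Seurat guided clustering tutorial adds a mitochondrial percentage with `PercentageFeatureSet(obj, pattern = "^MT-")` and then keeps cells via `subset(obj, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)`. The base-R sketch below applies the same rule to a raw count matrix so the effect of the thresholds can be inspected per sample. Those thresholds come from the PBMC tutorial and are only placeholders; tumor tissue usually needs its own cutoffs. Note that filtering removes poor-quality barcodes but cannot increase the number of usable cells. `qc_keep()` is a hypothetical helper, and the `^MT-` pattern assumes human gene symbols.

```r
library(Matrix)

# Hypothetical helper mirroring the tutorial-style filter:
# keep cells with min_genes < nFeature < max_genes and percent mito < max_mt.
# Thresholds are illustrative defaults, not tuned for this dataset.
qc_keep <- function(counts, min_genes = 200, max_genes = 2500, max_mt = 5) {
  n_count   <- Matrix::colSums(counts)
  n_feature <- Matrix::colSums(counts > 0)
  mt        <- grepl("^MT-", rownames(counts))  # human mitochondrial genes
  # pmax() guards against division by zero for empty barcodes
  pct_mt    <- 100 * Matrix::colSums(counts[mt, , drop = FALSE]) / pmax(n_count, 1)
  n_feature > min_genes & n_feature < max_genes & pct_mt < max_mt
}
```

The result is a logical vector over cells, so `seurat_data[, qc_keep(seurat_data)]` would give the filtered matrix, and `mean(qc_keep(seurat_data))` shows what fraction of a sample survives.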