My question is about single-cell RNA-seq, but I believe people with experience in bulk RNA-seq might also be able to answer it.
I aligned a few single-cell datasets with Cell Ranger. When I checked the results, most reads aligned to the genome (though only about half with high confidence), but only about 40% of the reads aligned to the transcriptome. Here is an example of one of the outputs:
Reads Mapped to Genome: 96.6%
Reads Mapped Confidently to Genome: 54.6%
Reads Mapped Confidently to Intergenic Regions: 8.5%
Reads Mapped Confidently to Intronic Regions: 0.2%
Reads Mapped Confidently to Exonic Regions: 46.0%
Reads Mapped Confidently to Transcriptome: 45.3%
Reads Mapped Antisense to Gene: 0.4%
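To make the breakdown easier to read, here is a minimal sketch that does the arithmetic on the numbers quoted above. The dictionary keys and variable names are made up for illustration; only the percentages come from the summary. The comment about multi-mapping is an assumption about how Cell Ranger defines "confidently mapped" and is worth checking against its documentation.

```python
# Percentages copied from the Cell Ranger summary above (percent of all reads).
# The key names are hypothetical shorthand, not Cell Ranger field names.
metrics = {
    "mapped_to_genome": 96.6,
    "confident_genome": 54.6,
    "confident_intergenic": 8.5,
    "confident_intronic": 0.2,
    "confident_exonic": 46.0,
    "confident_transcriptome": 45.3,
    "antisense": 0.4,
}

# Reads that did not map to the genome at all.
unmapped = round(100.0 - metrics["mapped_to_genome"], 1)

# Reads that mapped somewhere, but not confidently.
# Assumption: these are mostly reads with more than one possible location.
mapped_not_confident = round(
    metrics["mapped_to_genome"] - metrics["confident_genome"], 1
)

# Sanity check: the three region categories should add up to roughly
# the confidently mapped total (small differences are rounding).
region_sum = round(
    metrics["confident_intergenic"]
    + metrics["confident_intronic"]
    + metrics["confident_exonic"],
    1,
)

# Of the confidently mapped reads, what share ends up in the transcriptome?
transcriptome_share = round(
    100.0 * metrics["confident_transcriptome"] / metrics["confident_genome"], 1
)

print("unmapped:", unmapped)
print("mapped but not confident:", mapped_not_confident)
print("sum of region categories:", region_sum)
print("transcriptome share of confident reads:", transcriptome_share)
```

Read this way, the large gap is between reads mapped to the genome and reads mapped confidently (about 42 percentage points), while most confidently mapped reads (about 83%) do reach the transcriptome. Intronic and intergenic reads together are under 9%.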
I am not sure what to think about this. Could it be a sign of low RNA integrity? My hypothesis is that if the sample is degraded, reads might fail to align to a transcript but should still have no problem aligning to the genome. Another hypothesis is that the sample was contaminated with genomic DNA. However, I am not even sure whether these results are abnormal.
Is that so bad? I have had data of (what I think is) good quality with similar mapping rates, though I used Alevin and not Cell Ranger. Others may comment as well, but I would not worry too much about it. Just continue and check whether the data are OK and can be analysed. Check the usual QC metrics and whether you get cluster separation that matches biological expectations. Also check whether you detect the usual number of genes per cell. That obviously depends on the cell type, but as a very rough guide, expect around 1,000 or more per cell.
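As a rough illustration of the genes-per-cell check, here is a minimal sketch in plain Python. The barcodes, gene names, counts, and the helper names `genes_detected_per_cell` and `flag_low_gene_cells` are all made up for the example; in practice you would run the same idea on the filtered count matrix with whatever toolkit you already use. The default cutoff of 1000 is only the rule of thumb mentioned above, not a fixed standard.

```python
from statistics import median


def genes_detected_per_cell(counts):
    """Count genes with at least one UMI for each cell.

    counts: dict mapping cell barcode -> dict mapping gene -> UMI count.
    """
    return {
        cell: sum(1 for umi in genes.values() if umi > 0)
        for cell, genes in counts.items()
    }


def flag_low_gene_cells(detected, min_genes=1000):
    """Return barcodes whose detected-gene count is below min_genes."""
    return sorted(cell for cell, n in detected.items() if n < min_genes)


# Toy data with three cells and three genes (hypothetical values).
toy_counts = {
    "AAAC": {"Actb": 5, "Gapdh": 3, "Cd3e": 0},
    "AAAG": {"Actb": 1, "Gapdh": 0, "Cd3e": 0},
    "AAAT": {"Actb": 2, "Gapdh": 4, "Cd3e": 1},
}

detected = genes_detected_per_cell(toy_counts)
median_genes = median(detected.values())

# A tiny cutoff is used here only because the toy matrix has three genes.
low_cells = flag_low_gene_cells(detected, min_genes=2)

print("genes detected per cell:", detected)
print("median genes per cell:", median_genes)
print("cells below cutoff:", low_cells)
```

If the median on real data sits well below what is typical for the cell type, that would be a stronger warning sign than the mapping rates alone.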