Hello everyone.
My question is about single-cell RNA-seq. I am re-analyzing raw scRNA-seq data from my lab that was previously analyzed by a sequencing company. I am trying to replicate their results and update/optimize my pipeline. Currently my pipeline is as follows:
1. Creating custom reference
I indexed the reference genome and annotation, using the same reference as before. UTR information has been included and non-polyA elements have been excluded, leaving only protein-coding sequences (with added UTRs). The command is:
cellranger mkref --genome A --fasta /data/user/Genome/A.fa --genes /data/user/Genome/A.gtf --memgb 50 --nthreads 10
2. QC and quantification of scRNA
cellranger count --id=MySample --transcriptome=A --fastqs=/data/user/scRNA/MySample --sample=MySample --r1-length=28 --r2-length=91 --localcores=8 --localmem=64 --nosecondary
Then I got the web_summary.html generated by cellranger, and one metric (Reads Mapped Confidently to Transcriptome) is substantially different from the one in the company's web summary report. Here are some metrics from the two web summaries:
>Metric: from company / from mine
Estimated Number of Cells: 13151 / 12999
Mean Reads Per Cell : 25092 / 25385
Median Genes Per Cell: 4771 / 4539
(Many other metrics, all with similar values ...)
Reads Mapped Confidently to Transcriptome: 79.29% / 65.1%
As far as I know, the annotation should integrate UTR sequences into exons, as indicated here. And it seems that no filtering of the raw FASTQ reads is needed before running cellranger count.
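One quick way to check the UTR point is to tally the feature types in column 3 of the GTF. This is only a sketch: the function name count_gtf_features is made up for illustration, and it assumes a standard tab-separated, 9-column GTF. My understanding is that cellranger only considers rows whose feature type is exon, so if the UTRs show up as separate feature types (for example five_prime_UTR or three_prime_UTR) without matching exon rows, reads falling in them may not be counted as transcriptome reads.

```shell
#!/usr/bin/env bash
# Tally the feature types (column 3) of a GTF file.
# Comment lines (starting with '#') and rows with fewer than 9
# tab-separated fields are skipped.
# Output: one "feature<TAB>count" line per feature type, sorted by name.
count_gtf_features() {
    local gtf="$1"
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
                END { for (f in n) print f "\t" n[f] }' "$gtf" | LC_ALL=C sort
}
```

For the reference above this would be called as count_gtf_features /data/user/Genome/A.gtf, and the same call on the company's GTF (if it is available) would show whether the two annotations differ in how UTRs are labeled.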
I am out of ideas at this point. What is going wrong in my pipeline, or am I missing some pre-processing steps?
I would appreciate any suggestions and discussion.