I am interested in the scRNA-seq data from a previous study (http://www.ncbi.nlm.nih.gov/pubmed/31067475). The authors deposited the dataset in GEO under the accession “GSE134520”. Since they didn't provide the output generated by cellranger, I re-ran the cellranger pipeline and did the downstream processing largely following the code provided by the authors (http://bioinfo.au.tsinghua.edu.cn/member/pzhang/scstomach.r). I have a few questions:
In the paper, the authors report that they collected 55,440 cells in total and retained 32,332 cells after filtering. However, I get more than 90,000 cells in total when reading the cellranger output files under the “filtered_feature_bc_matrix” folder. Why do I get approximately twice as many cells? Furthermore, when I ran the filtering step, only about 22,000 cells passed.
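For reference, here is a minimal sketch of how the per-sample cell counts could be tallied directly from the cellranger output, to see which samples contribute the extra cells. It only counts the lines of each `barcodes.tsv.gz` file. The directory layout `<root>/<sample>/outs/filtered_feature_bc_matrix/` and the function names are assumptions for illustration, not taken from the authors' script.

```python
import gzip
from pathlib import Path


def count_barcodes(matrix_dir):
    """Count the cell barcodes listed in one filtered_feature_bc_matrix folder."""
    barcodes_file = Path(matrix_dir) / "barcodes.tsv.gz"
    with gzip.open(barcodes_file, "rt") as handle:
        # One barcode per line; skip any blank lines.
        return sum(1 for line in handle if line.strip())


def count_cells_per_sample(root):
    """Map each sample folder under `root` to its number of barcodes.

    Assumes the (hypothetical) layout
    <root>/<sample>/outs/filtered_feature_bc_matrix/barcodes.tsv.gz
    """
    root = Path(root)
    pattern = "*/outs/filtered_feature_bc_matrix/barcodes.tsv.gz"
    counts = {}
    for barcodes_file in sorted(root.glob(pattern)):
        # The first path component below `root` is taken as the sample name.
        sample = barcodes_file.relative_to(root).parts[0]
        counts[sample] = count_barcodes(barcodes_file.parent)
    return counts
```

Calling `count_cells_per_sample` on the folder that holds all the runs returns a dict of sample name to barcode count, and summing its values gives the total that can be compared with the 55,440 cells reported in the paper.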
In the code (lines 41 and 42), the authors try to remove batch effects. They subset the dataset to exclude the sample “CAN1” using the “SubsetData” function and carry out the subsequent analysis without adding it back. Why did they exclude sample “CAN1” and not add it back?
Thanks in advance.