Count matrices correction before integrating single-cell RNA-seq datasets

3.3 years ago

berry ▴ 40

Hi,

I have three single-cell RNA-seq datasets to integrate. They come from the same platform (10X), the same sample type, and the same condition, but from different labs. When I check the genes.tsv or features.tsv files, the vast majority of the IDs match, but I see some differences. For example, "ENSG00000243485" corresponds to a different gene symbol in each dataset:

data1[data1$ENSEMBL == "ENSG00000243485", ]
>ENSG00000243485 MIR1302-2HG   
data2[data2$ENSEMBL == "ENSG00000243485", ]
>ENSG00000243485 RP11-34P13.3 
data3[data3$ENSEMBL == "ENSG00000243485", ]
>ENSG00000243485 MIR1302-10

Conversely, the gene symbol "AL627309.1" corresponds to a different Ensembl ID in data1 and data3, and is missing from data2:

data1[data1$GeneName == "AL627309.1", ]
>ENSG00000238009 AL627309.1 
data2[data2$GeneName == "AL627309.1", ]
>0 rows
data3[data3$GeneName == "AL627309.1", ]
>ENSG00000237683 AL627309.1
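
The two spot checks above could be generalized to list every conflicting ID or symbol at once. This is only a minimal sketch: it assumes each features table is a data frame with the columns ENSEMBL and GeneName, as in the snippets above. The small tables here are toy stand-ins built from those same rows, and find_conflicts() is a hypothetical helper, not a function from any package.

```r
# Toy stand-ins for the three features tables (rows copied from the examples
# above). The real tables would be read from genes.tsv / features.tsv.
data1 <- data.frame(ENSEMBL  = c("ENSG00000243485", "ENSG00000238009"),
                    GeneName = c("MIR1302-2HG", "AL627309.1"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(ENSEMBL  = "ENSG00000243485",
                    GeneName = "RP11-34P13.3",
                    stringsAsFactors = FALSE)
data3 <- data.frame(ENSEMBL  = c("ENSG00000243485", "ENSG00000237683"),
                    GeneName = c("MIR1302-10", "AL627309.1"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: stack the tables and return every `key` that is paired
# with more than one distinct `value` across the datasets.
find_conflicts <- function(tables, key, value) {
  stacked <- do.call(rbind, lapply(names(tables), function(nm) {
    tab <- tables[[nm]]
    data.frame(dataset = rep(nm, nrow(tab)),
               key     = tab[[key]],
               value   = tab[[value]],
               stringsAsFactors = FALSE)
  }))
  # Number of distinct values observed for each key
  n_distinct <- tapply(stacked$value, stacked$key,
                       function(v) length(unique(v)))
  conflicting <- names(n_distinct)[n_distinct > 1]
  out <- stacked[stacked$key %in% conflicting, ]
  out[order(out$key, out$dataset), ]
}

tables <- list(data1 = data1, data2 = data2, data3 = data3)

# Ensembl IDs that carry different gene symbols in different datasets
id_conflicts <- find_conflicts(tables, key = "ENSEMBL", value = "GeneName")

# Gene symbols that point to different Ensembl IDs in different datasets
symbol_conflicts <- find_conflicts(tables, key = "GeneName", value = "ENSEMBL")

print(id_conflicts)
print(symbol_conflicts)
```

The size of the two result tables gives a rough idea of how far the annotations have drifted apart. The sketch only reports conflicts; it does not decide which symbol or ID is correct.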

How would you process these matrices?

Many thanks!

single-cell RNA-seq integration count matrix • 1.1k views

ADD COMMENT • link 3.3 years ago by berry ▴ 40

Can you find out which GTF file versions were used for the different samples? Presumably, they differ, and ideally, you should reprocess all samples with the same annotation file.

ADD REPLY • link 3.3 years ago by Friederike 8.9k

Hi Friederike, thank you for your reply. Unfortunately, I only have access to the Cell Ranger outputs.

ADD REPLY • link 3.3 years ago by berry ▴ 40

Is this from a paper or a collaborator?

ADD REPLY • link 3.3 years ago by rpolicastro 13k

From different papers. They all used GRCh38, but I don't know which GTF files they used.

ADD REPLY • link 3.3 years ago by berry ▴ 40

Their FASTQ files have likely been uploaded to SRA or ENA. If so, I would recommend rerunning them through Cell Ranger with the same annotation.

ADD REPLY • link 3.3 years ago by rpolicastro 13k
