Correcting count matrices before integrating single-cell RNA-seq datasets
berry, 3.3 years ago

Hi,

I have three single-cell RNA-seq datasets to integrate. They come from the same platform (10x), the same sample type, and the same condition, but from different labs. When I check the genes.tsv or features.tsv files, the vast majority of the IDs match, but there are some differences. For example, "ENSG00000243485" corresponds to a different gene symbol in each dataset:

data1[data1$ENSEMBL == "ENSG00000243485", ]
>ENSG00000243485 MIR1302-2HG   
data2[data2$ENSEMBL == "ENSG00000243485", ]
>ENSG00000243485 RP11-34P13.3 
data3[data3$ENSEMBL == "ENSG00000243485", ]
>ENSG00000243485 MIR1302-10

Or here, the gene "AL627309.1" has a different Ensembl ID in each dataset that contains it (it is absent from data2):

data1[data1$GeneName == "AL627309.1", ]
>ENSG00000238009 AL627309.1 
data2[data2$GeneName == "AL627309.1", ]
>0 rows
data3[data3$GeneName == "AL627309.1", ]
>ENSG00000237683 AL627309.1
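
To see how widespread these discrepancies are, rather than spot-checking single genes, the feature tables can be compared in bulk. The sketch below is a minimal base-R version, assuming each genes.tsv/features.tsv has already been read into a data frame with the `ENSEMBL` and `GeneName` columns used above (for example with `read.delim(path, header = FALSE)` and then naming the first two columns). The helpers `compare_features` and `symbols_with_multiple_ids` are hypothetical names. The first keys every dataset on Ensembl ID and flags IDs that are missing somewhere or carry different symbols; the second lists symbols attached to more than one Ensembl ID, like AL627309.1.

```r
# Compare the feature tables of several 10x datasets, keyed on Ensembl ID.
# feature_tables: a named list of data frames with columns ENSEMBL
# (column 1 of genes.tsv/features.tsv) and GeneName (column 2).
compare_features <- function(feature_tables) {
  ids <- lapply(feature_tables, function(ft) as.character(ft$ENSEMBL))
  all_ids <- sort(unique(unlist(ids)))

  # One column per dataset: the symbol that dataset assigns to each ID,
  # or NA when the ID is absent from it.
  symbols <- do.call(cbind, lapply(feature_tables, function(ft) {
    as.character(ft$GeneName)[match(all_ids, as.character(ft$ENSEMBL))]
  }))

  data.frame(
    ENSEMBL = all_ids,
    symbols,
    # TRUE when every dataset contains the ID
    in_all = rowSums(!is.na(symbols)) == ncol(symbols),
    # TRUE when the datasets containing the ID disagree on its symbol
    symbol_conflict = apply(symbols, 1, function(s) length(unique(s[!is.na(s)])) > 1),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Gene symbols that are attached to more than one Ensembl ID across datasets.
symbols_with_multiple_ids <- function(feature_tables) {
  pairs <- unique(do.call(rbind, lapply(feature_tables, function(ft) {
    data.frame(ENSEMBL = as.character(ft$ENSEMBL),
               GeneName = as.character(ft$GeneName),
               stringsAsFactors = FALSE)
  })))
  ids_by_symbol <- split(pairs$ENSEMBL, pairs$GeneName)
  names(Filter(function(x) length(unique(x)) > 1, ids_by_symbol))
}

# Toy tables holding only the rows quoted above.
data1 <- data.frame(ENSEMBL = c("ENSG00000243485", "ENSG00000238009"),
                    GeneName = c("MIR1302-2HG", "AL627309.1"))
data2 <- data.frame(ENSEMBL = "ENSG00000243485", GeneName = "RP11-34P13.3")
data3 <- data.frame(ENSEMBL = c("ENSG00000243485", "ENSG00000237683"),
                    GeneName = c("MIR1302-10", "AL627309.1"))

res <- compare_features(list(data1 = data1, data2 = data2, data3 = data3))
res[!res$in_all | res$symbol_conflict, ]
symbols_with_multiple_ids(list(data1, data2, data3))
```

On the full tables, the number of flagged rows gives a rough idea of how far apart the annotations used by the three labs are.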

How would you process these matrices?

Many thanks!

Can you find out which GTF file versions were used for the different samples? Presumably, they differ, and ideally, you should reprocess all samples with the same annotation file.

Hi Friederike, thank you for your reply. Unfortunately, I only have access to the Cell Ranger outputs.

Is this from a paper or a collaborator?

From different papers. They all used GRCh38, but I don't know which GTF files they used.

Their FASTQ files have likely been deposited in SRA or ENA. If so, I would recommend rerunning all of them through Cell Ranger with the same reference and annotation.
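
If the FASTQ files turn out not to be available and only the Cell Ranger outputs can be used, a common fallback (not a substitute for a shared annotation) is to key every matrix on Ensembl ID instead of gene symbol and keep only the IDs present in all datasets. A minimal base-R sketch, assuming the genes × cells count matrices already use Ensembl IDs as row names, e.g. as returned by Seurat's `Read10X(..., gene.column = 1)`; `subset_to_shared_ids` is a hypothetical helper. Genes whose ID differs between annotations, such as AL627309.1 above, are simply dropped by this approach rather than reconciled.

```r
# Restrict several genes x cells count matrices to the Ensembl IDs that
# all of them contain, in one shared row order. Assumes row names are
# Ensembl IDs; sparse Matrix::dgCMatrix objects can be subset the same way.
subset_to_shared_ids <- function(count_mats) {
  shared <- sort(Reduce(intersect, lapply(count_mats, rownames)))
  lapply(count_mats, function(m) m[shared, , drop = FALSE])
}

# Toy matrices: row names are IDs from the question, counts are invented.
m1 <- matrix(c(0, 3, 1, 2), nrow = 2,
             dimnames = list(c("ENSG00000243485", "ENSG00000238009"),
                             c("cellA1", "cellA2")))
m2 <- matrix(c(5, 0, 1, 4), nrow = 2,
             dimnames = list(c("ENSG00000237683", "ENSG00000243485"),
                             c("cellB1", "cellB2")))

shared_mats <- subset_to_shared_ids(list(lab1 = m1, lab2 = m2))
combined <- do.call(cbind, shared_mats)  # 1 gene x 4 cells
```

Gene symbols can then be reattached from a single chosen feature table, so that every dataset uses the same ID-to-symbol mapping.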
