21 months ago
DareDevil
I have thousands of samples from TCGA, retrieved using TCGAbiolinks. I want to remove the batch effect from the datasets. It has been mentioned that the batch can be detected from the sample ID itself.
How do we identify the batch information from the sample ID?
My IDs look as follows:
TCGA-LL-A73Y-01A-11R-A33A-13
TCGA-AO-A03U-01B-21R-A10I-13
TCGA-E9-A1NH-01A-11R-A14C-13
TCGA-BH-A1EY-01A-11R-A13P-13
TCGA-AO-A1KS-01A-11R-A13P-13
TCGA-B6-A0I6-01A-11R-A035-13
TCGA-E9-A229-01A-31R-A156-13
TCGA-D8-A27H-01A-11R-A16E-13
TCGA-A2-A0EM-01A-11R-A035-13
TCGA-E2-A1II-01A-11R-A143-13
TCGA-BH-A0H3-01A-11R-A12O-13
TCGA-E2-A1IL-01A-11R-A14C-13
TCGA-BH-A0GY-01A-11R-A057-13
TCGA-BH-A0DG-01A-21R-A12O-13
I have looked at this link to get information on the sample ID, but it does not specifically mention batches.
Is the batch a combination of PlateId, ShipDate, and Tissue Source Site, or can I consider plates or TSS as the batch?
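
For reference, here is a minimal sketch of how the fields in these IDs could be pulled apart, assuming the standard TCGA aliquot barcode layout (project, tissue source site, participant, sample type + vial, portion + analyte, plate, center). The `Barcode` class and `parse_barcode` helper are hypothetical names made up for illustration, not part of TCGAbiolinks. Note that a ship date is not encoded in the barcode itself, so only plate and TSS can be read directly from the ID. The sketch does not settle which field should be used as the batch; it only tabulates the candidates.

```python
from collections import Counter
from typing import NamedTuple


class Barcode(NamedTuple):
    """Fields of a TCGA aliquot barcode (hypothetical helper type)."""
    project: str      # e.g. "TCGA"
    tss: str          # tissue source site, e.g. "LL"
    participant: str  # e.g. "A73Y"
    sample_type: str  # e.g. "01" (primary solid tumor)
    vial: str         # e.g. "A"
    portion: str      # e.g. "11"
    analyte: str      # e.g. "R" (RNA)
    plate: str        # e.g. "A33A"
    center: str       # e.g. "13"


def parse_barcode(barcode: str) -> Barcode:
    """Split a full 7-field aliquot barcode into its components."""
    parts = barcode.strip().split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 dash-separated fields, got {len(parts)}: {barcode!r}")
    project, tss, participant, sample_vial, portion_analyte, plate, center = parts
    return Barcode(
        project=project,
        tss=tss,
        participant=participant,
        sample_type=sample_vial[:2],
        vial=sample_vial[2:],
        portion=portion_analyte[:2],
        analyte=portion_analyte[2:],
        plate=plate,
        center=center,
    )


# The sample IDs from the question.
ids = [
    "TCGA-LL-A73Y-01A-11R-A33A-13",
    "TCGA-AO-A03U-01B-21R-A10I-13",
    "TCGA-E9-A1NH-01A-11R-A14C-13",
    "TCGA-BH-A1EY-01A-11R-A13P-13",
    "TCGA-AO-A1KS-01A-11R-A13P-13",
    "TCGA-B6-A0I6-01A-11R-A035-13",
    "TCGA-E9-A229-01A-31R-A156-13",
    "TCGA-D8-A27H-01A-11R-A16E-13",
    "TCGA-A2-A0EM-01A-11R-A035-13",
    "TCGA-E2-A1II-01A-11R-A143-13",
    "TCGA-BH-A0H3-01A-11R-A12O-13",
    "TCGA-E2-A1IL-01A-11R-A14C-13",
    "TCGA-BH-A0GY-01A-11R-A057-13",
    "TCGA-BH-A0DG-01A-21R-A12O-13",
]

parsed = [parse_barcode(i) for i in ids]

# Count samples per candidate batch variable.
plate_counts = Counter(b.plate for b in parsed)
tss_counts = Counter(b.tss for b in parsed)

for plate, n in sorted(plate_counts.items()):
    print(f"plate {plate}: {n} sample(s)")
for tss, n in sorted(tss_counts.items()):
    print(f"TSS {tss}: {n} sample(s)")
```

Tabulating the counts this way makes it easy to see how many samples fall into each plate or TSS group before deciding which one, if either, to pass as the batch variable to a correction method.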