# Bioinformatic analysis of CITE-seq data

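CITE-seq uses oligo-tagged antibodies to measure surface proteins alongside the transcriptome in single cells. The related cell hashing approach uses hashtag oligos (HTOs) to multiplex several samples in one single-cell library. Details here: https://cite-seq.com/

Although software exists, we found the exact methods very unclear, so we document them here.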

# CITE-seq-Count approach

Many of these details have been adapted from this discussion: https://github.com/Hoohm/CITE-seq-Count/issues/5

Terminology:

- hto: hashtag oligo, the antibody-derived tag that labels each sample
- mRNA: read 2 from cellranger, containing the actual transcriptome reads

# Software needed

- Cellranger - 10x Genomics
- CITE-seq-Count - https://github.com/Hoohm/CITE-seq-Count
- Seurat (R) - https://satijalab.org/seurat/ and in particular the hashing vignette: https://satijalab.org/seurat/v3.1/hashing_vignette.html

# General steps needed

1. Run **cellranger** as usual to create an mRNA matrix of cell barcode and UMI vs mRNA. No hto information is included, because the hto reads go into the "undetermined" fastqs!
   - 1a. Run cellranger mkfastq as usual.
   - 1b. Run cellranger count as usual.
2. Run CITE-seq-Count to create only an hto matrix of cell barcode and UMI vs hashtags+polyA. No mRNA transcriptome reads are included, so the hto counts must be matched back to the transcriptome by cell barcode in Seurat (step 3 below).
   - 2a. Run CITE-seq-Count on the **undetermined** reads from cellranger mkfastq (step 1a).
3. **Combine** the resulting matrices, i.e. mRNA from 1b and hto from 2a, in Seurat following the hashtag demux tutorial.
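Combining the two matrices in step 3 comes down to keeping only the cell barcodes that appear in both. The Seurat vignette does this in R; the Python sketch below only illustrates the matching logic. It assumes barcodes are plain strings and that cellranger barcodes may carry a `-1` style suffix that the hto matrix lacks. The function names and toy barcodes are hypothetical.

```python
def normalize(barcode):
    """Drop a trailing '-<digits>' suffix, e.g. 'AAAC-1' -> 'AAAC'."""
    head, sep, tail = barcode.rpartition("-")
    return head if sep and tail.isdigit() else barcode


def joint_barcodes(mrna_barcodes, hto_barcodes):
    """Return normalized barcodes found in both matrices, in mRNA order."""
    hto_set = {normalize(b) for b in hto_barcodes}
    return [normalize(b) for b in mrna_barcodes if normalize(b) in hto_set]


# Toy example: only barcodes seen in both inputs survive.
mrna = ["AAAC-1", "CCCG-1", "GGGT-1"]
hto = ["GGGT", "AAAC", "TTTA"]
print(joint_barcodes(mrna, hto))  # ['AAAC', 'GGGT']
```

Both matrices would then be subset to these shared barcodes before creating the Seurat object.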

# More detailed steps

```
# Step 1: cellranger mkfastq using the standard 10X barcodes
#   Result 1: 3 fastqs (R1, R2, index) from the transcriptome
#   Result 2: 3 fastqs (R1, R2, index) from the hashtags
# Step 2: cellranger count
#   Input:  3 fastqs (R1, R2, index) from the transcriptome
#   Result: digital gene-cell expression matrix, used for the whitelist and the counts
# Step 3: CITE-seq-Count
#   Input:  whitelist, i.e. the cell barcodes from the transcriptome (step 2)
#   Input:  R1 and R2 from the hashtags
#   Result: hashtag-cell matrix
# Step 4: Seurat hashing vignette: combine the results and check them in Seurat
#   https://satijalab.org/seurat/v3.1/hashing_vignette.html
```
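Preparing the whitelist for step 3 could look like the sketch below. It assumes the cell barcodes come from the barcodes file written by cellranger count (gzipped or plain text, one barcode per line) and that each barcode may carry a `-1` style suffix. Whether the suffix needs to be removed depends on the CITE-seq-Count version, so treat that part as an assumption. The function names and the example paths are hypothetical.

```python
import gzip


def strip_gem_suffix(barcode):
    """Drop a trailing '-<digits>' suffix, e.g. 'ACGT-1' -> 'ACGT'."""
    head, sep, tail = barcode.rpartition("-")
    return head if sep and tail.isdigit() else barcode


def write_whitelist(barcodes_path, whitelist_path):
    """Copy barcodes to a one-per-line whitelist and return how many were written."""
    opener = gzip.open if barcodes_path.endswith(".gz") else open
    count = 0
    with opener(barcodes_path, "rt") as src, open(whitelist_path, "w") as dst:
        for line in src:
            barcode = line.strip()
            if not barcode:
                continue  # skip blank lines
            dst.write(strip_gem_suffix(barcode) + "\n")
            count += 1
    return count


# Hypothetical usage; adjust the paths to your own cellranger output:
# write_whitelist("filtered_feature_bc_matrix/barcodes.tsv.gz", "whitelist.csv")
```

Using the filtered barcodes from the transcriptome keeps the hashtag matrix restricted to cells that cellranger already called.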

# Warnings and errors

```
[WARNING] Read1 length is 28bp but you are using 26bp for Cell and UMI barcodes combined. This might lead to wrong cell attribution and skewed umi counts.
```

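This warning appears when the barcode settings assume 10X V2 chemistry but the data is V3. The cell barcode is 16 bp in both versions, but the UMI is 10 bp long in V2 (16 + 10 = 26 bp) and 12 bp in V3 (16 + 12 = 28 bp).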
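The arithmetic behind this warning can be sketched in a few lines. The snippet is a minimal illustration, assuming a 16 bp cell barcode in both chemistries; the `CHEMISTRY` table and both helper functions are hypothetical names. The returned 1-based positions are what one would expect to pass to the CITE-seq-Count options `-cbf`, `-cbl`, `-umif` and `-umil`, but check `CITE-seq-Count --help` for your version.

```python
# Assumption: 16 bp cell barcode in V2 and V3; UMI is 10 bp (V2) or 12 bp (V3).
CHEMISTRY = {
    "v2": {"cell_barcode": 16, "umi": 10},
    "v3": {"cell_barcode": 16, "umi": 12},
}


def barcode_positions(chemistry):
    """Return 1-based, inclusive positions of cell barcode and UMI on read 1."""
    layout = CHEMISTRY[chemistry]
    cb_last = layout["cell_barcode"]
    return {
        "cbf": 1,                          # first base of the cell barcode
        "cbl": cb_last,                    # last base of the cell barcode
        "umif": cb_last + 1,               # first base of the UMI
        "umil": cb_last + layout["umi"],   # last base of the UMI
    }


def read1_matches(read1_length, chemistry):
    """True when read 1 is exactly as long as barcode plus UMI."""
    return read1_length == barcode_positions(chemistry)["umil"]


for name in CHEMISTRY:
    print(name, barcode_positions(name))

# A 28 bp read 1 with V2 settings reproduces the mismatch from the warning.
print(read1_matches(28, "v2"))  # False
print(read1_matches(28, "v3"))  # True
```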

As an alternative to Seurat's vignette, I would recommend the corresponding chapter of the OSCA book (Orchestrating Single-Cell Analysis with Bioconductor), which explains how to analyze the protein abundance measurements from CITE-seq data but does not yet go into the details of hashtags.