Generating count matrix from Chromium Fixed RNA Profiling (FRP) data outside Cell Ranger.
6 months ago
bk11 ★ 2.4k

Could someone recommend tools other than Cell Ranger to align Chromium Fixed RNA Profiling (FRP) data and generate a count matrix? Link to Method

fixed-RNA-profiling chromium cellranger scFFPE • 1.5k views

As long as you have demultiplexed per-sample FASTQ files with cell barcodes, UMIs etc., alevin-fry would likely work.


Multiple samples are pooled together, and each sample has a specific probe barcode ID (BC001 through BC008). What tool would you recommend for demultiplexing the FASTQ files by sample? Alternatively, once we have a count matrix, we can easily identify which cell originated from which sample based on the probe barcode. I wonder whether alevin-fry will work without demultiplexing these data first.


You may want to demultiplex the data using cellranger mkfastq and then go from there. You will need access to the original data folder from the sequencing run for this to work. Otherwise you will need additional steps to demultiplex the data.


This part (cellranger mkfastq) is already done. I have a single pair of FASTQ files that contains all 8 pooled samples.


Perhaps it was not done right? It looks like, if you use the right index codes in the samplesheet, you should have fully demultiplexed samples at the end of cellranger mkfastq: https://kb.10xgenomics.com/hc/en-us/articles/4403017520653-Where-can-I-find-the-Dual-Index-Kit-TS-Set-A-sample-index-sequences-


That part was done correctly. I was going through GitHub and found an issue requesting support for processing FRP data in alevin-fry. I am not sure whether it is available yet. Maybe ATpoint can chime in here.


I have not processed FRP data first hand but I would think that if that part was done correctly then you should have separate sample files at this point, not a mix of 8 samples.


For FRP data, 10x recommends the cellranger multi pipeline, which takes your FASTQ files and certain parameters as input and outputs results for each individual sample in the pool.


I was only referring to the mkfastq part, which demultiplexes the samples (step 1 in the process). I was going by this line in the link above:

"but the 10x Sample Index name (i.e. SI-TS-A1) is needed for demultiplexing sequencing runs when running more than one sample per lane."

The relevant set of indexes is available here: https://cdn.10xgenomics.com/raw/upload/v1655155124/support/in-line%20documents/Dual_Index_Kit_TS_Set_A.csv

If the TS codes in this file had been used in the samplesheet, I would have expected you to get each sample as a separate file.


We used a single sample index (SI-TS-A5) but different probe barcodes, BC001 to BC008 (see here). That is why there is only one pair of FASTQ files after the cellranger mkfastq step, and count matrices for 8 samples after the cellranger multi step.


I see. So this is an unsupported extension of the protocol. Based on ATpoint's comment in the GitHub issue, you know where the "Probe BC" is located, so you will need to look for the 4 possible combinations to demultiplex the data further.

This functionality is probably not implemented in alevin-fry yet; otherwise @Rob would have marked the issue as done.


It looks like 10x has example datasets on their website. I will take a look to see what these BC sequences look like in reality. It should be possible to bin the reads by barcode using seal.sh from BBMap.

Edit: In one of the test datasets, these BC barcodes are visible in the Read 2 file.

Extracting the 8 bp section gave the counts below. This dataset is supposed to contain 4 BC barcodes, and the top 16 counts fit that expectation.

29413818 ACTTTAGG
28090524 GACACTAC
21841622 TTGCACCT
18050388 CGAATTGC
17378499 AACGGGAA
16720550 TCGTACCG
16606651 GTTCCATT
11962570 CGAGGGTA
4277340 AGTAGGCT
4225299 CACCAACG
4121690 ATGTTGAC
3892353 TGAGGTTT
3706217 GCACCAAG
3671289 GCTACCGA
3517985 CTGTACGA
3478912 TACGTTTC
 469915 TCGTCCCG
 442175 AACGGGAC
 133152 ACTTTAGT
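
A tally like the one above can be reproduced with a few lines of Python. This is only a minimal sketch, not the command actually used for these counts: the offset `BC_START` is a placeholder (the real position of the probe barcode in Read 2 has to be taken from the 10x documentation or from inspecting the reads), and the function name and the toy reads are made up for illustration.

```python
import io
from collections import Counter
from itertools import islice


def count_probe_barcodes(handle, start, length=8):
    """Tally the substring [start, start + length) of every read in a FASTQ stream."""
    counts = Counter()
    # In a 4-line FASTQ record the sequence is the 2nd line.
    for seq in islice(handle, 1, None, 4):
        seq = seq.strip()
        # Skip reads too short to contain the full barcode window.
        if len(seq) >= start + length:
            counts[seq[start:start + length]] += 1
    return counts


# Hypothetical offset, for illustration only.
BC_START = 4

# Toy Read 2 records (not real data); the 4th read is too short and is skipped.
toy_fastq = io.StringIO(
    "@r1\nACGTACTTTAGGTT\n+\nIIIIIIIIIIIIII\n"
    "@r2\nACGTACTTTAGGAA\n+\nIIIIIIIIIIIIII\n"
    "@r3\nACGTGACACTACTT\n+\nIIIIIIIIIIIIII\n"
    "@r4\nACGT\n+\nIIII\n"
)

toy_counts = count_probe_barcodes(toy_fastq, BC_START)
for barcode, n in toy_counts.most_common():
    print(n, barcode)
```

For a real run, the same function can be given a handle from `gzip.open("your_R2.fastq.gz", "rt")` (the file name here is a placeholder).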

The trick now is to identify the 4 sequences that contribute to each BC code. From there, binning/demultiplexing the reads should be possible, and the resulting per-sample files can then be used as input for alevin-fry.
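
The binning step could look roughly like the sketch below. It assumes the reads have already been parsed into `(header, sequence, quality)` tuples and that R1 and R2 are in the same order. The lookup table `VARIANT_TO_BC` is purely hypothetical: the sequences in it are dummy values, and the real assignment of sequences to BC001 through BC008 has to come from the 10x probe set files. The `start` offset is likewise a placeholder.

```python
from collections import defaultdict

# Hypothetical lookup table: dummy sequences, NOT the real 10x probe barcodes.
VARIANT_TO_BC = {
    "AAAAAAAA": "BC001",
    "CCCCCCCC": "BC001",
    "GGGGGGGG": "BC002",
    "TTTTTTTT": "BC002",
}


def demultiplex(r1_records, r2_records, start, length=8):
    """Bin read pairs by the probe barcode found in Read 2.

    Each record is a (header, sequence, quality) tuple, and the two
    lists must be in the same order so that pairs stay together.
    """
    bins = defaultdict(list)
    for r1, r2 in zip(r1_records, r2_records):
        bc_seq = r2[1][start:start + length]
        # Reads whose barcode is not in the table go to a separate bin.
        sample = VARIANT_TO_BC.get(bc_seq, "unassigned")
        bins[sample].append((r1, r2))
    return dict(bins)


# Toy read pairs (not real data); the barcode starts at position 2 of R2.
r1 = [("@a", "ACGT", "IIII"), ("@b", "ACGT", "IIII"), ("@c", "ACGT", "IIII")]
r2 = [
    ("@a", "TTAAAAAAAA", "IIIIIIIIII"),
    ("@b", "TTGGGGGGGG", "IIIIIIIIII"),
    ("@c", "TTACGTACGT", "IIIIIIIIII"),
]

bins = demultiplex(r1, r2, start=2)
print({sample: len(pairs) for sample, pairs in bins.items()})
```

Keeping both mates of a pair in the same bin matters because R1 carries the cell barcode and UMI that any downstream quantification tool needs.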

4 months ago

What is the purpose of doing so? I tried to do this using the BAM file generated by Cell Ranger. One important detail if you are working with raw data: each probe barcode actually corresponds to 8 sequences, not just one. For example, the second barcode in your counts is actually a "variant" of BC001. See https://www.10xgenomics.com/support/single-cell-gene-expression-flex/documentation/steps/probe-sets/chromium-frp-probe-set-files#probe_seq_file
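
Separately from the designed variants, the three lowest counts posted earlier in the thread each sit exactly one mismatch away from one of the 16 abundant sequences, which would be consistent with sequencing errors rather than additional barcodes (that interpretation is an assumption, not something confirmed here). A small sketch to check this, with helper names made up for illustration:

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def one_mismatch_parents(query, candidates):
    """Return the candidates that differ from `query` at exactly one position."""
    return [c for c in candidates if hamming(query, c) == 1]


# The 16 most abundant sequences from the counts posted in this thread.
abundant = [
    "ACTTTAGG", "GACACTAC", "TTGCACCT", "CGAATTGC",
    "AACGGGAA", "TCGTACCG", "GTTCCATT", "CGAGGGTA",
    "AGTAGGCT", "CACCAACG", "ATGTTGAC", "TGAGGTTT",
    "GCACCAAG", "GCTACCGA", "CTGTACGA", "TACGTTTC",
]
# The three low-count sequences from the same list.
rare = ["TCGTCCCG", "AACGGGAC", "ACTTTAGT"]

parents = {seq: one_mismatch_parents(seq, abundant) for seq in rare}
for seq, hits in parents.items():
    print(seq, "->", hits)
# TCGTCCCG -> ['TCGTACCG']
# AACGGGAC -> ['AACGGGAA']
# ACTTTAGT -> ['ACTTTAGG']
```

If you demultiplex from raw reads, allowing one mismatch against the known sequences would recover such reads, at the cost of having to check that no two known sequences are themselves within one mismatch of each other.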
