UMI extraction from 10X Visium spatial transcriptome data
6 weeks ago
archie

Hello everyone

I have to analyse Visium spatial transcriptome (ST) sequencing data (2 x 150 bp). I want to extract the spatial barcode and UMI from Read 1 in order to reduce the Read 1 length from 150 bp to 28 bp (16 bp spatial barcode + 12 bp UMI). One method I found is umi_tools, which has been used in various single-cell studies.
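To make that Read 1 layout concrete, here is a minimal sketch that splits one read into its barcode and UMI with awk. It assumes, as stated above, that bases 1 to 16 are the spatial barcode and bases 17 to 28 the UMI; the FASTQ record, sequences and qualities are all made up for illustration.

```shell
#!/bin/sh
# Minimal sketch: split a toy Read 1 into spatial barcode and UMI.
# ASSUMPTION: barcode = bases 1-16, UMI = bases 17-28 (layout from the question).
bc="AAACAAGTATCTCCCA"               # made-up 16 nt spatial barcode
umi="GCTAGCTAGCTA"                  # made-up 12 nt UMI
rest="TTTTTTTTTTTTTTTTTTTTACGTACGT" # made-up remainder of the read
seq="${bc}${umi}${rest}"
qual=$(printf '%s' "$seq" | tr 'ACGT' 'IIII')  # dummy qualities, same length as seq

# Build a one-record FASTQ; the sequence is line 2 of every 4-line record.
out=$(printf '@toy_read\n%s\n+\n%s\n' "$seq" "$qual" |
  awk 'NR % 4 == 2 {
    print "barcode=" substr($0, 1, 16)
    print "umi="     substr($0, 17, 12)
    print "r1_28="   substr($0, 1, 28)
  }')
printf '%s\n' "$out"
```

Only the first 28 nt carry the barcode and UMI, which is why 28 is the target length in the question.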

Steps for barcode and UMI extraction:

1) umi_tools whitelist --stdin R1.fastq.gz \
--bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNN \
--log2stderr > whitelist.txt;

2) umi_tools extract --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNN \
--stdin R1.fastq.gz \
--stdout R1_extracted.fastq.gz \
--whitelist=whitelist.txt;


I have not done this analysis before, so please correct me if I am doing anything wrong here. I would appreciate any suggestions.


I'm assuming this is the 10X kit? If so, you can use the unmodified FASTQ files as input to Space Ranger, which is already set up to do the proper processing and QC for 10X data.


Yes, it's the 10X kit. I already analysed the data in Space Ranger with the unmodified FASTQ files, but the mapping rate to the transcriptome (< 30%) against the human genome is very low. I was wondering whether this has something to do with the R1 read length.

I used the following parameters:

spaceranger count --id=S1_test1 --transcriptome='/path/refdata-gex-GRCh38-2020-A/' --fastqs='/path/S1'  --image='test.tif' --slide='XX' --area='A1'  --slidefile='XX.gpr'

• For --fastqs, I gave the path to the folder containing both the R1 and R2 files, which include both lanes (L1 and L2).

• I also did some quality filtering; the mapping rate increased from 22% to 27%, which is still very low.

I would appreciate any suggestions.


The solution I found is to set the desired Read 1 length in Space Ranger with --r1-length=28 (or more).
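As far as I understand, --r1-length makes Space Ranger hard-trim Read 1 to that length before analysis, so the FASTQ files on disk stay unmodified and you simply add --r1-length=28 to the spaceranger count command above. The sketch below only mimics that hard trim with awk on a made-up record to show the effect on the sequence and quality lines; it is not Space Ranger's implementation.

```shell
#!/bin/sh
# Minimal sketch of hard-trimming Read 1 to 28 nt (mimics the idea of
# --r1-length=28; this is NOT what Space Ranger runs internally).
r1_len=28
seq="AAACAAGTATCTCCCA""GCTAGCTAGCTA""TTTTTTTTTTTTTTTTTTTTACGTACGT"  # made-up read
qual=$(printf '%s' "$seq" | tr 'ACGT' 'IIII')                       # dummy qualities

trimmed=$(printf '@toy_read\n%s\n+\n%s\n' "$seq" "$qual" |
  awk -v len="$r1_len" '
    NR % 4 == 2 || NR % 4 == 0 { print substr($0, 1, len); next }  # sequence and quality
    { print }                                                      # header and "+" lines
  ')
printf '%s\n' "$trimmed"
```

Both the sequence and the quality line must be cut to the same length, otherwise the record is no longer valid FASTQ.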

6 weeks ago

Two quick points:

1. Your pattern only has 10 Ns in it, but you state that your UMI is 12 nt. You need to add 2 more Ns (CCCCCCCCCCCCCCCCNNNNNNNNNNNN).
2. These commands won't "reduce" the length of R1 from 150 to 28 nt. Instead, they will trim a 150 nt read to 122 nt and place the removed 28 nt into the read name of both R1 and the matching read in R2.
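A quick way to catch point 1 before running anything is to count the C (barcode) and N (UMI) positions in the --bc-pattern string. This sketch checks the pattern from the question and a corrected one with two extra Ns against the 16 + 12 layout stated in the question; count_char is a hypothetical helper, not part of umi_tools.

```shell
#!/bin/sh
# Count barcode (C) and UMI (N) positions in a umi_tools string pattern.
# Expected layout (from the question): 16 nt barcode + 12 nt UMI.
count_char() {  # hypothetical helper: count_char PATTERN CHAR -> number of CHARs
  printf '%s' "$1" | tr -cd "$2" | wc -c | tr -d ' '
}

orig="CCCCCCCCCCCCCCCCNNNNNNNNNN"  # pattern used in the question
fixed="${orig}NN"                  # two extra Ns for a 12 nt UMI

for pattern in "$orig" "$fixed"; do
  n_c=$(count_char "$pattern" C)
  n_n=$(count_char "$pattern" N)
  if [ "$n_c" -eq 16 ] && [ "$n_n" -eq 12 ]; then status="ok"; else status="WRONG"; fi
  printf 'pattern %s  C=%s N=%s  %s\n' "$pattern" "$n_c" "$n_n" "$status"
done
```

The corrected pattern would then be used in both the whitelist and the extract step.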

Extracting this way seems sensible if your aim is then to test the mapping of these reads to the genome with STAR or similar.
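To illustrate what point 2 means for one read pair, here is a rough emulation (not umi_tools itself) of the string-pattern extraction: the first 28 nt of a 150 nt R1 are cut off, leaving 122 nt, and appended to the read names of R1 and R2. The "_BARCODE_UMI" suffix is my assumption about umi_tools' naming convention, so check a real output file before relying on it. Also note that umi_tools only rewrites R2 if it is passed to extract (e.g. with --read2-in and --read2-out); the commands in the question pass only R1.

```shell
#!/bin/sh
# Rough emulation (NOT umi_tools) of string-pattern extraction on one read pair.
# ASSUMPTIONS: pattern = 16 C + 12 N at the start of R1; read names get a
# "_BARCODE_UMI" suffix. All sequences are made up.
bc="AAACAAGTATCTCCCA"                  # made-up 16 nt barcode
umi="GCTAGCTAGCTA"                     # made-up 12 nt UMI
r1_rest=$(printf '%0122d' 0 | tr 0 T)  # 122 nt filler so R1 is 150 nt
r1_seq="${bc}${umi}${r1_rest}"
r2_seq="ACGTACGTACGTACGTACGT"          # made-up cDNA read

# Move bases 1-16 (barcode) and 17-28 (UMI) into the read name; keep bases 29+.
tag="_$(printf '%s' "$r1_seq" | cut -c1-16)_$(printf '%s' "$r1_seq" | cut -c17-28)"
new_r1=$(printf '%s' "$r1_seq" | cut -c29-)

echo "R1: @toy_read${tag}  length ${#r1_seq} -> ${#new_r1}"
echo "R2: @toy_read${tag}  length ${#r2_seq} (sequence unchanged)"
```

The remaining 122 nt of R1 are still in the file, so this step alone does not give you 28 nt reads.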