I have to analyse Visium spatial transcriptomics (ST) sequencing data (2 x 150 bp). I want to extract the spatial barcode and UMI from Read 1, which would reduce Read 1 from 150 bp to 28 bp (16 bp spatial barcode plus 12 bp UMI). I found a tool, "umi_tools", which has been used in various single-cell studies.
Steps for barcode and UMI extraction:
1) umi_tools whitelist --stdin R1.fastq.gz \
       --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNNNN \
       --log2stderr > whitelist.txt
2) umi_tools extract --bc-pattern=CCCCCCCCCCCCCCCCNNNNNNNNNNNN \
       --stdin R1.fastq.gz \
       --stdout R1_extracted.fastq.gz \
       --read2-in R2.fastq.gz \
       --read2-out=R2_extracted.fastq.gz \
       --whitelist=whitelist.txt
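Since the --bc-pattern string has to match the stated layout exactly (16 barcode positions as C, 12 UMI positions as N, 28 in total), one way to avoid miscounting characters is to build the string from the two lengths. This is only a minimal sketch: the variable names BC_LEN, UMI_LEN and BC_PATTERN and the helper make_pattern are hypothetical and not part of umi_tools.

```shell
#!/bin/sh
# Assumed layout from the question: 16 bp spatial barcode followed by a 12 bp UMI.
BC_LEN=16
UMI_LEN=12

# Hypothetical helper: print $1 copies of 'C' followed by $2 copies of 'N'.
make_pattern() {
    # printf pads an empty string to the requested width with spaces,
    # and tr turns those spaces into the pattern character.
    printf '%*s' "$1" '' | tr ' ' 'C'
    printf '%*s' "$2" '' | tr ' ' 'N'
}

BC_PATTERN=$(make_pattern "$BC_LEN" "$UMI_LEN")

# Print the pattern and its length so it can be checked before running umi_tools.
echo "$BC_PATTERN"
echo "pattern length: ${#BC_PATTERN}"
```

The resulting value could then be passed to both commands as --bc-pattern="$BC_PATTERN", so the whitelist and extract steps are guaranteed to use the same pattern.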
I have not done this analysis before. Please correct me if I am doing something wrong here. I would appreciate any suggestions.