I am trying to run single-cell RNA velocity analysis with velocyto. A preliminary step is to run cellranger on the FASTQ files, but I have run into an issue with my chosen dataset.
Here is what I did:
- Downloaded each SRR from GSE104323 using a loop with
fastq-dump --split-files --origfmt --gzip SRR6084134
- This yielded only one FASTQ file per SRR instead of the three expected files (R1, R2, I1)
- I renamed my fastqs to:
SRR_S0_L001_R1_001.fastq.gz SRR_S1_L001_R1_001.fastq.gz ...
Then tried to run:
cellranger count --id=test --fastqs=fastqgz/ --transcriptome=refdata-gex-mm10-2020-A
which failed with error:
The read lengths are incompatible with all the chemistries for Sample SRR in "/mnt/c/Users/jobac/Downloads/SRA_split_files/GSE104323".
- read1 median length = 98
- read2 median length = 0
- index1 median length = 0
I suppose the problem is that I have only one file instead of separate R1, R2 and I1 files. How can I obtain them for this dataset, or work around this issue?
Thanks a lot for the help!
Thanks a lot! I looked in detail at one example and found this comment by the authors:
"The Unique Molecular Identifier and cellular barcode corresponding to each read have been appended to the read id separated by an underscore."
Unfortunately, I need to analyze this data specifically. It is my first time dealing with FASTQ files, and I am a bit at a loss as to how to split the files back. If someone could find the time to work through one example, highlighting the text and showing which parts go back into each of the three files, I could write a custom script.
v2 barcodes should be 16 bp and UMIs 10 bp. It appears that the barcodes in the example above may have been sequenced as only 14 bp.
So from the example above:
CAGTGCATGGATGG - cell barcode
CACGGATGGG - UMI
The Illumina index is likely gone for good. You could randomly use one of the valid Illumina indexes to create a fake I1 file.
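If it helps with the custom script, here is a minimal Python sketch of the split described above. It rests on several assumptions that should be checked against the real files: that each read header ends in two underscore-separated tags (the example header in the usage note is hypothetical, as is the barcode-then-UMI order, which can be flipped with barcode_first=False), that a constant placeholder quality is acceptable for the rebuilt R1 and I1, and that index_seq is replaced with a real valid index (the default below is only a placeholder, not an actual Illumina index). The function names are made up for illustration. Note that with a 14 bp barcode the rebuilt R1 is 24 bp rather than the 26 bp expected for v2, so Cell Ranger may still reject it.

```python
import gzip


def split_record(header, seq, qual, index_seq="ACGTACGT", barcode_first=True):
    """Turn one tagged read into (R1, R2, I1) FASTQ records.

    Assumes the header looks like '@<read id>_<tag>_<tag>'; which tag is the
    cell barcode and which is the UMI is controlled by barcode_first.
    index_seq is a placeholder, not a real index.
    """
    name = header[1:].split()[0]                  # drop '@' and any description
    read_id, tag_a, tag_b = name.rsplit("_", 2)   # peel off the last two tags
    barcode, umi = (tag_a, tag_b) if barcode_first else (tag_b, tag_a)
    r1_seq = barcode + umi                        # R1 = cell barcode + UMI

    def fastq(s, q):
        return f"@{read_id}\n{s}\n+\n{q}\n"

    r1 = fastq(r1_seq, "I" * len(r1_seq))         # original qualities are lost
    r2 = fastq(seq, qual)                         # R2 = the cDNA read itself
    i1 = fastq(index_seq, "I" * len(index_seq))   # fake sample index read
    return r1, r2, i1


def split_fastq(in_path, r1_path, r2_path, i1_path, **kwargs):
    """Stream a gzipped FASTQ and write gzipped R1/R2/I1 files."""
    with gzip.open(in_path, "rt") as fin, \
         gzip.open(r1_path, "wt") as r1, \
         gzip.open(r2_path, "wt") as r2, \
         gzip.open(i1_path, "wt") as i1:
        while True:
            header = fin.readline().rstrip("\n")
            if not header:                        # end of file
                break
            seq = fin.readline().rstrip("\n")
            fin.readline()                        # skip the '+' separator line
            qual = fin.readline().rstrip("\n")
            rec1, rec2, rec_i = split_record(header, seq, qual, **kwargs)
            r1.write(rec1)
            r2.write(rec2)
            i1.write(rec_i)
```

For example, split_record("@SRR6084134.1_CAGTGCATGGATGG_CACGGATGGG", seq, qual) would put CAGTGCATGGATGGCACGGATGGG in R1 and leave the 98 bp read in R2. The output paths passed to split_fastq should follow the naming Cell Ranger expects, with one R1, R2 and I1 file per sample and lane.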