It happens quite often that R1 is missing, don't ask me why. The good thing is that submitters often provide BAM files, from which the FASTQ files can be reconstructed. That is the case here for all four accession numbers. See for example at the bottom of here.
You can conveniently get the BAM files with prefetch from the sra-toolkit:
mamba install -c bioconda sra-tools
prefetch --type bam --max-size 9999999999 -O ./ ERR6032593
Sometimes the Type field (see below) doesn't say bam but something like "10X Genomics bam file", for example here. In that case you can use --type TenX with prefetch, afaik.
Once you have the BAM files, and if they were produced by CellRanger, use the bamtofastq utility from 10x to convert them back to FASTQ: https://support.10xgenomics.com/docs/bamtofastq
If the BAM was made by an alternative pipeline, you will probably need to do some custom parsing to recreate the R1 file. Technically, 10x scRNA-seq is effectively single-end sequencing: only R2 is aligned, while R1 is not used for the alignment itself and only carries the cell barcode (CB) and UMI, which are processed separately. So you probably need to read the BAM tags that store the CB and UMI sequences and recreate R1 from them, putting these sequences at the read positions where CellRanger or your processing pipeline expects them. For example, 10x Chromium 3' v3 has the CB at R1 positions 1-16 and the UMI at positions 17-28, so that should be relatively easy to parse from the BAM tags (I guess, untested, never done manually myself). But then again there are probably corner cases, so be careful.
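A minimal, untested-on-real-data sketch of that custom parsing, working on SAM text lines (e.g. the output of samtools view). It assumes the pipeline stored the raw barcode and UMI plus their qualities in the CR/CY and UR/UY tags, which is the CellRanger convention; other pipelines may use different tags or none at all, so check your BAM first. The function name sam_line_to_pair is made up for illustration. It also re-emits R2 from the same record so that both files stay in the same order, undoing the reverse-complement that SAM applies to reverse-strand alignments.

```python
# Assumption: raw CB/UMI and their qualities live in CR/CY and UR/UY tags.
# Pass other tag names if your pipeline uses something else.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_pair(line, cb="CR", cb_qual="CY", umi="UR", umi_qual="UY"):
    """Return (r1_record, r2_record) as FASTQ strings, or None to skip the line."""
    if line.startswith("@"):
        return None  # SAM header line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return None  # not a complete alignment record
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & 0x900:
        return None  # secondary (0x100) or supplementary (0x800): read already seen

    # Optional fields look like TAG:TYPE:VALUE; the value itself may contain ':'
    tags = {}
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]
    try:
        r1_seq = tags[cb] + tags[umi]            # e.g. 16 bp CB + 12 bp UMI for 3' v3
        r1_qual = tags[cb_qual] + tags[umi_qual]
    except KeyError:
        return None  # barcode/UMI tags missing: R1 cannot be rebuilt for this read
    if len(r1_seq) != len(r1_qual):
        return None

    if flag & 0x10:
        # Reverse-strand alignment: SEQ is stored reverse-complemented, QUAL reversed
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]

    r1 = f"@{name}\n{r1_seq}\n+\n{r1_qual}\n"
    r2 = f"@{name}\n{seq}\n+\n{qual}\n"
    return r1, r2
```

You would loop over the lines of samtools view, call this on each one and write the two records to an R1 and an R2 file. Reads without the tags are dropped here, and anything the original pipeline trimmed or filtered before alignment is gone for good, so expect the result to differ somewhat from the original FASTQ files.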
See also: https://bioinformatics.stackexchange.com/a/15523
Hmm, from https://www.ncbi.nlm.nih.gov/sra/?term=ERX5671923 -- it seems that only one FASTQ file (the 91-bp biological sequence) is available. Unfortunately, that means the barcodes and UMI sequences are not available. Therefore, it's not possible to just use that one file with any tool. You'd need to find some way to obtain the other FASTQ file.
Thank you for your answer. It is exactly the answer I was afraid of. I have to check more thoroughly, but it looks like the same issue applies to all runs of all libraries from this Fly Cell Atlas experiment :'(