Let me show you an example: https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR16093385&display=metadata
This run contains two reads, R1 and R2, and both have the same length of 150 bp.
However, the experiment was performed following the 10x 3' library protocol. The methods section describes it as follows:
The scRNA-seq libraries were generated using the 10x Genomics Chromium Controller Instrument and Chromium Single Cell 3' V3 Reagent Kits (10x Genomics). Briefly, cells were concentrated to 1,000 cells/mL and approximately 8,000–10,000 cells were loaded into each channel to generate single-cell gel bead-in-emulsions (GEM), which resulted in the expected mRNA barcoding of 3,000–8,000 single cells for each sample. After the reverse transcription step, GEMs were broken and barcoded cDNA was purified and amplified. The amplified barcoded cDNA was fragmented, A-tailed, ligated with adaptors and index PCR amplified. The final libraries were quantified using a Qubit High Sensitivity DNA assay (Thermo Fisher Scientific) and the size distribution of these libraries was determined by a High Sensitivity DNA chip on a Bioanalyzer 2200 (Agilent). All libraries were then sequenced by an Illumina sequencer (Illumina) on a 150 bp paired-end run.
Normally, the FASTQ files from a 10x 3' library are I1, R1 and R2. R1 contains only the cell barcode and UMI, so it is much shorter than R2. According to this paper, they generated double-stranded cDNA, in which (I think?) both strands carry the UMI and barcode. If so, it seems reasonable to end up with two FASTQ files of equal read length, like ordinary paired-end sequencing data.
Whenever I download such a run from either SRA or ENA, I always get just these two FASTQ files. I assume the index, UMI and barcode must be somewhere in the reads, but I don't know how to extract them and split the SRA or FASTQ file into the standard 10x scRNA-seq FASTQ layout.
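To make the question concrete, here is a minimal sketch of the extraction I have in mind. It assumes the standard 10x 3' v3 layout, where R1 starts with a 16 bp cell barcode followed by a 12 bp UMI. I have not verified that this dataset actually follows that layout, and the function name `trim_r1` and the constants are my own, not part of any 10x tool.

```python
import io

# Assumed 10x 3' v3 layout: 16 bp cell barcode followed by a 12 bp UMI at the start of R1.
BARCODE_LEN = 16
UMI_LEN = 12


def trim_r1(src: io.TextIOBase, dst: io.TextIOBase,
            keep: int = BARCODE_LEN + UMI_LEN) -> int:
    """Copy 4-line FASTQ records from src to dst, keeping the first `keep` bases.

    Returns the number of records written. Raises ValueError if a read is
    shorter than `keep`, since it could not hold a full barcode + UMI.
    """
    n_records = 0
    while True:
        header = src.readline()
        if not header:  # end of file
            break
        seq = src.readline().rstrip("\n")
        plus = src.readline().rstrip("\n")
        qual = src.readline().rstrip("\n")
        if len(seq) < keep or len(qual) < keep:
            raise ValueError(f"read too short for barcode + UMI: {header.strip()}")
        # Truncate sequence and quality string to the same length.
        dst.write(header.rstrip("\n") + "\n")
        dst.write(seq[:keep] + "\n")
        dst.write(plus + "\n")
        dst.write(qual[:keep] + "\n")
        n_records += 1
    return n_records
```

For gzipped files, the two handles could come from `gzip.open(path, "rt")` and `gzip.open(path, "wt")`. R2 would be left untouched, so the read order in the two files stays in sync.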
When I look up the original data stored on AWS, the filenames do not follow the usual naming format for 10x 3' library FASTQ files.
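By "usual naming format" I mean the bcl2fastq-style pattern that Cell Ranger expects, `<sample>_S<n>_L<lane>_<read>_001.fastq.gz`. A small sketch of building such names is below. The helper `tenx_fastq_name` is hypothetical, and which downloaded file should be treated as R1 and which as R2 is an assumption that would need checking against the reads themselves.

```python
def tenx_fastq_name(sample: str, read: str, lane: int = 1, sample_index: int = 1) -> str:
    """Build a file name in the <sample>_S<n>_L<lane>_<read>_001.fastq.gz pattern."""
    if read not in {"I1", "I2", "R1", "R2"}:
        raise ValueError(f"unexpected read type: {read}")
    # The lane number is zero-padded to three digits, e.g. L001.
    return f"{sample}_S{sample_index}_L{lane:03d}_{read}_001.fastq.gz"
```

For the run above, `tenx_fastq_name("SRR16093385", "R1")` gives `SRR16093385_S1_L001_R1_001.fastq.gz`.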
BTW, the example I provided here is not the only case; I have found the same issue in another dataset. It's strange and confusing.