I downloaded paired-end Illumina reads from the NCBI SRA and ran
fastq-dump --split-3 to get a legacy extraction of the corresponding fastq files.
I ended up with three files: file_1.fastq.gz, file_2.fastq.gz and a third one, file.fastq.gz. The third one corresponds to 492919 reads whose readlen < 1.
These fastq.gz files are huge. Simply counting the lines takes too long, and a test that extracts and compares the order of the names and coordinates of the reads in the two files would take even longer.
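For reference, the order test I have in mind could be sketched roughly as follows. This is only a minimal sketch, assuming plain four-line FASTQ records and assuming that the first whitespace-delimited token of each header line (with any trailing /1 or /2 removed) identifies the read. The function names `read_names` and `first_mismatch` are hypothetical, not part of any existing tool.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield one read name per FASTQ record (assumes 4 lines per record)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:
                continue  # only header lines carry the read name
            tokens = line[1:].split()  # drop the leading '@'
            name = tokens[0] if tokens else ""
            # Assumption: mates may differ only by a trailing /1 or /2
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(path_1, path_2):
    """Return (record_index, name_1, name_2) at the first disagreement,
    or None if both files list the same names in the same order."""
    pairs = zip_longest(read_names(path_1), read_names(path_2))
    for index, (name_1, name_2) in enumerate(pairs):
        if name_1 != name_2:
            return index, name_1, name_2
    return None
```

Calling `first_mismatch("file_1.fastq.gz", "file_2.fastq.gz")` streams both files in a single pass and stops at the first disagreement, so it should cost about as much as decompressing the two files once.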
So I would rather ask here about previous experience.
1. Should I understand that name_1.fastq and name_2.fastq are synchronized files, that is, are the left and right reads in the same order? I ask because the size difference between the two files (the _1 and the _2) is notable.
2. Is there any script that would allow me to synchronize these two files in case I need to?
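
To make the second question concrete, this is roughly the kind of script I mean. It is a minimal sketch, not a tested tool. It assumes four-line FASTQ records, unique read names within each file, that the reads common to both files already appear in the same relative order, and that the set of read names fits in memory. The names `read_key`, `records` and `synchronize` are hypothetical.

```python
import gzip


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def read_key(header):
    """Read name from a header line: first token, without a trailing /1 or /2."""
    tokens = header[1:].split()
    key = tokens[0] if tokens else ""
    return key[:-2] if key.endswith(("/1", "/2")) else key


def records(path):
    """Yield (read_key, record_text) for each 4-line FASTQ record."""
    with open_text(path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0].strip():
                return  # end of file (or trailing blank line)
            yield read_key(record[0]), "".join(record)


def synchronize(in_1, in_2, out_1, out_2):
    """Write only the reads present in both inputs; return how many were kept."""
    # First pass: collect the read names shared by both files.
    shared = {key for key, _ in records(in_1)} & {key for key, _ in records(in_2)}
    # Second pass: copy the shared records, keeping each file's own order.
    for source, target in ((in_1, out_1), (in_2, out_2)):
        with open_text(target, "wt") as out:
            for key, text in records(source):
                if key in shared:
                    out.write(text)
    return len(shared)
```

Each input is read twice, which is slow for files this large but keeps the memory footprint down to the read names rather than the sequences.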