Hello, I came across the same problem. I downloaded a single-cell RNA-seq dataset from https://bigd.big.ac.cn/gsa/ whose file names end with "_f1.fq.gz" and "r2.fa.gz". The data came from the 10x Genomics platform, but cellranger can't identify the "fq.gz" files. Maybe it can only identify "fastq.gz" files. So I'd like to ask: how should I process the fq.gz files for downstream analysis? I would appreciate it if you could help me.
If it is paired-end sequencing, the files should be named "*._r1.fq.gz" and "*._r2.fa.gz", not "*._f1.fq.gz".
There are no hard rules for naming paired files, which is most likely what yours are, and where conventions do exist, not everyone follows them. In your case, the files are likely forward (r1) and reverse (r2) reads. This is easy to verify: unpack the files and type:
head *_??.fq
If the two files have matching headers, differing only in that one has 1 where the other has 2, they are paired-end files.
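The same check can be done without unpacking anything. The sketch below is only a quick sanity check on the first record of each file, and it assumes the two common header styles (`@name/1` vs `@name/2`, or `@name 1:N:0:...` vs `@name 2:N:0:...`). The function names `first_header`, `read_id` and `same_first_read` are made up for this example.

```shell
# Print the first header line of a FASTQ file, gzipped or plain.
first_header() {
    case "$1" in
        *.gz) gzip -cd "$1" | head -n 1 ;;
        *)    head -n 1 "$1" ;;
    esac
}

# Reduce a header to its read ID: keep the first whitespace-delimited
# token and drop a trailing /1 or /2 if present.
read_id() {
    first_header "$1" | awk '{ sub(/\/[12]$/, "", $1); print $1 }'
}

# Succeed if both files start with the same (non-empty) read ID.
same_first_read() {
    id1=$(read_id "$1")
    id2=$(read_id "$2")
    [ -n "$id1" ] && [ "$id1" = "$id2" ]
}

# Example call (file names are placeholders):
#   same_first_read sample_r1.fq.gz sample_r2.fq.gz && echo "looks paired"
```

A match on the first record does not prove the whole files are in sync, and archives that rewrite headers may use a pattern this does not cover, so it is still worth looking at the headers yourself.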
The 1 and 2 usually stand for forward and reverse reads, respectively, in paired-end sequencing. However, I do recall a few cases from SRA where I stumbled upon single-end sequencing files that used this convention to point to different replicates.
As suggested previously, unless you're doing this in the context of an automated pipeline, you are better off checking the files afterwards. Usually you can tell just by the headers alone.
how about this command, which is much shorter
Doesn't work on all operating systems though.
You are right, zcat is just a bash script; what it actually depends on is gzip.
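Since zcat on those systems just hands the work to gzip, calling gzip directly should sidestep the portability problem. As far as I know, the zcat on macOS, for example, looks for a ".Z" suffix and fails on ".gz" files, while `gzip -cd` behaves the same everywhere gzip is installed. A minimal sketch, where `peek_gz` is a made-up helper name:

```shell
# Print the first N lines of a gzipped file (default 4, i.e. one FASTQ record).
# gzip -c writes to stdout, -d decompresses; the file on disk is left untouched.
peek_gz() {
    gzip -cd "$1" | head -n "${2:-4}"
}

# Example call (file name is a placeholder):
#   peek_gz sample_r1.fq.gz 8
```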
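On the cellranger question in this thread: the issue is probably the whole file name rather than just the "fq" extension. Cell Ranger expects bcl2fastq-style names of the form [Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz. A rough sketch of one way to get there with symlinks, so the downloads stay untouched. The helper name `link_for_cellranger` is made up, and S1 and L001 are placeholder values.

```shell
# Hypothetical helper: expose a read pair under bcl2fastq-style names.
# Arguments: sample name, file to use as R1, file to use as R2, output directory.
# S1 (sample index) and L001 (lane) are placeholders, not taken from the data.
link_for_cellranger() {
    sample=$1
    r1=$2
    r2=$3
    outdir=$4
    mkdir -p "$outdir" || return 1
    # Resolve absolute paths so the links work from any directory.
    abs_r1="$(cd "$(dirname "$r1")" && pwd)/$(basename "$r1")"
    abs_r2="$(cd "$(dirname "$r2")" && pwd)/$(basename "$r2")"
    ln -s "$abs_r1" "$outdir/${sample}_S1_L001_R1_001.fastq.gz" &&
    ln -s "$abs_r2" "$outdir/${sample}_S1_L001_R2_001.fastq.gz"
}

# Example call (paths are placeholders):
#   link_for_cellranger mysample raw/x_f1.fq.gz raw/x_r2.fq.gz fastqs
```

Before linking, check which file really is which: for 10x data, R1 normally carries the cell barcode and UMI and R2 the cDNA read, so the read lengths usually give it away. If the links cause trouble, renaming the files with mv to the same names is the other option.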