cellranger count help
13 months ago
jsl ▴ 30

Hello,

I am trying to analyze the public dataset https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE126030. I've downloaded the FASTQ files onto my cluster and would like to proceed with cellranger count.

I am in a test folder, and the only items in it are SRR8526547_1.fastq and refdata-cellranger-GRCh38-1.2.0.

cellranger count --id=cellranger \
--transcriptome=/home/jl2/scratch60/refdata-cellranger-GRCh38-1.2.0/ \
--fastqs=.\
--sample=SRR8526547_1.fastq \


I keep getting this error:

Invalid path/prefix combination: /gpfs/ycga/scratch60/k/jl2/test, ['SRR8526547_1.fastq']
No input FASTQs were found for the requested parameters.


I can't figure out what's wrong. Does it need fastq.gz instead of fastq?

13 months ago
Haci ▴ 380

cellranger count expects a specific naming convention for the FASTQ files. Please see the last section here, "My FASTQs are not named like any of the above examples".

Basically, this is what your file names should look like: [Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz.
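Here is a minimal bash sketch of that renaming. It assumes SRA-style split files named <run>_1.fastq[.gz], <run>_2.fastq[.gz] and so on, a single lane (L001), and sample number S1. The function name rename_for_cellranger and its SUFFIX:READTYPE argument format are made up for illustration. You still have to decide yourself which suffix is R1, R2 or I1.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rename <sample>_<suffix>.fastq[.gz] to
# <sample>_S1_L001_<readtype>_001.fastq[.gz] (single lane, sample S1 assumed).
# Usage: rename_for_cellranger DIR SAMPLE SUFFIX:READTYPE [SUFFIX:READTYPE ...]
rename_for_cellranger() {
  local dir=$1 sample=$2
  shift 2
  local pair suffix rtype ext src found
  for pair in "$@"; do
    suffix=${pair%%:*}   # text before the colon, e.g. "1"
    rtype=${pair#*:}     # text after the colon, e.g. "R1", "R2" or "I1"
    found=0
    for ext in fastq.gz fastq; do
      src="$dir/${sample}_${suffix}.${ext}"
      if [[ -e $src ]]; then
        # -n: never overwrite a file that already exists
        mv -n -- "$src" "$dir/${sample}_S1_L001_${rtype}_001.${ext}"
        found=1
        break
      fi
    done
    if (( found == 0 )); then
      echo "warning: ${sample}_${suffix}.fastq[.gz] not found in $dir" >&2
    fi
  done
}
```

For example, rename_for_cellranger . SRR8526547 1:R1 2:R2 turns SRR8526547_1.fastq into SRR8526547_S1_L001_R1_001.fastq and SRR8526547_2.fastq into SRR8526547_S1_L001_R2_001.fastq.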

For the Read Type, you can look at your FASTQ files with head to see which is which. The link above explains the different read types.
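One way to do the "look with head" step for several files at once is to tabulate read lengths. This sketch assumes standard 4-line FASTQ records, and the function name read_lengths is hypothetical. With 10x 3' v2 chemistry, R1 is typically 26 bp (16 bp cell barcode + 10 bp UMI), which matches the R1 reads shown further down in this thread. Index reads are usually around 8 bp, and R2 carries the longer cDNA reads.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the read-length distribution of the first N
# records of each FASTQ. gzip -dcf decompresses .gz input and passes plain
# .fastq through unchanged, so both extensions work.
# Usage: read_lengths [-n RECORDS] FILE...
read_lengths() {
  local n=1000
  if [[ $1 == -n ]]; then n=$2; shift 2; fi
  local f
  for f in "$@"; do
    echo "== $f"
    # Line 2 of every 4-line record is the sequence; count reads per length.
    gzip -dcf -- "$f" | head -n $((4 * n)) \
      | awk 'NR % 4 == 2 { print length($0) }' | sort -n | uniq -c
  done
}
```

For example, read_lengths SRR8526547_1.fastq SRR8526547_2.fastq prints one block per file, listing each read length with its count.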


Dear Haci,

Thanks for your reply. Upon closer inspection, I think the FASTQ files I downloaded have been modified, i.e. they do not look like normal FASTQ format. The head of R1 is:

@SRR8526547.1 1/1
NGGCCAGTCATGTCTTTATATAAATC
+
#AAFFJJJJJJJJJJJJJJJJJJJJJ
@SRR8526547.2 2/1
NCGCGATCACGAAAGCCTGTCACCAC
+
#AAFFJJJJJJJJJJJJJJJJJJJJJ
@SRR8526547.3 3/1
NTATCTTTCATCGCTCATCGTACACA


The head of R2 is:

@SRR8526547.1 1/2
AACTACAGAATATGCTAAACAATAGACCAAAAGAATGAAGGAGGCTAAGGAGAAACGACAGGAACAAATTGCGAAGAGACGCAGACTTTCCTCTCTGC
+
------<A<F-7----77JF7-------AF-A-F<--FJ7<A--AAAF-7<7JA-77-7F7AJAJFJJJ---FJJJFJJJFJFFJJJFAF7A-<<<7F
@SRR8526547.2 2/2
TAAAAAAAAAAAAATATTTAATTTTTGCCTTTCACAATTTCAGGAACTAATAACTTCAAATCCTTCCAATCTTATTGATGTCACATTTTTTTAAATAT
+
A<-<7F----7-7A-<----7-7---<<--7-F7-77A-7---A-7---<-A----------7-------<7-----7-----<---<-A7--7----
@SRR8526547.3 3/2
CATCTTAGTCATACGACCATAAATTAAAAGTGGAGTCACTAAATAGTTTGCAGTACGTTTCTAATATAAGTGTAGGTGGGTATCAAAACAAGACAAAT


Is there a workaround?


@Haci is only referring to the file name format. As posted, this is normal FASTQ format. The reads were probably dumped from SRA without the -F option, so the FASTQ header was rewritten to contain the SRR number. Your best option is to re-extract the data from the SRA file with -F, or to get the FASTQ files from ENA.
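Whatever the header looks like, a useful sanity check before renaming is that R1 and R2 list the same spots in the same order. This sketch assumes that mates share the first header token, as in the @SRR8526547.1 1/1 and @SRR8526547.1 1/2 headers posted above. check_pairs is a hypothetical helper, and it compares only the first N records.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare the first whitespace-delimited header token
# (e.g. "@SRR8526547.1") of the first N records in R1 and R2.
# Usage: check_pairs R1_FILE R2_FILE [N]
check_pairs() {
  local r1=$1 r2=$2 n=${3:-1000}
  if diff -q \
      <(gzip -dcf -- "$r1" | head -n $((4 * n)) | awk 'NR % 4 == 1 { print $1 }') \
      <(gzip -dcf -- "$r2" | head -n $((4 * n)) | awk 'NR % 4 == 1 { print $1 }') \
      >/dev/null; then
    echo "OK: first $n read IDs match"
  else
    echo "MISMATCH: R1 and R2 read IDs differ" >&2
    return 1
  fi
}
```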


I don't think the -F option would be an issue, as it only affects the sequence/read name. --split-files, on the other hand, is critical: a typical 10x run has 2 or 3 FASTQ outputs, and cellranger count expects all of them, with the "right" file name (not read name) conventions.
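This sketch re-extracts a run with both options discussed here, plus --gzip so the output already has the .fastq.gz extension. --split-files, -F, --gzip and --outdir are fastq-dump options. The wrapper name extract_run is hypothetical. Whether index reads come out as a separate file depends on what was originally submitted to SRA.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around fastq-dump (SRA Toolkit):
#   --split-files  one FASTQ per read (_1, _2, ...)
#   -F             keep the original read names in the headers
#   --gzip         write compressed .fastq.gz output
# Usage: extract_run ACCESSION [OUTDIR]
extract_run() {
  local acc=$1 outdir=${2:-.}
  fastq-dump --split-files -F --gzip --outdir "$outdir" "$acc"
  # List what was produced, so each file can be inspected and renamed.
  ls -l "$outdir"/"$acc"_*.fastq.gz
}
```

For example, extract_run SRR8526547 fastq/ should leave SRR8526547_1.fastq.gz, SRR8526547_2.fastq.gz (and any further reads) in fastq/.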


wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR852/007/SRR8526547/SRR8526547_1.fastq.gz
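If you need every read file of a run from ENA, you can generate the URLs from the accession. This sketch assumes the directory layout seen in the URL above (vol1/fastq/<first six characters>/00<last digit>/<run>/). As far as I can tell, that layout only applies to runs with seven digits, so the sketch refuses other accessions. The function name and the default of two read files are assumptions, so check the run's ENA page for the actual file list.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print ENA FTP URLs <run>_1.fastq.gz .. <run>_N.fastq.gz,
# using the layout of the SRR8526547 URL above (7-digit run accessions only).
# Usage: ena_fastq_urls ACCESSION [NUMBER_OF_READ_FILES]
ena_fastq_urls() {
  local acc=$1 nreads=${2:-2} i
  if [[ ! $acc =~ ^[SED]RR[0-9]{7}$ ]]; then
    echo "unsupported accession: $acc" >&2
    return 1
  fi
  for (( i = 1; i <= nreads; i++ )); do
    echo "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/${acc:0:6}/00${acc: -1}/${acc}/${acc}_${i}.fastq.gz"
  done
}
```

For example, ena_fastq_urls SRR8526547 2 | xargs -n 1 wget fetches both read files.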


@Haci, after renaming the files to

SRR8526547_S1_L001_R1_001.fastq SRR8526547_S1_L001_R2_001.fastq

cellranger count --id=cellranger \
--transcriptome=/home/jl24/scratch60/refdata-cellranger-GRCh38-1.2.0/ \
--fastqs=.\
--sample= SRR8526547 \


this command produced neither an error nor any output files, only this:

/gpfs/ycga/apps/hpc/software/cellranger/3.1.0/cellranger-cs/3.1.0/bin
cellranger count (3.1.0)
-------------------------------------------------------------------------------

Usage:
count
--id=ID
[--fastqs=PATH]
[--sample=PREFIX]
--transcriptome=DIR
[options]
count <run_id> <mro> [options]
count -h | --help | --version


Do you think the issue is the header?


Do you have a space after the = in --sample= SRR8526547? If so, remove it. The --sample option is not needed if the folder contains only one sample, so you could also try omitting it.
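Here is why the space matters. The shell splits words on whitespace, so --sample= SRR8526547 reaches cellranger as two arguments: an empty --sample= and a stray positional SRR8526547. That would be consistent with cellranger printing its usage text instead of running. A small demonstration follows (show_args is just an illustrative helper):

```bash
#!/usr/bin/env bash
# Illustrative helper: report how many arguments a command receives and
# print each one in brackets.
show_args() {
  printf 'argc=%d\n' "$#"
  printf '  [%s]\n' "$@"
}

show_args --sample= SRR8526547   # prints argc=2, then [--sample=] and [SRR8526547]
show_args --sample=SRR8526547    # prints argc=1, then [--sample=SRR8526547]
```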


As far as I can tell, the pipeline did not start either. One thing you can check is the extension: cellranger count expects fastq.gz, just like your original files. If that were the problem, though, the software would have complained with an error!
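Compressing the renamed files is a cheap way to rule the extension out. This sketch gzips every uncompressed .fastq in a directory in place. gzip_fastqs is a hypothetical name, and gzip replaces file.fastq with file.fastq.gz.

```bash
#!/usr/bin/env bash
# Hypothetical helper: gzip every plain .fastq in DIR (default: current dir).
# gzip writes file.fastq.gz and removes the original file.fastq.
gzip_fastqs() {
  local dir=${1:-.} f
  for f in "$dir"/*.fastq; do
    [[ -e $f ]] || continue   # no match: the glob stays literal, so skip it
    gzip -- "$f"
  done
}
```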


I used fastq-dump --split-files to download an SRR run, and it gave me three files (1.2 GB, 174 MB and 390 MB). How do I know which file is which read (or lane), so I can rename the files for cellranger? If I already downloaded some files without --split-files, can I still use them, or do I have to download them again?


If this is 10x data, one of the smaller files (rename it R1) will contain the cell barcodes + UMIs. The other small file should contain the Illumina index reads (rename it I1). The remaining file, the largest, should contain the actual read data (rename it R2).

If you post the SRR# I can take a look. Sometimes these files are included in original/additional downloads without a need to figure out what is what.


@genomax, technically, R2 does not need to be the "largest" file. If a longer read length is specified for R1 during the sequencing run, extending past the cell barcode and UMI into the transcript, R1 can be as large as or larger than R2.


Fair point. I based my comment on the file sizes posted by @alan, which seem to fit the normal pattern.

8 weeks ago
Max.Ka ▴ 30

Hello,

I have been troubleshooting

error: No input FASTQs were found for the requested parameters.

for several hours now. In my case the file names, the file path and the command were all fine. What finally solved the issue for me was moving the fastq.gz files into a separate folder that contained only fastq.gz files. The original folder had some other files in it (md5, FastQC output, etc.). I'm not sure why this was a problem for the pipeline, but give this a try if you run into similar trouble.

Best, Max
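Here is a sketch of that fix. It assumes all FASTQs end in .fastq.gz, moves them into a dedicated folder, and you then point --fastqs at that folder. isolate_fastqs is a hypothetical helper. mv -n never overwrites a file that already exists in the destination.

```bash
#!/usr/bin/env bash
# Hypothetical helper: move every *.fastq.gz from SRC_DIR into DEST_DIR,
# leaving md5 files, FastQC output and anything else behind.
# Usage: isolate_fastqs SRC_DIR DEST_DIR
isolate_fastqs() {
  local src=$1 dest=$2 f n=0
  mkdir -p -- "$dest"
  for f in "$src"/*.fastq.gz; do
    [[ -e $f ]] || continue   # no match: the glob stays literal, so skip it
    mv -n -- "$f" "$dest/"
    n=$((n + 1))
  done
  echo "moved $n FASTQ file(s) into $dest"
}
```

For example, run isolate_fastqs /path/to/run /path/to/run/fastq_only, then pass --fastqs=/path/to/run/fastq_only (placeholder paths).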


@Max.Ka, many thanks for posting this. I had the same issue running cellranger-atac count and was completely stumped. In a folder containing more than 200 fastq.gz files, there was a rogue .txt file that prevented the program from running. My error message was:

Invalid path/prefix combination: /scratch/c.c1477909/fq, ['14993_WGE_ATAC']

Samples not detected among FASTQs: 14993_WGE_ATAC

The message was completely uninformative: it only refers to the path and the FASTQ sample ID, while the text file had a completely different name! That said, after reading your post I found this note on the 10x website, which mentions removing non-FASTQ files in point 1.
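The error message does not name the offending file, so it can help to list anything in the FASTQ folder that is not a FASTQ before running. This sketch uses find, and stray_files is a hypothetical name. It checks only the top level of the folder. Anything not ending in .fastq or .fastq.gz, including subdirectories, is treated as suspect.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list top-level entries of DIR that are not FASTQs.
# Usage: stray_files [DIR]
stray_files() {
  local dir=${1:-.}
  find "$dir" -mindepth 1 -maxdepth 1 ! -name '*.fastq.gz' ! -name '*.fastq'
}
```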

13 months ago
colindaven ★ 2.7k

Here is part of my cellranger count SLURM script; you should get the gist.

I think your problem is that you're not setting the path to the FASTQs.

fq_path=/ngsssd1/rcug/N1737_N1744/200224_NB551160_0261_AHJC2TBGX9/HJC2TBGX9/outs/fastq_path
echo "Input fastq path: $fq_path"
# Add miniconda3 to PATH
. /mnt/ngsnfs/tools/miniconda3/etc/profile.d/conda.sh
# Activate env on cluster node
conda activate
### Run command
# Remember to check specified a) refseq b) threads
##########
#transcriptome=/lager2/rcug/seqres/cellranger/refdata-cellranger-GRCh38-3.0.0
transcriptome=/lager2/rcug/seqres/cellranger/refdata-cellranger-mm10-3.0.0
# $1 is the sample name passed to the script
cellranger count --id="$1" \
  --transcriptome="$transcriptome" \
  --fastqs="$fq_path" \
  --sample="$1" \
  --expect-cells=5000 \
  --localcores=28

13 months ago
jsl ▴ 30

Dear all,

Thank you for your contributions. I finally got it to work; the code below works fine. It turns out the problem was the space after --sample=, as genomax astutely pointed out. Haci also made a great point about the naming of the files, which must be strictly adhered to.

cellranger count --id=output \
--transcriptome=/home/jl2/scratch60/refdata-cellranger-GRCh38-3.0.0/ \
--fastqs=. \
--sample=SRR8526547


For those who might be wondering, fastq or fastq.gz will work just fine. If you are in the working directory, --fastqs=. also works. (So far the headers of my FASTQ files have not produced any errors, but I'll post an update on the output.)

Thanks a lot guys.
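The naming convention has to be followed exactly, so a quick pre-flight check of the file names can catch problems before you submit a job. This sketch assumes the [Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz pattern described earlier, with read types R1, R2, I1 or I2. It accepts both .fastq and .fastq.gz, as reported in this thread. check_names is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every FASTQ in DIR whose name does not match
# <sample>_S<n>_L00<n>_<R1|R2|I1|I2>_001.fastq[.gz]; return 1 if any is found.
# Usage: check_names [DIR]
check_names() {
  local dir=${1:-.} f base bad=0
  local re='^.+_S[0-9]+_L00[0-9]_(R1|R2|I1|I2)_001\.fastq(\.gz)?$'
  for f in "$dir"/*.fastq "$dir"/*.fastq.gz; do
    [[ -e $f ]] || continue   # skip unmatched globs
    base=${f##*/}             # strip the directory part
    if [[ ! $base =~ $re ]]; then
      echo "bad name: $base"
      bad=1
    fi
  done
  return "$bad"
}
```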