cellranger count help
4.1 years ago
jsl ▴ 50

Hello,

I am trying to analyze the public dataset GSE126030 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE126030). I've downloaded the fastq files onto my cluster and would like to proceed with cellranger count.

I am in a test folder, and the only items in it are SRR8526547_1.fastq and refdata-cellranger-GRCh38-1.2.0.

cellranger count --id=cellranger \
                 --transcriptome=/home/jl2/scratch60/refdata-cellranger-GRCh38-1.2.0/ \
                 --fastqs=.\
                 --sample=SRR8526547_1.fastq \

I keep getting this error:

Invalid path/prefix combination: /gpfs/ycga/scratch60/k/jl2/test, ['SRR8526547_1.fastq']
No input FASTQs were found for the requested parameters.

Can't seem to figure out what's wrong. Does it need fastq.gz instead of fastq?

RNA-Seq 10x • 11k views
4.1 years ago
Haci ▴ 680

cellranger count expects a specific naming convention for the fastq files; please see the last section here, "My FASTQs are not named like any of the above examples".

Basically, this is what your file names should look like: [Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz.

For the Read Type, you can take a look at your fastq files with head to see what is what. The link above explains different read types.
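To make the pattern above concrete, here is a minimal sketch of a helper that builds a file name in that convention. The function name tenx_fastq_name is hypothetical (made up for illustration), and the fixed S1 and 001 parts simply mirror the template quoted above.

```shell
# Hypothetical helper: build a name of the form
# [Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz
tenx_fastq_name() {
    local sample="$1"     # e.g. SRR8526547
    local lane="$2"       # lane number, e.g. 1
    local read_type="$3"  # R1, R2 or I1
    # %03d zero-pads the lane number, so lane 1 becomes L001
    printf '%s_S1_L%03d_%s_001.fastq.gz\n' "$sample" "$lane" "$read_type"
}

# Example (check which downloaded file is which read type before renaming):
#   mv some_download.fastq.gz "$(tenx_fastq_name SRR8526547 1 R1)"
```

Which downloaded file corresponds to R1, R2 or I1 still has to be worked out by looking at the reads themselves.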


Dear Haci,

Thanks for your reply. Upon closer inspection, I think the fastq files I downloaded have been modified, i.e. they do not look like normal fastq format.

The head of R1 is

@SRR8526547.1 1/1
NGGCCAGTCATGTCTTTATATAAATC
+
#AAFFJJJJJJJJJJJJJJJJJJJJJ
@SRR8526547.2 2/1
NCGCGATCACGAAAGCCTGTCACCAC
+
#AAFFJJJJJJJJJJJJJJJJJJJJJ
@SRR8526547.3 3/1
NTATCTTTCATCGCTCATCGTACACA

The head of R2 is this

@SRR8526547.1 1/2
AACTACAGAATATGCTAAACAATAGACCAAAAGAATGAAGGAGGCTAAGGAGAAACGACAGGAACAAATTGCGAAGAGACGCAGACTTTCCTCTCTGC
+
------<A<F-7----77JF7-------AF-A-F<--FJ7<A--AAAF-7<7JA-77-7F7AJAJFJJJ---FJJJFJJJFJFFJJJFAF7A-<<<7F
@SRR8526547.2 2/2
TAAAAAAAAAAAAATATTTAATTTTTGCCTTTCACAATTTCAGGAACTAATAACTTCAAATCCTTCCAATCTTATTGATGTCACATTTTTTTAAATAT
+
A<-<7F----7-7A-<----7-7---<<--7-F7-77A-7---A-7---<-A----------7-------<7-----7-----<---<-A7--7----
@SRR8526547.3 3/2
CATCTTAGTCATACGACCATAAATTAAAAGTGGAGTCACTAAATAGTTTGCAGTACGTTTCTAATATAAGTGTAGGTGGGTATCAAAACAAGACAAAT

Is there a workaround?


@Haci is only referring to the file name format. As posted, this is normal fastq format. However, since the reads were probably dumped from SRA without the -F option, the fastq header has been modified to contain the SRR number. Your best option is to re-extract the data from the SRA file with the -F option, or to try to get the fastq files from ENA.


I don't think the -F option would be an issue, as it only affects the sequence/read name. --split-files, on the other hand, is critical: a typical 10x run has 2 or 3 fastq outputs, all of which are expected by cellranger count with the "right" filename (not read name) conventions.
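For anyone re-extracting from SRA, here is a hedged sketch of a fastq-dump call combining the two options discussed here: --split-files writes one file per read, and --origfmt is the long form of -F. The wrapper name dump_run and the DRY_RUN switch are hypothetical, made up for illustration; by default the function only prints the command so you can check it before running anything.

```shell
# Hypothetical wrapper: print (default) or run a fastq-dump call for one accession.
dump_run() {
    local accession="$1"   # e.g. SRR8526547
    local outdir="${2:-.}" # output directory, defaults to the current one
    # --split-files: one output file per read
    # --origfmt (-F): defline contains only the original read name
    # --gzip: compress the output files
    set -- fastq-dump --split-files --origfmt --gzip --outdir "$outdir" "$accession"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"   # dry run: just show the command
    else
        "$@"                 # DRY_RUN=0: actually execute it
    fi
}
```

Calling dump_run SRR8526547 fq prints the command; setting DRY_RUN=0 beforehand would execute it, assuming the SRA toolkit is installed.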


@genomax Indeed I downloaded from ENA using

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR852/007/SRR8526547/SRR8526547_1.fastq.gz

The header is the same. Do you recommend other ways of downloading so that the header is preserved?

@Haci After changing the names to

SRR8526547_S1_L001_R1_001.fastq SRR8526547_S1_L001_R2_001.fastq

cellranger count --id=cellranger \
                 --transcriptome=/home/jl24/scratch60/refdata-cellranger-GRCh38-1.2.0/ \
                 --fastqs=.\
                 --sample= SRR8526547 \

this command did not produce any error or output file, just this:

/gpfs/ycga/apps/hpc/software/cellranger/3.1.0/cellranger-cs/3.1.0/bin
cellranger count (3.1.0)
Copyright (c) 2019 10x Genomics, Inc.  All rights reserved.
-------------------------------------------------------------------------------

Usage:
    count
        --id=ID
        [--fastqs=PATH]
        [--sample=PREFIX]
        --transcriptome=DIR
        [options]
    count <run_id> <mro> [options]
    count -h | --help | --version

Do you think the issue is the header?


Do you have a space in --sample= SRR8526547 after the =? If so remove that. That directive is not needed if you have one sample. So you could try omitting it.
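To see why that space matters, here is a small shell demonstration of how the command line is split into arguments. The function show_args is a hypothetical helper written only for this illustration; it prints each argument it receives in brackets.

```shell
# Hypothetical helper: print every argument on its own line, in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# With a space, the shell passes two separate arguments:
show_args --sample= SRR8526547
# [--sample=]
# [SRR8526547]

# Without the space, the option and its value stay together:
show_args --sample=SRR8526547
# [--sample=SRR8526547]
```

With the space, the program receives an empty --sample value plus a stray extra argument, which is presumably why cellranger fell back to printing its usage text instead of starting the pipeline.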


As far as I can tell, the pipeline did not start either. One thing you can check is the extension: cellranger count would expect fastq.gz, just like your original files. If that were the problem, though, the software would have complained with an error!


I used fastq-dump --split-files to download an SRR, and it gives me three files (with sizes 1.2G, 174Mb, 390Mb). How do I know which file is which lane, or left or right read, so that I can rename the files to run cellranger? If I already downloaded some files without --split-files, can I still use them, or do I have to redownload them?


If this is 10x data, then one of the smaller files (should be renamed R1) will contain the cell barcodes + UMI. The other small file should have the Illumina indexes (should be renamed I1). The final file should have the actual read data (largest, should be renamed R2).
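One way to tell the files apart is to look at the read length rather than the file size. Below is a minimal sketch, assuming standard four-line FASTQ records; first_read_length is a hypothetical helper name. As a rough guide, a barcode + UMI read is short (26 bases for 10x 3' v2 chemistry, i.e. 16 + 10), an index read is shorter still (typically 8 bases), and the transcript read is the long one.

```shell
# Hypothetical helper: print the length of the first sequence in a FASTQ file.
# Works on plain .fastq and on .gz files (decompressed on the fly).
first_read_length() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'NR == 2 { print length($0); exit }'   # line 2 is the sequence
}

# Example: compare the three files from fastq-dump --split-files
#   for f in SRRxxxxxxx_1.fastq SRRxxxxxxx_2.fastq SRRxxxxxxx_3.fastq; do
#       printf '%s\t%s\n' "$f" "$(first_read_length "$f")"
#   done
```

The SRRxxxxxxx names in the usage comment are placeholders for your own accession.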

If you post the SRR# I can take a look. Sometimes these files are included in original/additional downloads without a need to figure out what is what.


@genomax, technically, R2 does not need to be the "largest" file. If longer read length is specified for R1 during the sequencing run, exceeding the cell and transcript barcodes and into the transcript, R1 can be equal to or larger than R2.


Fair point. I based my comment on the file sizes posted by @alan, which seem to fit the normal pattern.

3.2 years ago
Max.Ka ▴ 30

Hello,

I have been troubleshooting

error: No input FASTQs were found for the requested parameters.

for several hours now. In my case the file names, the file path and the command were all fine. What finally solved the issue for me was moving the fastq.gz files into a separate folder that contained only fastq.gz files. The original folder had some other files in it (md5, fastqc output, etc.). I'm not sure why this was a problem for the pipeline, but give this a try if you run into similar trouble.

Best, Max
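A minimal sketch of that workaround: rather than moving anything, build a clean folder that holds only links to the fastq.gz files. The function name link_fastqs is hypothetical, and this assumes symlinks are acceptable to the pipeline on your system; swap ln -sf for mv or cp if they are not.

```shell
# Hypothetical helper: create DEST containing only symlinks to SRC/*.fastq.gz,
# leaving md5 files, fastqc output, text files etc. behind.
link_fastqs() {
    local src="$1" dest="$2" abs_src f
    abs_src=$(cd "$src" && pwd) || return 1   # absolute path keeps links valid
    mkdir -p "$dest"
    for f in "$abs_src"/*.fastq.gz; do
        [ -e "$f" ] || continue               # skip if nothing matched the glob
        ln -sf "$f" "$dest/"
    done
}

# Example:
#   link_fastqs /path/to/messy_folder /path/to/fastq_only
#   cellranger count ... --fastqs=/path/to/fastq_only
```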


@Max.Ka. Many thanks for posting this. I had the same issue trying to run cellranger-atac count and was completely stumped. In a folder containing >200 fastq.gz files, there was a rogue .txt file that prevented the program from running. My error message was this:

Invalid path/prefix combination: /scratch/c.c1477909/fq, ['14993_WGE_ATAC']

Samples not detected among FASTQs: 14993_WGE_ATAC

Completely uninformative as it only refers to the path and the fastq sample ID, the name of the text file was something completely different! That said, after reading your post, I ended up finding this note on the 10X website which mentions removing non-fastq files in point 1.
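Since the error message does not name the offending file, a quick pre-flight check can help. Below is a hedged sketch that lists everything in a folder that does not look like a FASTQ named in the [Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz style. The function name nonconforming_files and the regular expression are my own approximation of that convention, not something taken from the 10x documentation.

```shell
# Hypothetical helper: list entries in a folder whose names do not match
# the expected FASTQ naming pattern (approximate regex, see note above).
nonconforming_files() {
    ls -1 "$1" \
        | grep -Ev '_S[0-9]+_L[0-9]{3}_[RI][0-9]_001\.fastq(\.gz)?$' \
        || true   # grep exits non-zero when nothing is listed; that is fine here
}

# Example:
#   nonconforming_files /path/to/fastq_folder
# Any file printed (a stray .txt, .md5, fastqc report, ...) is a candidate to move out.
```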

4.1 years ago

Here is part of my cellranger count SLURM script; you should get the gist.

I think your problem is that you're not setting the path to the fastqs.

fq_path=/ngsssd1/rcug/N1737_N1744/200224_NB551160_0261_AHJC2TBGX9/HJC2TBGX9/outs/fastq_path
echo "Input fastq path " $fq_path


# Add miniconda3 to PATH
. /mnt/ngsnfs/tools/miniconda3/etc/profile.d/conda.sh

# Activate env on cluster node
conda activate

### Run command
# Remember to check specified a) refseq b) threads
##########

#transcriptome=/lager2/rcug/seqres/cellranger/refdata-cellranger-GRCh38-3.0.0
transcriptome=/lager2/rcug/seqres/cellranger/refdata-cellranger-mm10-3.0.0

# $1 is the sample name; it is used both as the run ID and as the FASTQ prefix
cellranger count --id="$1" \
                 --transcriptome="$transcriptome" \
                 --fastqs="$fq_path" \
                 --sample="$1" \
                 --expect-cells=5000 \
                 --localcores=28
4.1 years ago
jsl ▴ 50

Dear all,

Thank you for your contributions. I finally got it to work; the command below works fine. It turns out the problem was the space after --sample=, as genomax astutely pointed out. Haci also made a great point about the naming of the sample files, which must be strictly adhered to.

cellranger count --id=output \
                 --transcriptome=/home/jl2/scratch60/refdata-cellranger-GRCh38-3.0.0/ \
                 --fastqs=. \
                 --sample=SRR8526547

For those who might be wondering, either fastq or fastq.gz will work just fine. If the fastq files are in your current working directory, --fastqs=. also works. (So far, the headers of my fastq files have not produced any errors, but I'll keep you updated on the output.)

Thanks a lot guys.
