Question: cellranger count help
jsl20 wrote:

Hello,

I am trying to analyze the public dataset https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE126030. I've downloaded the fastq files onto my cluster and would like to proceed with cellranger count.

I am in a test folder whose only contents are SRR8526547_1.fastq and refdata-cellranger-GRCh38-1.2.0.

cellranger count --id=cellranger \
                 --transcriptome=/home/jl2/scratch60/refdata-cellranger-GRCh38-1.2.0/ \
                 --fastqs=.\
                 --sample=SRR8526547_1.fastq \

I keep getting this error:

Invalid path/prefix combination: /gpfs/ycga/scratch60/k/jl2/test, ['SRR8526547_1.fastq']
No input FASTQs were found for the requested parameters.

Can't seem to figure out what's wrong. Does it need fastq.gz instead of fastq?

rna-seq 10x • 830 views
ADD COMMENTlink modified 5 months ago • written 5 months ago by jsl20
Haci370 wrote:

cellranger count expects a certain nomenclature for the fastq files, please see the last section here, "My FASTQs are not named like any of the above examples".

Basically, this is how your file names should look: [Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz.

For the Read Type, you can take a look at your fastq files with head to see what is what. The link above explains different read types.
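As a rough illustration of the naming pattern above, here is a minimal shell sketch. tenx_fastq_name is a hypothetical helper (not part of cellranger), and the fixed S1 sample index is an assumption that suits a single downloaded sample.

```shell
# Hypothetical helper: build a file name of the form
# [Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz
# Arguments: sample name, lane number (e.g. 1), read type (R1, R2 or I1).
tenx_fastq_name() {
    sample="$1"
    lane="$2"
    read_type="$3"
    # %03d pads the lane number to three digits (1 becomes 001).
    printf '%s_S1_L%03d_%s_001.fastq.gz\n' "$sample" "$lane" "$read_type"
}

tenx_fastq_name SRR8526547 1 R1   # prints SRR8526547_S1_L001_R1_001.fastq.gz
```

The printed name can then be used as the target of mv or ln -s for the matching downloaded file.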

ADD COMMENTlink written 5 months ago by Haci370

Dear Haci,

Thanks for your reply. Upon closer inspection, I think the fastq files I downloaded have been modified, i.e. they do not look like normal fastq format.

The head of R1 is

@SRR8526547.1 1/1
NGGCCAGTCATGTCTTTATATAAATC
+
#AAFFJJJJJJJJJJJJJJJJJJJJJ
@SRR8526547.2 2/1
NCGCGATCACGAAAGCCTGTCACCAC
+
#AAFFJJJJJJJJJJJJJJJJJJJJJ
@SRR8526547.3 3/1
NTATCTTTCATCGCTCATCGTACACA

The head of R2 is this

@SRR8526547.1 1/2
AACTACAGAATATGCTAAACAATAGACCAAAAGAATGAAGGAGGCTAAGGAGAAACGACAGGAACAAATTGCGAAGAGACGCAGACTTTCCTCTCTGC
+
------<A<F-7----77JF7-------AF-A-F<--FJ7<A--AAAF-7<7JA-77-7F7AJAJFJJJ---FJJJFJJJFJFFJJJFAF7A-<<<7F
@SRR8526547.2 2/2
TAAAAAAAAAAAAATATTTAATTTTTGCCTTTCACAATTTCAGGAACTAATAACTTCAAATCCTTCCAATCTTATTGATGTCACATTTTTTTAAATAT
+
A<-<7F----7-7A-<----7-7---<<--7-F7-77A-7---A-7---<-A----------7-------<7-----7-----<---<-A7--7----
@SRR8526547.3 3/2
CATCTTAGTCATACGACCATAAATTAAAAGTGGAGTCACTAAATAGTTTGCAGTACGTTTCTAATATAAGTGTAGGTGGGTATCAAAACAAGACAAAT

Is there a workaround?

ADD REPLYlink written 5 months ago by jsl20

@Haci is only referring to the file name format. As posted, this is normal fastq format. However, since the reads were probably dumped from SRA without the -F option, the fastq headers have been modified to contain the SRR number. Your best option is to re-extract the data from the SRA file with the -F option, or to try to get the fastq files from ENA.

ADD REPLYlink modified 5 months ago • written 5 months ago by genomax87k

I don't think the -F option would be an issue, as it only affects the sequence/read name. --split-files, on the other hand, is critical: a typical 10x run has 2 or 3 fastq outputs, all of which are expected by cellranger count with the "right" filename (not read name) conventions.
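For reference, a re-extraction with split files might look like the sketch below. sra_dump_cmd is a hypothetical wrapper that only prints the command (a dry run), so nothing is downloaded until you run the printed line yourself. It assumes fastq-dump from sra-tools is on your PATH. Adding -F is optional, as discussed above.

```shell
# Hypothetical dry-run helper: print the fastq-dump command for one run accession.
# --split-files writes each read of a spot to its own file (_1, _2, ...).
# --gzip compresses the output to .fastq.gz.
sra_dump_cmd() {
    printf 'fastq-dump --split-files --gzip %s\n' "$1"
}

sra_dump_cmd SRR8526547
```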

ADD REPLYlink written 5 months ago by Haci370

@genomax Indeed I downloaded from ENA using

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR852/007/SRR8526547/SRR8526547_1.fastq.gz

The header is the same. Do you recommend other ways of downloading so that the header is preserved?

@haci After changing the name to

SRR8526547_S1_L001_R1_001.fastq SRR8526547_S1_L001_R2_001.fastq

cellranger count --id=cellranger \
                 --transcriptome=/home/jl24/scratch60/refdata-cellranger-GRCh38-1.2.0/ \
                 --fastqs=.\
                 --sample= SRR8526547 \

This command did not produce any error or output file, just this:

/gpfs/ycga/apps/hpc/software/cellranger/3.1.0/cellranger-cs/3.1.0/bin
cellranger count (3.1.0)
Copyright (c) 2019 10x Genomics, Inc.  All rights reserved.
-------------------------------------------------------------------------------

Usage:
    count
        --id=ID
        [--fastqs=PATH]
        [--sample=PREFIX]
        --transcriptome=DIR
        [options]
    count <run_id> <mro> [options]
    count -h | --help | --version

Do you think the issue is the header?

ADD REPLYlink modified 5 months ago • written 5 months ago by jsl20

Do you have a space in --sample= SRR8526547 after the =? If so, remove it. That option is not needed if you have only one sample, so you could also try omitting it.
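To illustrate why the space matters, here is a minimal shell sketch. count_args is a hypothetical helper, not part of cellranger; it only reports how many arguments the shell passes along.

```shell
# Hypothetical helper: print the number of arguments received.
count_args() {
    echo "$#"
}

count_args --sample= SRR8526547   # prints 2: "--sample=" and "SRR8526547" are separate words
count_args --sample=SRR8526547    # prints 1: the value stays attached to the option
```

With the space, the program receives an empty --sample value plus a stray extra argument, which is probably why only the usage message was printed.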

ADD REPLYlink written 5 months ago by genomax87k

As far as I can tell, the pipeline did not start either. One thing you can check is the extension: cellranger count would expect fastq.gz, just like your original files. If that were the problem, though, the software would have complained with an error!

ADD REPLYlink written 5 months ago by Haci370

I used fastq-dump --split-files to download an SRR, and it gives me three files (1.2G, 174Mb and 390Mb in size). How do I know which file is which lane, or left or right, so that I can rename the files to run cellranger? If I already downloaded some files without --split-files, can I still use them, or do I have to redownload them?

ADD REPLYlink written 6 weeks ago by alan0

If this is 10x data, then one of the smaller files (should be renamed R1) will contain the cell barcodes + UMI. The other small file should have the Illumina indexes (should be renamed I1). The final file should have the actual read data (largest, should be renamed R2).
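One way to tell the files apart, sketched here under some assumptions, is to look at the length of the first read in each file. first_read_length is a hypothetical helper. It assumes a gzip implementation (such as GNU gzip) where gzip -cdf passes uncompressed input through unchanged, so it accepts both .fastq and .fastq.gz. In the files posted earlier in this thread, the barcode + UMI read is 26 bases long and the transcript read is considerably longer.

```shell
# Hypothetical helper: print the length of the first sequence in a FASTQ file.
# Line 2 of a FASTQ file is the sequence of the first record.
first_read_length() {
    gzip -cdf "$1" | awk 'NR == 2 { print length($0); exit }'
}

# Report the first-read length for each split file in the current directory.
for f in SRR*_[0-9].fastq*; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    echo "$f: $(first_read_length "$f")"
done
```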

If you post the SRR# I can take a look. Sometimes these files are included in original/additional downloads without a need to figure out what is what.

ADD REPLYlink written 6 weeks ago by genomax87k

@genomax, technically, R2 does not need to be the "largest" file. If a longer read length is specified for R1 during the sequencing run, extending past the cell and transcript barcodes and into the transcript, R1 can be equal to or larger than R2.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by Haci370

Fair point. I based my comment on the file sizes posted by @alan, which seem to fit the normal pattern.

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by genomax87k
Hannover Medical School
colindaven2.3k wrote:

Here is part of my cellranger count SLURM script; you should get the gist.

I think your problem is that you're not setting the path to the fastqs.

fq_path=/ngsssd1/rcug/N1737_N1744/200224_NB551160_0261_AHJC2TBGX9/HJC2TBGX9/outs/fastq_path
echo "Input fastq path: $fq_path"


# Add miniconda3 to PATH
. /mnt/ngsnfs/tools/miniconda3/etc/profile.d/conda.sh

# Activate env on cluster node
conda activate

### Run command
# Remember to check the specified a) reference transcriptome and b) threads
##########

#transcriptome=/lager2/rcug/seqres/cellranger/refdata-cellranger-GRCh38-3.0.0
transcriptome=/lager2/rcug/seqres/cellranger/refdata-cellranger-mm10-3.0.0

cellranger count --id="$1" \
                 --transcriptome="$transcriptome" \
                 --fastqs="$fq_path" \
                 --sample="$1" \
                 --expect-cells=5000 \
                 --localcores=28
ADD COMMENTlink written 5 months ago by colindaven2.3k
jsl20 wrote:

Dear all,

Thank you for your contributions. I finally got it to work; the command below works fine. It turns out the problem was the space after --sample=, as genomax astutely pointed out. Haci also made a great point about the naming of the files, which must be strictly adhered to.

cellranger count --id=output \
                 --transcriptome=/home/jl2/scratch60/refdata-cellranger-GRCh38-3.0.0/ \
                 --fastqs=. \
                 --sample=SRR8526547

For those who might be wondering, fastq or fastq.gz will both work just fine. If you are in the working directory that holds the files, --fastqs=. also works. (So far, the headers of my fastq files have not produced any errors, but I'll keep you updated on the output.)
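Before launching a run, it can help to check which files in a folder already follow the naming convention. The sketch below uses a hypothetical helper, list_tenx_fastqs, with a glob that only loosely mirrors the convention discussed in this thread. It is an approximation, not cellranger's actual matching logic.

```shell
# Hypothetical helper: list files in a directory that look like
# [sample]_S*_L###_[R|I]#_001.fastq or .fastq.gz
# Arguments: directory, sample name.
list_tenx_fastqs() {
    dir="$1"
    sample="$2"
    for f in "$dir"/"$sample"_S*_L[0-9][0-9][0-9]_[RI][0-9]_001.fastq*; do
        # An unmatched glob stays literal, so only report paths that exist.
        if [ -e "$f" ]; then
            basename "$f"
        fi
    done
    return 0
}

list_tenx_fastqs . SRR8526547
```

If this prints nothing for your sample name, the "No input FASTQs were found" error from the original question is the likely result.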

Thanks a lot guys.

ADD COMMENTlink modified 5 months ago • written 5 months ago by jsl20