Hey!
I am new to processing snRNA-seq data and am working with the 10x Genomics Cell Ranger pipeline for the first time. I encountered the following error:
[error] Pipestance failed. Error log at:
OSCC11/SC_RNA_COUNTER_CS/SC_MULTI_CORE/MULTI_CHEMISTRY_DETECTOR/DETECT_COUNT_CHEMISTRY/fork0/chnk0-uc6a6c33e8e/_errors
Log message:
FASTQ header mismatch detected at line 4 of input files...
I downloaded my sample data from: https://www.ncbi.nlm.nih.gov/sra/SRX14334802[accn]
For each of the SRRs related to my sample (SRR18187852, SRR18187853, SRR18187882, SRR18187893) I used these commands:
fasterq-dump --outdir "$output_directory" "$SRR_ID"
gzip -c <file> > <new_name>.gz
I renamed my files to match Cell Ranger's naming requirements:
OSCC11_S1_L001_R1_001.fastq.gz OSCC11_S1_L001_R2_001.fastq.gz
OSCC11_S1_L002_R1_001.fastq.gz OSCC11_S1_L002_R2_001.fastq.gz
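Cell Ranger expects names of the form <sample>_S<n>_L00<lane>_R<1|2>_001.fastq.gz, with an R1 and an R2 file for each lane. A minimal sketch for checking that every file in the input directory has its mate is below; check_fastq_pairs is a hypothetical helper, not part of Cell Ranger, and it only checks names, not whether the contents were assigned to the right read.

```bash
#!/usr/bin/env bash
# Sketch: check that every R1 file has a matching R2 file (and vice versa)
# under the 10x naming scheme <sample>_S<n>_L00<lane>_R<1|2>_001.fastq.gz.
# check_fastq_pairs is a hypothetical helper name, not a Cell Ranger command.
# It inspects file names only, never file contents.
check_fastq_pairs() {
  local dir=$1 sample=$2 status=0 found=0 f mate
  for f in "$dir/$sample"_S*_L00[0-9]_R[12]_001.fastq.gz; do
    [ -e "$f" ] || continue            # the glob matched nothing
    found=1
    # Derive the expected mate name by swapping R1 <-> R2
    case $f in
      *_R1_001.fastq.gz) mate=${f%_R1_001.fastq.gz}_R2_001.fastq.gz ;;
      *)                 mate=${f%_R2_001.fastq.gz}_R1_001.fastq.gz ;;
    esac
    if [ ! -e "$mate" ]; then
      echo "no mate for $(basename "$f")" >&2
      status=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no files matching ${sample}_S*_L00*_R[12]_001.fastq.gz in $dir" >&2
    status=1
  fi
  return "$status"
}
```

For example, check_fastq_pairs "$INPUT_DIR" OSCC11 returns non-zero and names the unpaired file if a lane is missing its R1 or R2.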
I ran cellranger 7.2.0 with the command:
cellranger count --id="$ID" \
--transcriptome="$TRANSCRIPTOME" \
--fastqs="$INPUT_DIR" \
--sample="$SAMPLE_NAME" \
--include-introns true \
--localcores=8 \
--localmem=62
I used refdata-gex-GRCh38-2020-A as the TRANSCRIPTOME and the directory containing the four fastq.gz files listed above as the INPUT_DIR.
I also tried running this command with different --chemistry values (SC5P-R2, fiveprime, SC5P-PE), but the error persists.
It seems the data I'm working with are split into separate SRRs for the R1 and R2 reads, and I suspect this might be the cause.
Is there a way to run cellranger on a dataset of this format?
Thanks for the quick reply! This is the output of the commands you sent:
As you can see, the headers do not match between the two files. For a pair of R1/R2 files they should be identical (the part before "length=", that is).
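The rule stated in the reply (IDs must agree up to "length=") can be checked with a short script. This is a sketch, not necessarily the commands referred to earlier; read_ids and compare_read_ids are hypothetical helper names, and it assumes gzipped FASTQ files with four lines per record.

```bash
#!/usr/bin/env bash
# Sketch: compare read IDs of an R1/R2 pair of gzipped FASTQ files.
# A read ID here is the header line minus the leading '@' and minus any
# trailing " length=..." field (as in fasterq-dump's default headers).
# read_ids and compare_read_ids are hypothetical helper names.

# Print the first n read IDs of a gzipped FASTQ file.
read_ids() {
  gzip -cd "$1" | awk -v n="$2" '
    NR % 4 == 1 {
      sub(/^@/, "")            # drop the header marker
      sub(/ length=.*$/, "")   # drop the length= field
      print
      if (++c >= n) exit
    }'
}

# Succeed if the first n (default 1000) IDs agree; otherwise show a few pairs.
compare_read_ids() {
  local r1=$1 r2=$2 n=${3:-1000}
  if diff <(read_ids "$r1" "$n") <(read_ids "$r2" "$n") > /dev/null; then
    echo "first $n read IDs match"
  else
    echo "read IDs differ; first pairs (R1 <tab> R2):" >&2
    paste <(read_ids "$r1" 5) <(read_ids "$r2" 5) >&2
    return 1
  fi
}
```

For example, compare_read_ids OSCC11_S1_L001_R1_001.fastq.gz OSCC11_S1_L001_R2_001.fastq.gz would print the first mismatching pairs side by side if the accessions differ.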
It looks like you renamed the files incorrectly after downloading them. Fix the names by looking at the reads inside the files and then you should be good to go.
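One way to look at the reads inside the files is to tabulate their lengths. As a heuristic: in typical 10x Gene Expression libraries R1 carries the cell barcode and UMI and is short (commonly 26 or 28 bp), while R2 carries the cDNA; with paired-end 5' sequencing R1 can also be long, so treat this only as a hint. read_length_summary is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: print "length count" for the first n reads (default 10000)
# of a gzipped FASTQ file, sorted by read length.
# read_length_summary is a hypothetical helper name.
read_length_summary() {
  gzip -cd "$1" | awk -v n="${2:-10000}" '
    NR % 4 == 2 { len[length($0)]++; if (++c >= n) exit }   # sequence lines only
    END { for (l in len) print l, len[l] }' | sort -n
}
```

Running for f in *.fastq.gz; do echo "== $f"; read_length_summary "$f"; done over the downloaded files shows which ones hold the short barcode/UMI read.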
I think this issue comes from the format of my data. The two files are from the same sample and lane but have different SRR accessions, which probably causes the headers to differ. In the NCBI entries linked below, both SRR18187882 and SRR18187893 belong to the OSCC_11_Human sample and are named 1_OSCC_11_GEX_S1_L001_R2_001 and 1_OSCC_11_GEX_S1_L001_R1_00, respectively. https://www.ncbi.nlm.nih.gov/sra/?term=SRR18187882 https://www.ncbi.nlm.nih.gov/sra/?term=SRR18187893
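If, as suspected here, the two runs differ only in the accession embedded in each header, that accession can be rewritten in one of the files. A minimal sketch, assuming gzipped four-line records whose header and "+" lines start with the accession (as in fasterq-dump's default output); rename_accession and the file names in the usage line are hypothetical. It only relabels reads, so it makes sense only if both runs list the same spots in the same order.

```bash
#!/usr/bin/env bash
# Sketch: replace accession FROM with TO on the header (@) and separator (+)
# lines of a gzipped FASTQ file, leaving sequence and quality lines untouched.
# rename_accession is a hypothetical helper name.
# The body runs in a subshell so pipefail does not leak into the caller.
rename_accession() (
  set -o pipefail                 # fail if any step of the pipeline fails
  in=$1 out=$2 from=$3 to=$4
  gzip -cd "$in" | awk -v from="$from" -v to="$to" '
    NR % 4 == 1 || NR % 4 == 3 {
      # Plain string match (no regex): replace only when the accession
      # directly follows the @ or + marker.
      if (index($0, from) == 2)
        $0 = substr($0, 1, 1) to substr($0, 2 + length(from))
    }
    { print }' | gzip > "$out"
)
```

For example: rename_accession SRR18187893.fastq.gz SRR18187893.renamed.fastq.gz SRR18187893 SRR18187882.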
Par for the course given the non-standard landscape of 10x data in SRA.
You could try dumping the data with the -F option. That seems to eliminate the SRR* accession from the headers, leaving just @1 etc. That may work. Otherwise you may need to edit one of the files and change the SRR* accession so they match.
I tried editing the files and changed the SRR* accession, and it did fix the header mismatch, although I now get a different error:
Are there equal numbers of reads in both files?
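Counting records answers that question directly. A minimal sketch, assuming gzipped FASTQ with exactly four lines per record (a line count that is not a multiple of four points to a truncated file); count_reads and compare_read_counts are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: count reads in gzipped FASTQ files (4 lines per record).
# count_reads and compare_read_counts are hypothetical helper names.
count_reads() {
  local lines
  lines=$(gzip -cd "$1" | wc -l)
  if (( lines % 4 != 0 )); then
    echo "warning: $1 has $lines lines, not a multiple of 4" >&2
  fi
  echo $(( lines / 4 ))
}

# Print both counts and succeed only if they are equal.
compare_read_counts() {
  local n1 n2
  n1=$(count_reads "$1") n2=$(count_reads "$2")
  echo "$1: $n1 reads"
  echo "$2: $n2 reads"
  [ "$n1" -eq "$n2" ]
}
```

For example, compare_read_counts OSCC11_S1_L001_R1_001.fastq.gz OSCC11_S1_L001_R2_001.fastq.gz; a non-zero exit means the pair cannot be matched read for read.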