Question

FastQ format error when uploading to the 10x cloud analysis site

Asked 12 months ago by ctenosaga

Hi, I am trying to use the 10x Genomics Cloud Analysis tool (https://cloud.10xgenomics.com) to analyze a published single-nucleus RNA-seq dataset. When I upload the files I get an error that seems to be related to the formatting of the FASTQ files themselves, but I am not sure how to fix it. Here's what I am doing:

Pull data for a single sample from GEO:

$ fastq-dump --split-files --gzip SRR12623869

This produces two files, SRR12623869_1.fastq.gz and SRR12623869_2.fastq.gz. I renamed them to follow the Illumina naming convention: SRR12623869_S1_L001_R1_001.fastq.gz and SRR12623869_S1_L001_R2_001.fastq.gz.
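The renaming step can be scripted. The sketch below uses a hypothetical bash helper, `rename_sra_pair` (not part of any toolkit), and assumes fastq-dump wrote exactly two files, with `_1` holding the barcode+UMI read and `_2` the cDNA read. The 26 bp reads in `_1` are consistent with 10x 3' v2 chemistry (16 bp barcode + 10 bp UMI). Check how many files fastq-dump actually produced before relying on this, since some 10x runs are deposited with an index read as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename fastq-dump output (<acc>_1 / <acc>_2) to the
# bcl2fastq-style names Cell Ranger and txg look for:
#   <acc>_S1_L001_R1_001.fastq.gz and <acc>_S1_L001_R2_001.fastq.gz
# Assumption: _1 is the barcode+UMI read (R1) and _2 is the cDNA read (R2).
rename_sra_pair() {
  local acc="$1"
  local dir="${2:-.}"   # directory containing the files, default: current dir
  mv -- "${dir}/${acc}_1.fastq.gz" "${dir}/${acc}_S1_L001_R1_001.fastq.gz" &&
    mv -- "${dir}/${acc}_2.fastq.gz" "${dir}/${acc}_S1_L001_R2_001.fastq.gz"
}
```

Usage:

$ rename_sra_pair SRR12623869 ~/pathToFolderContainingFastqFiles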

Then I use the txg command-line tool to upload them to 10x Cloud Analysis, for example:

$ ./txg fastqs upload --project-id <myProjectId> ~/pathToFolderContainingFastqFiles/

This produces the error message:

target "SRR12623869_S1_L001_R1_001.fastq.gz" is not a valid FASTQ file (could not parse flowcell ID: not a valid fastq)
target "SRR12623869_S1_L001_R2_001.fastq.gz" is not a valid FASTQ file (could not parse flowcell ID: not a valid fastq)

The top of the unzipped files is formatted like this:

$ head -n 20 SRR12623869_S1_L001_R1_001.fastq
@SRR12623869.1 1 length=26
GGTTGTAGTTGCCAATCCATTGCGTA
+SRR12623869.1 1 length=26
FFFFFFFFFFFFFFFFFFFFFFFFFF
@SRR12623869.2 2 length=26
GAGGGTATCACTCACCTCCTTCTTAG
+SRR12623869.2 2 length=26
FFFFFFFFFFFFFFFFFFF:FFFFFF
@SRR12623869.3 3 length=26
ATCCATTGTATTTCGGGATCACATGC
+SRR12623869.3 3 length=26
FFFFFFFFFFFFF:FFFFFFFFFFFF
@SRR12623869.4 4 length=26
CTCCATGTCGTCCTTGTTAGTTGTCA
+SRR12623869.4 4 length=26
FFFFFFFFFFFFFFFFFFFFFFFFFF
@SRR12623869.5 5 length=26
ATTACCTTCGAGTACTATAACTTCCC
+SRR12623869.5 5 length=26
FFFFFFFFFFFFFFFFFFFFFFFFFF
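A quick way to see whether a file carries Illumina-style read names is to inspect its first header. The bash sketch below assumes the CASAVA 1.8+ layout `@instrument:run:flowcell:lane:tile:x:y ...`, where the flowcell ID is the third colon-separated field. `has_illumina_header` is a hypothetical helper, and the exact rule txg applies when it tries to "parse flowcell ID" is not documented in this thread, so treat the result as a rough indicator only. Headers like `@SRR12623869.1 1 length=26` fail this check.

```shell
#!/usr/bin/env bash
# Hypothetical check: does the first read name in a gzipped FASTQ look like a
# CASAVA 1.8+ Illumina header (@instrument:run:flowcell:lane:tile:x:y ...)?
# It only counts colon-separated fields in the first whitespace-delimited token.
has_illumina_header() {
  local first
  first=$(gzip -dc "$1" | head -n 1)
  first=${first#@}                  # drop the leading '@'
  first=${first%%[[:space:]]*}      # keep the read name before the first space/tab
  local -a parts
  IFS=: read -r -a parts <<< "$first"
  [ "${#parts[@]}" -eq 7 ]          # 7 fields => flowcell ID would be parts[2]
}
```

Usage:

$ has_illumina_header SRR12623869_S1_L001_R1_001.fastq.gz && echo "Illumina-style" || echo "no Illumina read name"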

Anyone have a suggestion on how to address this? Thank you!

Tags: fastq, cellranger

Last updated 12 months ago by GenoMax.

Answer (2023-04-29)

Answered 12 months ago by ATpoint

It seems you need the original read headers in the FASTQ files. Cell Ranger and other 10x tools are known to be very (overly) picky about this sort of thing. Follow the approach in the Biostars thread "ncbi sra toolkit, how to modify the fastq format? Need filter information".

Thank you for the suggestion. I am still getting the same error after downloading with the --origfmt option. Maybe this particular dataset was uploaded without flowcell IDs?

$ fastq-dump --split-files --origfmt --gzip SRR12623869
$ mv SRR12623869_1.fastq.gz SRR12623869_S1_L001_R1_001.fastq.gz
$ mv SRR12623869_2.fastq.gz SRR12623869_S1_L001_R2_001.fastq.gz
$ ./txg fastqs upload --project-id <myProjectId> ~/pathToFastqFiles/
There was a problem preparing your files for upload.

target "SRR12623869_S1_L001_R1_001.fastq.gz" is not a valid FASTQ file (could not parse flowcell ID: not a valid fastq)
target "SRR12623869_S1_L001_R2_001.fastq.gz" is not a valid FASTQ file (could not parse flowcell ID: not a valid fastq)
$ gunzip SRR12623869_S1_L001_R1_001.fastq.gz
$ head -n 20 SRR12623869_S1_L001_R1_001.fastq
@1
GGTTGTAGTTGCCAATCCATTGCGTA
+1
FFFFFFFFFFFFFFFFFFFFFFFFFF
@2
GAGGGTATCACTCACCTCCTTCTTAG
+2
FFFFFFFFFFFFFFFFFFF:FFFFFF
@3
ATCCATTGTATTTCGGGATCACATGC
+3
FFFFFFFFFFFFF:FFFFFFFFFFFF
@4
CTCCATGTCGTCCTTGTTAGTTGTCA
+4
FFFFFFFFFFFFFFFFFFFFFFFFFF
@5
ATTACCTTCGAGTACTATAACTTCCC
+5
FFFFFFFFFFFFFFFFFFFFFFFFFF

Reply by ctenosaga, 12 months ago

I recently ran into exactly the same problem while uploading a completely different dataset to 10x Cloud, and I could not find a way around it either. However, when I ran cellranger count locally on the same FASTQs (outside 10x Cloud), it completed without errors and the output matrices and web summary look reasonable to me. My guess is that the problem is specific to the file validation step of the 10x Cloud upload. Not a direct solution, but I hope this is somewhat helpful.

Reply by JB, 12 months ago
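For the local route, Cell Ranger finds FASTQs by file name, following the bcl2fastq convention `[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz`. The hypothetical bash helper below, `is_10x_fastq_name`, only checks that a path matches that pattern (the set of characters allowed in the sample name is an assumption); it does not validate file contents.

```shell
#!/usr/bin/env bash
# Hypothetical check: does a file name follow the bcl2fastq-style pattern
#   <sample>_S<n>_L<3 digits>_<R1|R2|I1|I2>_001.fastq.gz ?
# The sample-name character class below is an assumption, not 10x's exact rule.
is_10x_fastq_name() {
  local name="${1##*/}"   # strip any leading directory
  local re='^[A-Za-z0-9_-]+_S[0-9]+_L[0-9]{3}_(R1|R2|I1|I2)_001\.fastq\.gz$'
  [[ $name =~ $re ]]
}
```

If the names pass, a local run would look roughly like this (paths are placeholders; check `cellranger count --help` for any options your version requires):

$ cellranger count --id=SRR12623869 --transcriptome=/path/to/refdata --fastqs=/path/to/fastq_dir --sample=SRR12623869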

could not parse flowcell ID

The message is fairly specific: it looks like the tool wants a flowcell ID to be present in the read names. You could try creating a fake one, but it may actually want the full Illumina header. Unfortunately, it appears the submitters did not deposit the original Illumina headers for this data.

Reply by GenoMax, 12 months ago
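Following the fake-flowcell idea, here is a minimal bash sketch that rewrites every read name into a CASAVA 1.8-style header with made-up instrument, run, flowcell (`FAKEFC`), lane and tile values. It assumes plain 4-line FASTQ records (as fastq-dump writes them) and that R1 and R2 list their reads in the same order, so records at the same position receive identical names. `fake_illumina_headers` is a hypothetical helper and has not been checked against txg; as noted above, the uploader may want more than a parseable flowcell ID.

```shell
#!/usr/bin/env bash
# Hypothetical workaround: replace each read name with a fake Illumina header
#   @FAKE:1:FAKEFC:1:1101:<n>:1 <read>:N:0:1
# where <n> is the record index, so paired R1/R2 records get matching names.
# Assumes strict 4-line FASTQ records (no wrapped sequence/quality lines).
fake_illumina_headers() {
  local in="$1" out="$2" read="$3"   # read: 1 for R1, 2 for R2
  gzip -dc "$in" |
    awk -v r="$read" '
      NR % 4 == 1 { n++; printf "@FAKE:1:FAKEFC:1:1101:%d:1 %d:N:0:1\n", n, r; next }
      NR % 4 == 3 { print "+"; next }   # drop the optional repeated name on the plus line
      { print }                          # sequence and quality lines are unchanged
    ' |
    gzip > "$out"
}
```

Usage, writing the rewritten files to a separate folder:

$ mkdir -p fixed
$ fake_illumina_headers SRR12623869_S1_L001_R1_001.fastq.gz fixed/SRR12623869_S1_L001_R1_001.fastq.gz 1
$ fake_illumina_headers SRR12623869_S1_L001_R2_001.fastq.gz fixed/SRR12623869_S1_L001_R2_001.fastq.gz 2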