FastQ format error when uploading to the 10x cloud analysis site
12 months ago
ctenosaga • 0

Hi, I am trying to use the 10x genomics cloud analysis tool (https://cloud.10xgenomics.com) to analyze a published single nucleus RNAseq dataset. I am getting an error when trying to upload the files that seems to be related to the formatting of the fastq files themselves, but am not sure how to solve it. Here's what I am doing:

Pull data for a single sample from GEO:

$ fastq-dump --split-files --gzip SRR12623869

This produces two files, SRR12623869_1.fastq.gz and SRR12623869_2.fastq.gz. I renamed them to follow the Illumina file naming convention: SRR12623869_S1_L001_R1_001.fastq.gz and SRR12623869_S1_L001_R2_001.fastq.gz.

Then, I use txg to upload them to 10x cloud analysis, for example:

$ ./txg fastqs upload --project-id <myProjectId> ~/pathToFolderContainingFastqFiles/

This produces the error message:

target "SRR12623869_S1_L001_R1_001.fastq.gz" is not a valid FASTQ file (could not parse flowcell ID: not a valid fastq)
target "SRR12623869_S1_L001_R2_001.fastq.gz" is not a valid FASTQ file (could not parse flowcell ID: not a valid fastq)

The top of the unzipped files is formatted like this:

$ head -n 20 SRR12623869_S1_L001_R1_001.fastq
@SRR12623869.1 1 length=26
GGTTGTAGTTGCCAATCCATTGCGTA
+SRR12623869.1 1 length=26
FFFFFFFFFFFFFFFFFFFFFFFFFF
@SRR12623869.2 2 length=26
GAGGGTATCACTCACCTCCTTCTTAG
+SRR12623869.2 2 length=26
FFFFFFFFFFFFFFFFFFF:FFFFFF
@SRR12623869.3 3 length=26
ATCCATTGTATTTCGGGATCACATGC
+SRR12623869.3 3 length=26
FFFFFFFFFFFFF:FFFFFFFFFFFF
@SRR12623869.4 4 length=26
CTCCATGTCGTCCTTGTTAGTTGTCA
+SRR12623869.4 4 length=26
FFFFFFFFFFFFFFFFFFFFFFFFFF
@SRR12623869.5 5 length=26
ATTACCTTCGAGTACTATAACTTCCC
+SRR12623869.5 5 length=26
FFFFFFFFFFFFFFFFFFFFFFFFFF

Anyone have a suggestion on how to address this? Thank you!

fastq cellranger • 1.3k views
12 months ago
ATpoint 82k

It seems you need the original headers of the FASTQ file. Cell Ranger and the other 10x tools are known to be very (overly) picky about this sort of thing. Try the approach from this thread: "ncbi sra toolkit, how to modify the fastq format? Need filter information"


Thank you for the suggestion. I am still getting the same error after downloading with the --origfmt option. Maybe this particular dataset was uploaded without flowcell IDs?

$ fastq-dump --split-files --origfmt --gzip SRR12623869
$ mv SRR12623869_1.fastq.gz SRR12623869_S1_L001_R1_001.fastq.gz
$ mv SRR12623869_2.fastq.gz SRR12623869_S1_L001_R2_001.fastq.gz
$ ./txg fastqs upload --project-id <myProjectId> ~/pathToFastqFiles/
There was a problem preparing your files for upload.

target "SRR12623869_S1_L001_R1_001.fastq.gz" is not a valid FASTQ file (could not parse flowcell ID: not a valid fastq)
target "SRR12623869_S1_L001_R2_001.fastq.gz" is not a valid FASTQ file (could not parse flowcell ID: not a valid fastq)
$ gunzip SRR12623869_S1_L001_R1_001.fastq.gz
$ head -n 20 SRR12623869_S1_L001_R1_001.fastq
@1
GGTTGTAGTTGCCAATCCATTGCGTA
+1
FFFFFFFFFFFFFFFFFFFFFFFFFF
@2
GAGGGTATCACTCACCTCCTTCTTAG
+2
FFFFFFFFFFFFFFFFFFF:FFFFFF
@3
ATCCATTGTATTTCGGGATCACATGC
+3
FFFFFFFFFFFFF:FFFFFFFFFFFF
@4
CTCCATGTCGTCCTTGTTAGTTGTCA
+4
FFFFFFFFFFFFFFFFFFFFFFFFFF
@5
ATTACCTTCGAGTACTATAACTTCCC
+5
FFFFFFFFFFFFFFFFFFFFFFFFFF
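
A quick way to check whether a header carries a flowcell ID at all is sketched below. It assumes the Casava 1.8-style Illumina layout, in which the part of the header before the first space has seven colon-separated fields and the third one is the flowcell ID. `first_flowcell_id` is a hypothetical helper name, and this is not necessarily the same check that txg performs.

```shell
# Print the flowcell ID from the first header line of a FASTQ stream.
# Assumes a Casava 1.8-style header:
#   @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
# Prints nothing and returns 1 if the header has fewer than 7 fields.
first_flowcell_id() {
  awk 'NR == 1 {
         # $1 is the part of the header before the first space
         n = split($1, field, ":")
         if (n >= 7) { print field[3]; found = 1 }
         exit
       }
       END { exit(found ? 0 : 1) }'
}
```

For example, `gunzip -c SRR12623869_S1_L001_R1_001.fastq.gz | head -n 4 | first_flowcell_id` prints nothing and returns a non-zero status for both header styles shown in this thread (`@SRR12623869.1 1 length=26` and `@1`), since neither contains colon-separated fields.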

I very recently encountered the exact same problem while attempting to upload to the 10x cloud, using a totally different dataset. I could not find a way around it either. However, when I ran cellranger count locally on the fastqs (without the 10x cloud), the program completed just fine and the output matrices and web summary look reasonable to me. I am guessing the problem is specific to the file validation step during the 10x cloud upload. Not a specific solution for you, but I hope this is somewhat helpful.


could not parse flowcell ID

The message seems to be specific: it looks like it wants a flowcell ID to be present. You could try creating a fake one, but then it may actually want the full header. Unfortunately, it appears that the submitters did not submit the original Illumina headers for this data.
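
A minimal sketch of the "fake flowcell ID" idea, in case someone wants to try it. It rewrites each record's header into a Casava 1.8-style Illumina header filled with placeholder values. Everything in the header (`FAKEINST`, `FAKEFLOWCELL`, the lane, tile and coordinates) is made up, `fake_illumina_headers` is a hypothetical helper name, and whether the txg validator accepts such headers is untested. The function assumes plain four-line FASTQ records.

```shell
# Replace every FASTQ header with a placeholder Casava 1.8-style header.
# Usage: fake_illumina_headers <read_number>   (1 for R1, 2 for R2)
# Reads FASTQ on stdin, writes FASTQ on stdout.
# All header values are invented; only the record counter changes per read,
# so R1 and R2 keep matching read names when processed in the same order.
fake_illumina_headers() {
  read_number="$1"
  awk -v r="$read_number" '
    NR % 4 == 1 {                      # header line of each 4-line record
      n = (NR + 3) / 4                 # 1-based record counter
      print "@FAKEINST:1:FAKEFLOWCELL:1:1101:1000:" n " " r ":N:0:1"
      next
    }
    NR % 4 == 3 { print "+"; next }    # drop the repeated name on the "+" line
    { print }                          # sequence and quality lines unchanged
  '
}
```

It could be applied as `gunzip -c SRR12623869_1.fastq.gz | fake_illumina_headers 1 | gzip > SRR12623869_S1_L001_R1_001.fastq.gz`, and likewise with `2` for the second file.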
