10X V3 library with only one fastq file
13 months ago
Julien ▴ 10

Hello,

I tried to download the library ERX5671923 from SRA using fastq-dump with the --split-files option. It is a library from the Fly Cell Atlas (experiment ERP129698 in SRA). I retrieved only one file per run (e.g. ERR6032593). As it is a paired-end 10X V3 library, I was expecting to retrieve 2 or 3 files (Read 1, Read 2 and potentially Index 1), but there is only a single file with 91 bp reads. Do you have any idea whether it is possible to use this file and, if so, how? I want to generate an abundance matrix using kallisto/bustools.

Best Wishes,

Julien

scRNA-seq V3 SRA kallisto 10X • 1.7k views

Hmm, from https://www.ncbi.nlm.nih.gov/sra/?term=ERX5671923 -- it seems that only one FASTQ file (the 91-bp biological sequence) is available. Unfortunately, that means the barcodes and UMI sequences are not available. Therefore, it's not possible to just use that one file with any tool. You'd need to find some way to obtain the other FASTQ file.


Thank you for your answer. It is exactly the answer I was afraid of. I have to check more thoroughly, but it looks like the same issue applies to all runs of all libraries from this Fly Cell Atlas experiment :'(

13 months ago
ATpoint 81k

It happens quite often that R1 is missing, don't ask me why. The good thing is that submitters often provide BAM files, which allow the FASTQ files to be reconstructed. That is the case here for all four accession numbers. See for example at the bottom of here.

You can conveniently get bam files with prefetch from the sra-toolkit:

mamba install -c bioconda sra-tools
prefetch --type bam --max-size 9999999999 -O ./ ERR6032593

Sometimes in Type (see below) it doesn't say bam but something like 10X Genomics bam file, for example here. Then you can use --type TenX with prefetch afaik.

enter image description here

Once you have the BAM files, and if they were produced by CellRanger, use the bamtofastq utility from 10x to convert them back to FASTQ: https://support.10xgenomics.com/docs/bamtofastq

If the BAM was made by an alternative pipeline, you will probably need custom parsing to recreate the R1 file. Technically, 10x scRNA-seq is single-end sequencing: only R2 is used for the actual alignment, while R1 carries the CB and UMI, which are processed separately. You probably need to access the tags that store the CB and UMI sequences and recreate R1 from them, putting these sequences at the read positions where either CellRanger or your processing pipeline expects them. For example, 10x Chromium 3' v3 has the CB at R1 positions 1-16 and the UMI at positions 17-28, so that should be relatively easy to parse from the BAM tags (I guess, untested, never done manually myself). But then again there are probably corner cases, so be careful.
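The tag-based reconstruction described above could be sketched roughly as follows. This is only an illustration under several assumptions, not a tested replacement for bamtofastq: it assumes the BAM has been converted to SAM text (e.g. with samtools view), that the raw barcode and UMI sit in the tags CR/UR with qualities in CY/UY (the Cell Ranger convention; another pipeline may use different tag names or none), and that a placeholder quality of "F" is acceptable when the quality tags are absent. The function names are hypothetical. R2 is regenerated alongside R1 so that both files stay in the same read order.

```python
# Assumed tag names (Cell Ranger convention); adjust for other pipelines.
CB_TAG, CB_QUAL_TAG = "CR", "CY"
UMI_TAG, UMI_QUAL_TAG = "UR", "UY"
CB_LEN, UMI_LEN = 16, 12  # 10x Chromium 3' v3: CB at 1-16, UMI at 17-28

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_tags(fields):
    """Return {tag: value} for the optional SAM fields (column 12 onward)."""
    tags = {}
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def sam_line_to_fastq_pair(line):
    """Turn one SAM alignment line into (R1, R2) FASTQ records, or None."""
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & 0x900:
        # Secondary or supplementary alignment: would emit the read twice.
        return None
    tags = parse_tags(fields)
    cb, umi = tags.get(CB_TAG), tags.get(UMI_TAG)
    if cb is None or umi is None or len(cb) != CB_LEN or len(umi) != UMI_LEN:
        # Without a full-length barcode and UMI, R1 cannot be rebuilt.
        return None
    # Placeholder qualities are an assumption, used only if tags are missing.
    cb_qual = tags.get(CB_QUAL_TAG, "F" * CB_LEN)
    umi_qual = tags.get(UMI_QUAL_TAG, "F" * UMI_LEN)
    if flag & 0x10:
        # Reverse-strand alignments store the reverse complement; undo it.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    r1 = f"@{name}\n{cb}{umi}\n+\n{cb_qual}{umi_qual}\n"
    r2 = f"@{name}\n{seq}\n+\n{qual}\n"
    return r1, r2


def write_fastq_pairs(sam_lines, r1_handle, r2_handle):
    """Write R1/R2 records for every usable line; return how many were written."""
    written = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        pair = sam_line_to_fastq_pair(line)
        if pair is None:
            continue
        r1_handle.write(pair[0])
        r2_handle.write(pair[1])
        written += 1
    return written
```

Reads that lack a barcode or UMI tag are dropped here, so the output will not necessarily contain every read that is in the BAM. It is worth comparing a few reconstructed R2 records against the FASTQ downloaded from SRA before trusting the result.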

See also: https://bioinformatics.stackexchange.com/a/15523


I tried this approach. I had no problem downloading the BAM file or installing bamtofastq v1.4.1. However, when running bamtofastq I get a warning:

WARNING: no @RG (read group) headers found in BAM file. Splitting data by the GEM well marked in the corrected barcode tag. Reads without a corrected barcode will not appear in output FASTQs

The BAM format is also not recognized, as I get this error message:

Unrecognized 10x BAM file. For BAM files produced by older pipelines, use one of the following flags:
--gemcode   BAM files created with GemCode data using Longranger 1.0 - 1.3
--lr20      BAM files created with Longranger 2.0 using Chromium Genome data
--cr11      BAM files created with Cell Ranger 1.0-1.1 using Single Cell 3' v1 data

As I do not have any information about the origin of this BAM file, I tried each of the 3 proposed options.

The only one that was accepted was --cr11, but it did not create any file. It looks like this BAM file comes from an alternative pipeline.
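One way to learn more about where such a BAM file came from is to look at its header and at the tags its reads carry. The sketch below is a rough illustration with a hypothetical function name: it assumes the BAM has been dumped to SAM text including the header (e.g. with samtools view -h), collects the @PG header records (which name the programs that wrote the file, when the submitter kept them), and counts which optional tags appear in the first reads. Whether this particular file has @PG lines or barcode/UMI tags at all is not known here.

```python
from collections import Counter


def summarize_sam(lines, max_reads=1000):
    """Return (@PG records, tag counts) from SAM text lines.

    Only the first `max_reads` alignment lines are inspected.
    """
    programs = []
    tag_counts = Counter()
    reads_seen = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@PG"):
                # @PG fields look like ID:..., PN:..., VN:..., CL:...
                record = {}
                for field in line.split("\t")[1:]:
                    if ":" in field:
                        key, value = field.split(":", 1)
                        record[key] = value
                programs.append(record)
            continue
        # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
        for field in line.split("\t")[11:]:
            tag_counts[field[:2]] += 1
        reads_seen += 1
        if reads_seen >= max_reads:
            break
    return programs, tag_counts
```

If tags holding a 16 bp barcode and a 12 bp UMI show up in the counts, rebuilding R1 from them is at least plausible; if no such tags are present, the barcodes are probably not recoverable from this file.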
