How to fasterq-dump 10x genomics snATACseq fastq from SRA
2.6 years ago

I am trying to retrieve fastq files from a 10x genomics snATACseq dataset on SRA. Each run should have 4 fastq files associated with it:

I1: Dual index i7 read (optional)
R1: Read 1
R2: Dual index i5 read
R3: Read 2

These four fastq files were definitely uploaded to SRA (check the data access tab for one of the runs: Link to SRA). BAM files were not uploaded, so I can't use bam2fastq.

I've downloaded the .sra files, but when I try to dump them using fasterq-dump (I've tried every option), it only outputs two files, R1 and R2, which do not necessarily correspond to the R1 and R2 listed above.

How do I get all four fastq files from SRA?

fasterq-dump snATACseq scATACseq sra 10x • 1.6k views
2.6 years ago
GenoMax 141k

With

$ fastq-dump --split-files -F  SRR11858618

I get the 4 expected files. You should be able to figure out how to rename them to R1, R2, R3, and I1 once the extraction is complete.

$ more SRR11858618_*
::::::::::::::
SRR11858618_1.fastq
::::::::::::::
@A00325:101:HF727DRXX:1:1101:1208:1016
ACGGGACT
+A00325:101:HF727DRXX:1:1101:1208:1016
FFFFFFFF
@A00325:101:HF727DRXX:1:1101:2003:1016
ACGGGACT
+A00325:101:HF727DRXX:1:1101:2003:1016
FFFFFFFF
::::::::::::::
SRR11858618_2.fastq
::::::::::::::
@A00325:101:HF727DRXX:1:1101:1208:1016
TNTAAGATCAATGTTCTAAAAAAGTGACAAAACCTCAGTGTTTCTTTCCT
+A00325:101:HF727DRXX:1:1101:1208:1016
F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF
@A00325:101:HF727DRXX:1:1101:2003:1016
ANTAGGAACAGTCCTTCCAACACAGATTAGGTTCATTGGGAACACATGCA
+A00325:101:HF727DRXX:1:1101:2003:1016
F#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
::::::::::::::
SRR11858618_3.fastq
::::::::::::::
@A00325:101:HF727DRXX:1:1101:1208:1016
NCTATTGTCTTAGTGG
+A00325:101:HF727DRXX:1:1101:1208:1016
#FFFFFFFF:FFFFFF
@A00325:101:HF727DRXX:1:1101:2003:1016
NAGCGCTGTTGCAGAG
+A00325:101:HF727DRXX:1:1101:2003:1016
#FFFFFFFFFFFFFFF
::::::::::::::
SRR11858618_4.fastq
::::::::::::::
@A00325:101:HF727DRXX:1:1101:1208:1016
GTGCAGGTCAGGCTCCGGTAAGGAATGCGTGAAACTCAGTTTCTAAAGG
+A00325:101:HF727DRXX:1:1101:1208:1016
FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF:FFFFF
@A00325:101:HF727DRXX:1:1101:2003:1016
GTCCGTCTGTCCCAGAAGTCCCAGCTCCTTTCCTGCTCTGGCACCTCCT
+A00325:101:HF727DRXX:1:1101:2003:1016
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
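
One way to work out the renaming is to look at the read lengths in the listing above. The 8 bp file looks like the i7 index (I1), the 16 bp file looks like the cell barcode (R2), and the two roughly 50 bp files look like the genomic reads (R1 and R3). This mapping is an inference from this single run, not something SRA guarantees, so check the lengths for every run before trusting it. The function names `first_read_len` and `label_reads` are made up for this sketch.

```shell
#!/bin/sh
# Print the length of the first sequence in a FASTQ file (line 2 of the file).
first_read_len() {
    awk 'NR == 2 { print length($0); exit }' "$1"
}

# Guess the 10x read type of each <run>_N.fastq file from its read length.
# ASSUMPTION (inferred from SRR11858618 only): 8 bp = I1, 16 bp = R2 (barcode),
# first longer read = R1, second longer read = R3.
# The body runs in a subshell so its variables do not leak into the caller.
label_reads() (
    run="$1"
    genomic=0
    for f in "${run}"_[0-9]*.fastq; do
        [ -e "$f" ] || continue
        len=$(first_read_len "$f")
        case "$len" in
            8)  echo "$f I1" ;;
            16) echo "$f R2" ;;
            *)  genomic=$((genomic + 1))
                if [ "$genomic" -eq 1 ]; then
                    echo "$f R1"
                else
                    echo "$f R3"
                fi ;;
        esac
    done
)

# Usage: label_reads SRR11858618
```

The function only prints a suggested label per file; it does not rename anything, so the result can be reviewed first.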
2.6 years ago

If the files have been demultiplexed, you should only need R1 and R2. However, you will need to rename the fastqs so that they look like the names bcl2fastq produces. 10X used to have a link to the naming definition, but it looks like it's broken. Just make sure that your file names end like this:

pbmc_1k_v3_S1_L001_R1_001.fastq.gz

pbmc_1k_v3_S1_L001_R2_001.fastq.gz

Change the bold part (the sample name at the start, pbmc_1k_v3 in this example) to what you want (no funny characters, naturally), and leave the rest exactly like that.
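
A minimal sketch of that renaming, assuming the `_S1_L001_..._001.fastq.gz` pattern shown in the two example names is what the downstream tool expects. The helper names `tenx_name` and `rename_for_10x` and the sample prefix `mysample` are hypothetical.

```shell
#!/bin/sh
# Build a bcl2fastq-style file name: <sample>_S1_L001_<read>_001.fastq.gz
# "S1" and "L001" are copied from the example names above; the sample
# prefix is whatever you choose.
tenx_name() {
    printf '%s_S1_L001_%s_001.fastq.gz\n' "$1" "$2"
}

# Compress one FASTQ into the new name, leaving the original file in place.
# Arguments: source file, sample prefix, read type (e.g. R1, R2, R3, I1).
rename_for_10x() (
    src="$1"
    sample="$2"
    read_type="$3"
    gzip -c "$src" > "$(tenx_name "$sample" "$read_type")"
)

# Usage (check first which read type each dumped file actually holds):
# rename_for_10x SRR11858618_2.fastq mysample R1
```

Writing a compressed copy instead of moving the file keeps the original dump intact in case the read types turn out to be assigned wrongly.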
