SRAtoolkit --split-files output
8 weeks ago
tony_88888 • 0

Hi,

I've been using sratoolkit for a while now but still get confused by the output at times. For example, I am trying to download the accession SRR12386358. This is paired-end data, and judging by the 'data access' tab it looks like the data were deposited correctly, with FASTQs for read 1 and read 2.

link to accession: https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&page_size=10&acc=SRR12386358&display=data-access

When I use fastq-dump --split-files --readids I get 3 output files. Please see the headers in the attached picture. Could someone please explain what each of these files is? Is there a way to get these files in a format that I can use with cellranger? I have tried --split-3, but the output is a single file, and fasterq-dump gives similar output.

[Image: FASTQ headers of the three output files]

sratoolkit sra fastq
8 weeks ago
GenoMax 148k

The _1 file is the Illumina sample index (barcode), which is not used by cellranger.
The _2 file is the cell barcode + UMI.
The _3 file is the RNA read.

If you had used -F with regular fastq-dump, the @SRR part would have been removed and you would have ended up with normal Illumina FASTQ headers.
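For files that are already downloaded, the effect of -F on the headers can be approximated after the fact. The sketch below is only an illustration, not part of sratoolkit: it assumes the headers look like "@<accession>.<spot>[.<read>] <original name> length=<n>" (the picture from the question is not available, so that layout is an assumption), and the function names are made up for this example. Re-downloading with -F remains the simpler route.

```python
def strip_sra_prefix(header: str) -> str:
    """Drop the leading SRA token and a trailing length=<n> field from an @ header.

    Assumed input layout: "@SRRxxxx.<spot>[.<read>] <original name> length=<n>".
    Headers that do not fit this layout are returned unchanged.
    """
    header = header.rstrip("\n")
    fields = header.split()
    if len(fields) < 2 or not fields[0].startswith("@"):
        return header
    if fields[-1].startswith("length="):
        fields = fields[:-1]
    if len(fields) < 2:  # nothing left besides the SRA token
        return header
    return "@" + " ".join(fields[1:])


def restore_headers(lines):
    """Yield FASTQ lines with the SRA prefix removed from each record header.

    Lines are classified by position (4 lines per record), because quality
    strings may themselves start with '@' or '+'.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            yield strip_sra_prefix(line)
        elif i % 4 == 2:
            yield "+"  # the repeated name on the separator line is optional
        else:
            yield line
```

Applied to an open file, `restore_headers(handle)` can be written back out line by line; the same function has to be run on every mate file so that the headers stay identical across files.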


Thank you very much for the response.

Do you have any idea why I got that output and not the expected read 1 and read 2? Can these files still be used as they are in cellranger or do I need to try and download them again with sratoolkit?


You can use files 2 and 3 with cellranger.
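cellranger picks up its input by file name, so the downloaded files usually need renaming first. The sketch below builds names in the bcl2fastq style (sample_S1_L001_R1_001.fastq). The mapping is an assumption based on the standard 10x 3' layout, where R1 carries the cell barcode + UMI and R2 the RNA read; the S1 and lane values are placeholders, and the helper name is hypothetical.

```python
from pathlib import Path

# --split-files suffix -> cellranger read label.
# Assumption: 10x 3' layout, i.e. R1 = cell barcode + UMI, R2 = RNA read.
READ_LABELS = {"_1": "I1", "_2": "R1", "_3": "R2"}


def cellranger_name(path: str, sample: str, lane: int = 1) -> str:
    """Return a bcl2fastq-style file name for a fastq-dump --split-files output."""
    p = Path(path)
    stem = p.name.split(".")[0]      # e.g. "SRR12386358_2"
    extension = "".join(p.suffixes)  # ".fastq" or ".fastq.gz"
    label = READ_LABELS.get(stem[-2:])
    if label is None:
        raise ValueError(f"unexpected split-files suffix in {p.name!r}")
    # "S1" is a placeholder sample number, not something read from the data.
    return f"{sample}_S1_L{lane:03d}_{label}_001{extension}"
```

For example, `cellranger_name("SRR12386358_2.fastq", "SRR12386358")` gives `SRR12386358_S1_L001_R1_001.fastq`. The _1 (sample index) file can be left out entirely, since cellranger does not need it.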


Thanks GenoMax, I imagine this problem may occur again with future accessions.

I have some understanding of reading FASTQ headers, but I've never come across telling the difference between the Illumina barcode for the sample and the cell barcode + UMI. Could you please explain how to tell the two apart?

When entering these into cellranger, should I put _3 (RNA read) as read 1 and _2 (cell barcode + UMI) as read 2, or does it not really matter?

Thank you very much


Read 1 (the _1 file) is the standard Illumina sample index. It will be short, like the 8 bases here. Illumina indexes are not used by cellranger; they are only used for demultiplexing. Read 2 (the _2 file) is the cell barcode + UMI; depending on the type of kit, it is 26 or 28 bp long. It could also be the same length as the RNA read in some submissions. cellranger will use the number of bases it requires (26 or 28).
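The length rule described above can be turned into a quick check on the first records of each file. This is a rough heuristic sketch, not a definitive classifier: the cutoff of 10 bases for an index read is an assumption, and a barcode read that was submitted at full length will look like an RNA read. Function names are made up for the example.

```python
from collections import Counter


def modal_read_length(fastq_lines, max_records: int = 1000) -> int:
    """Return the most common sequence length among the first max_records records."""
    lengths = Counter()
    for i, line in enumerate(fastq_lines):
        if i // 4 >= max_records:
            break
        if i % 4 == 1:  # second line of each 4-line record is the sequence
            lengths[len(line.strip())] += 1
    if not lengths:
        raise ValueError("no FASTQ records found")
    return lengths.most_common(1)[0][0]


def guess_read_role(length: int) -> str:
    """Guess what a split file contains from its typical read length."""
    if length <= 10:  # assumed cutoff; the index read in this thread is 8 bases
        return "sample index"
    if length in (26, 28):
        return "cell barcode + UMI"
    return "RNA read (or a barcode read submitted at full length)"
```

Running `guess_read_role(modal_read_length(handle))` on each of the three files should give three different answers for an accession like the one discussed here; if two files come back as RNA reads, the sequences themselves need a closer look.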


When I try to run this with cellranger, I get the following error message:

[error] pipestance failed: Error log at: SRR12386358/SC_RNA_COUNTER_CS/SC_MULTI_CORE/MULTI_CHEMISTRY_DETECTOR/DETECT_COUNT_CHEMISTRY/fork0/chnk0-u86d2a17bf/_errors

Log message: FASTQ header mismatch detected at line 4 of input files "SRR12386358_S1_L001_R1_001.fastq" and "SRR12386358_S1_L001_R2_001.fastq", line: 4

This is using _2 and _3

Any idea why that may be occurring, GenoMax?

Thanks


FASTQ header mismatch detected at line 4 of input files

Did you do something to the files, e.g. scan/trim them independently? If not, it is possible that your files are out of sync and/or corrupt. You can try repair.sh from the BBMap suite to bring them back in sync, or redownload.
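Before repairing or redownloading, it can help to see where the headers first diverge. The sketch below compares the read IDs (the first whitespace-delimited token of each header) of two files record by record. It is a diagnostic under stated assumptions, not a reimplementation of cellranger's own check, whose exact comparison rules are not given in this thread. One thing it would catch: --readids appends the read number to the spot ID, so mates downloaded that way carry different IDs even when the files are perfectly in sync.

```python
from itertools import zip_longest


def read_ids(fastq_lines):
    """Yield the first token of every record header (every 4th line, from line 1)."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:
            fields = line.split()
            yield fields[0] if fields else ""


def first_header_mismatch(lines_a, lines_b):
    """Return (record_number, id_a, id_b) for the first differing pair, else None.

    If one file has fewer records, the missing ID is reported as None.
    """
    pairs = zip_longest(read_ids(lines_a), read_ids(lines_b))
    for n, (id_a, id_b) in enumerate(pairs, start=1):
        if id_a != id_b:
            return n, id_a, id_b
    return None
```

A mismatch at record 1 with IDs that differ only in a trailing read number points at the header format rather than at corrupt or unsynchronised files.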

Edit: Tested a small sample of reads from your accession and things worked without issues with cellranger.


I did not do anything to the files. I corrected this problem by using -F with fastq-dump. When I took a look at line 4 of the input files, the SRR IDs were slightly different. All good now, but thanks for getting back to me.
