Question: Paired-end SRA experiment, two samples come out as single-end
xarielle wrote (16 months ago):

Hi, I am downloading raw RNA-seq data from SRA using fastq-dump from the SRA Toolkit. I am using the --split-3 option, so that I get a single FASTQ file per sample when a run is single-end and two FASTQ files when it is paired-end. This works fine, except that for a few runs from paired-end experiments I get a single FASTQ file instead of two. Examples are samples GFP rep2 and GFP rep3 (SRR2969254 and SRR2969255) from the GSE75440 dataset. What could explain this behaviour? Thank you.
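The --split-3 behaviour described above can be thought of, roughly, as a per-spot decision. The sketch below is a simplified illustration and not SRA Toolkit code: the function `split3_route` and the representation of a spot as a list of reads are hypothetical, and technical reads are assumed to have been filtered out already.

```python
from collections import Counter


def split3_route(accession, spots):
    """Simplified (hypothetical) model of fastq-dump --split-3 file routing.

    `spots` is a list of spots; each spot is a list of its biological reads
    (technical reads assumed already removed). Returns a Counter mapping
    output file name -> number of reads written to it.
    """
    files = Counter()
    for reads in spots:
        if len(reads) == 2:
            # Both mates present: mate 1 -> _1 file, mate 2 -> _2 file.
            files[f"{accession}_1.fastq"] += 1
            files[f"{accession}_2.fastq"] += 1
        elif len(reads) == 1:
            # Only one biological read: written to the unsuffixed file.
            files[f"{accession}.fastq"] += 1
        # Spots with no biological reads produce no output in this model.
    return files


# If every spot keeps only one biological read, only one file is produced.
print(sorted(split3_route("SRR2969254", [["ACGT"], ["TTGA"]])))  # ['SRR2969254.fastq']
```

Under this model, a run labelled PAIRED still ends up as a single file if no spot contributes two biological reads.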

EDIT: One thing I should add is that when running fastq-dump, I get the following error:

Rejected 52180955 READS because of filtering out non-biological READS
Read 52180955 spots for SRR2969254.sra
Written 52180955 spots for SRR2969254.sra

Is the single fastq file produced usable?
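One possible reading of the log above (an interpretation, not something the log states): the number of rejected non-biological reads equals the number of spots, which is what you would expect if every spot carried exactly one read flagged as non-biological. The hypothetical helper `rejected_per_spot` below parses those summary lines and computes that ratio.

```python
import re


def rejected_per_spot(log_text):
    """Parse fastq-dump's summary lines and return
    (spots_read, reads_rejected, rejected_reads_per_spot)."""
    rejected = re.search(r"Rejected (\d+) READS", log_text)
    spots = re.search(r"Read (\d+) spots", log_text)
    if not (rejected and spots):
        raise ValueError("expected 'Rejected N READS' and 'Read N spots' lines")
    n_rejected = int(rejected.group(1))
    n_spots = int(spots.group(1))
    if n_spots == 0:
        raise ValueError("no spots were read")
    return n_spots, n_rejected, n_rejected / n_spots


# The three summary lines quoted in the question.
log = """Rejected 52180955 READS because of filtering out non-biological READS
Read 52180955 spots for SRR2969254.sra
Written 52180955 spots for SRR2969254.sra"""

print(rejected_per_spot(log))  # (52180955, 52180955, 1.0)
```

A ratio of exactly 1.0 is consistent with (though it does not prove) each spot losing one read to the filter, leaving at most one biological read per spot, which --split-3 writes to a single unsuffixed file.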

Tags: rna-seq, fastq-dump, sra

When possible, get FASTQ files directly from EBI-ENA.

Even though SRR2969254 and SRR2969255 are marked as PE (paired-end), ENA also appears to have only single reads for them, so there could be something wrong with these two submissions.
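To repeat this ENA check programmatically, ENA's Portal API has a `filereport` endpoint that lists the FASTQ files for a run. The sketch below assumes that endpoint and the field names `run_accession`, `library_layout` and `fastq_ftp`; `filereport_url`, `check_layout` and `fetch_report` are hypothetical helper names, and only `fetch_report` makes a network request.

```python
import csv
import io
import urllib.parse
import urllib.request

# Assumed ENA Portal API endpoint for per-run file listings.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession):
    """Build a filereport query for one run (endpoint and fields are assumptions)."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,library_layout,fastq_ftp",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urllib.parse.urlencode(params)


def check_layout(tsv_text):
    """Compare each run's declared library_layout with the FASTQ files listed."""
    report = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # fastq_ftp holds semicolon-separated file locations (may be empty).
        files = [f for f in (row.get("fastq_ftp") or "").split(";") if f]
        has_mates = any(f.endswith("_1.fastq.gz") for f in files) and any(
            f.endswith("_2.fastq.gz") for f in files
        )
        report[row["run_accession"]] = {
            "declared": row["library_layout"],
            "n_files": len(files),
            # Consistent when "PAIRED" goes together with a _1/_2 pair.
            "consistent": (row["library_layout"] == "PAIRED") == has_mates,
        }
    return report


def fetch_report(accession):
    """Query ENA (requires internet access) and check the returned listing."""
    with urllib.request.urlopen(filereport_url(accession)) as resp:
        return check_layout(resp.read().decode("utf-8"))
```

For example, `fetch_report("SRR2969254")` would return a dict keyed by run accession; a `consistent` value of False flags a run whose declared layout and listed files disagree.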

(reply by genomax, 16 months ago)

I am getting the same error. Did you find a solution?

(reply by rahulgenexpress, 6 months ago)
Istvan Albert (University Park, USA) wrote (16 months ago):

I would use the --split-files option instead; see the fastq-dump help page:

fastq-dump -h

Among its many options:

...
--split-files       Dump each read into separate file.Files 
                    will receive suffix corresponding to read 
                    number 
...

If it comes out as single-end, the run is mislabeled. The two options may give the same result here, though I don't like the --split-3 option, as its name seems like a misnomer of sorts: most data do not come in three files.
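To apply this check to your own downloads, a small hypothetical helper can compare the declared layout with what fastq-dump actually wrote to disk. The names `produced_layout` and `looks_mislabeled` are illustrative, and the file-name patterns assume uncompressed output with the default naming discussed in this thread (ACC_1.fastq / ACC_2.fastq for mates, ACC.fastq for single reads under --split-3).

```python
import tempfile
from pathlib import Path


def produced_layout(outdir, accession):
    """Infer a run's layout from the FASTQ files written to `outdir`."""
    d = Path(outdir)
    mate1 = (d / f"{accession}_1.fastq").exists()
    mate2 = (d / f"{accession}_2.fastq").exists()
    single = (d / f"{accession}.fastq").exists()
    if mate1 and mate2:
        return "PAIRED"  # an extra ACC.fastq here would only hold orphan reads
    if mate1 or single:
        return "SINGLE"  # --split-files on single reads writes only ACC_1.fastq
    return "MISSING"


def looks_mislabeled(declared_layout, outdir, accession):
    """True when metadata says PAIRED but only one read per spot came out."""
    return declared_layout == "PAIRED" and produced_layout(outdir, accession) == "SINGLE"


if __name__ == "__main__":
    # Demo with an empty placeholder file in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "RUN_X.fastq").touch()
        print(produced_layout(tmp, "RUN_X"), looks_mislabeled("PAIRED", tmp, "RUN_X"))
```

The declared layout would come from the run's SRA or ENA metadata; this helper only looks at file names, not file contents.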


I have seen that page, and I am aware of the --split-files option. --split-3 is what I actually want to use, since it automatically splits the reads into two FASTQ files, suffixed "_1" and "_2", if the run is paired-end, and outputs a single FASTQ file if it is single-end. It usually works perfectly for me; the exceptions are the specific samples I mentioned, and I want to know what the problem is with them. From what you are saying, these specific samples would be mislabeled and in fact be paired-end, inside an experiment where all the other samples are paired-end?

(reply by xarielle, 16 months ago)

Indeed it seems I have slightly misread your original post.

I think that if it comes out as single-end when it is supposed to be paired-end, it might be a case of incorrect data entry. I would also check these datasets in the SRA browser.

(reply by Istvan Albert, 16 months ago)

Yes, I suppose it could be incorrect data entry, although I am not sure. Here is the link to one of these samples in the SRA browser: https://www.ncbi.nlm.nih.gov/sra/?term=SRR2969254. I am not sure how to tell what the issue is from this page.

(reply by xarielle, 16 months ago)

Also, please see the edit to my original post: could this error explain why only one file is produced?

(reply by xarielle, 16 months ago)

The SRA browser for SRR2969254 shows single-end reads (in the read navigation), but the run is, as you stated, labeled as PAIRED.

It might be a data entry error.

(reply by Istvan Albert, 16 months ago)

I think another difference between --split-files and --split-3 is that, for single-end data, --split-files names the file accession_1.fastq, whereas --split-3 names it accession.fastq, which I find more appropriate, although that is not very relevant here.
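The naming difference described above can be summarised in a small model. This is a hypothetical sketch of the behaviour as described in this thread, not fastq-dump's own logic; `output_names` is an illustrative name, and it only covers runs in which every spot has the same number (1 or 2) of biological reads.

```python
def output_names(accession, option, reads_per_spot):
    """Hypothetical model of fastq-dump output names for a run in which every
    spot has `reads_per_spot` biological reads (1 or 2)."""
    if reads_per_spot not in (1, 2):
        raise ValueError("reads_per_spot must be 1 or 2")
    if option == "--split-files":
        # One file per read number, always with a numeric suffix.
        return [f"{accession}_{i}.fastq" for i in range(1, reads_per_spot + 1)]
    if option == "--split-3":
        # Mates get _1/_2; lone reads go to the unsuffixed file.
        if reads_per_spot == 2:
            return [f"{accession}_1.fastq", f"{accession}_2.fastq"]
        return [f"{accession}.fastq"]
    raise ValueError(f"option not modelled here: {option}")


for opt in ("--split-files", "--split-3"):
    print(opt, output_names("SRR2969254", opt, 1))
```

For single reads this prints `['SRR2969254_1.fastq']` for --split-files and `['SRR2969254.fastq']` for --split-3; for paired runs both options give the same `_1`/`_2` pair.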

(reply by xarielle, 16 months ago)
Powered by Biostar version 2.3.0