Question: Is SRA run SRR9713131 single-end or paired-end?
tujuchuanli60 wrote:

Hi,

I am planning to re-analyze the scRNA-seq data on SRA (SRR9713131, https://www.ncbi.nlm.nih.gov/sra/?term=SRR9713131). According to the description, this run is paired-end and the first read contains the UMI sequence. However, the "Reads" tab of the "Run Browser" page shows single-end reads. I downloaded the run (.sra file), converted it to FASTQ with the SRA Toolkit, and confirmed that the output is indeed single-end.

My questions are below:

  1. Is this really single-end rather than paired-end data, or did I make a mistake?

  2. How and why did this run pass the quality check if it is single-end, given that single-end scRNA-seq data is of no use?

scrna-seq • 207 views
ADD COMMENTlink modified 8 weeks ago • written 8 weeks ago by tujuchuanli60
ale_abd30 wrote:

Hello,

SRA stores paired-end data in a single .sra file, so when you convert a downloaded .sra file you have to specify that you want the reads split:

fastq-dump --split-files SRR9713131

If you look at this link ("Data access" tab), you can also see that the uploaded raw data is paired-end, not single-end.

Hope this helps!

ADD COMMENTlink written 8 weeks ago by ale_abd30

Hi, ale_abd,

Thank you for your reply.

Actually, I did convert the SRA file to FASTQ using fastq-dump with --split-files, but the output was still single-end. Also, if you click the "Reads" tab on https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR9713131, you will see the first 10 reads of this run. If you compare them with another run from the same dataset, such as https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR9713123#, you will see the difference.

However, it does seem possible to download the raw FASTQ files from the "Data access" tab, as you mentioned, and I am trying that now. I suspect the difference is that this run has three files rather than two, unlike the other runs in the same dataset.

Do you agree with me after trying them?

Thanks

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by tujuchuanli60

Hi,

I agree with you. The sample you mentioned has three files, whereas other runs from the same project have only two. Sometimes this third file holds either unpaired data or the index reads (some old related posts: here and here). If you look at the third file, all the sequences are 8 bases long and they match the index, so you can just discard that file...
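
To see which file holds what, one option is to tabulate the read lengths in each file. This is only a minimal sketch: it assumes uncompressed FASTQ with exactly four lines per record, and the function name fastq_read_lengths and the file name in the usage line are made up for illustration.

```shell
# Tabulate read lengths in an uncompressed FASTQ file (4 lines per record).
# Output: one "length<TAB>count" line per distinct length, sorted by length.
fastq_read_lengths() {
  awk 'NR % 4 == 2 { count[length($0)]++ }
       END { for (len in count) printf "%d\t%d\n", len, count[len] }' "$1" |
    sort -n
}

# Usage (hypothetical file name):
#   fastq_read_lengths third_file.fastq
```

A file that contains only index reads should report a single length of 8, as described above, while the barcode/UMI and cDNA reads should show up in the other files with longer lengths.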

Cheers.

ADD REPLYlink written 8 weeks ago by ale_abd30

This is 10x data, so you would need to download all three FASTQ files from the "Data access" tab (index reads end up in a separate file when the 10x cellranger software does the demultiplexing). Use the original-format data links rather than the SRA ones.

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by genomax80k

Do you mean that I need all three files to run "cellranger count"?

Other runs in the same dataset, such as "https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR9713123#", contain only two files, and cellranger count runs fine with just those two.

Why would I need all three files for this run?

The third file contains only the index, which is already included in the header of each read in the other two files. I think it is useless. Is that right?
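
The claim that the index is already in the read headers can be checked directly. The sketch below assumes the headers end in ":INDEX" (Illumina-style, for example "... 1:N:0:ACGTACGT"), that both files are uncompressed and list records in the same order, and the function name index_matches_header is made up. It keeps every header's index in memory, so it is best run on the first few thousand records (for example, after head -n 40000 on each file).

```shell
# Compare the index in each read header with the matching index read.
# $1: read FASTQ whose headers end in ":INDEX" (assumption)
# $2: index FASTQ with records in the same order
# Prints "matching<TAB>total".
index_matches_header() {
  awk 'NR == FNR && FNR % 4 == 1 {        # header lines of the read file
         n = split($0, parts, ":")
         idx[(FNR + 3) / 4] = parts[n]    # last ":"-separated field
         next
       }
       NR != FNR && FNR % 4 == 2 {        # sequence lines of the index file
         total++
         if (idx[(FNR + 2) / 4] == $0) same++
       }
       END { printf "%d\t%d\n", same, total }' "$1" "$2"
}
```

If the two numbers are equal, the third file adds nothing beyond what the headers already carry, at least for the records checked.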

ADD REPLYlink written 7 weeks ago by tujuchuanli60