SRA-tools retrieved data of a different size than shown on the NCBI website
7 weeks ago
Angelina_G • 0

Hello, I was trying to obtain scRNA-seq raw data from SRR11181957: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR11181957&display=data-access

The FASTQ files should be 101.2G (R1) and 271.2G (R2).

I tried to get the FASTQ files directly from EBI, but there is only one fastq.gz file of ~400G, so I'm not sure what it is or how to use it.

I then tried SRA tools.

First, I ran wget https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR11181957/SRR11181957 and got a file ‘SRR11181957’ of size 93.0G (however, the NCBI website indicates it should be 86.6G).

Then I added the .sra extension to the file name and ran the following (I originally tried to fetch the run directly with fasterq-dump SRR11181957 but got errors, so I resorted to converting it locally):

export PATH="$PATH:/sratoolkit.3.0.0-ubuntu64/bin"
fasterq-dump --split-3 /SRR11181957.sra


Now I have SRR11181957_1.fastq at 173.9G and SRR11181957_2.fastq at 358.8G.

I'm not sure why my files are so much larger, or whether they are the correct size.

fasterq-dump fastq sra-tools sratoolkit sra • 211 views
7 weeks ago
GenoMax 123k

Never use file sizes as a metric for anything other than a qualitative check (e.g. a file is non-zero bytes, so it must contain something).
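A more meaningful check than file size is the number of records: the _1 and _2 files from a paired run should contain the same number of reads, and that number can be compared with the spot count NCBI reports for the run. Below is a minimal Python sketch, assuming the standard four-lines-per-record FASTQ layout; the helper names `open_fastq` and `count_reads` are made up for illustration and are not part of SRA-tools.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def count_reads(path):
    """Count FASTQ records, assuming exactly 4 lines per record."""
    with open_fastq(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A line count that is not a multiple of 4 usually means truncation.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name):
    #   python count_reads.py SRR11181957_1.fastq SRR11181957_2.fastq
    for name in sys.argv[1:]:
        print(name, count_reads(name))
```

On files of several hundred gigabytes this will take a while, since every line has to be read once. If the two counts differ, the conversion was probably interrupted.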

That said, the files you have should be correct. The _1 file generally contains the cell barcodes + UMI (26 bp), whereas the _2 file contains the actual RNA reads (100 bp). Your files are uncompressed, so they appear larger. Please verify that the two files you have contain reads of the lengths mentioned above.
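One way to do that length check without reading the whole file is to tally sequence lengths over the first few thousand records. The following is a rough Python sketch, assuming uncompressed four-line FASTQ as written by fasterq-dump; the function name `read_length_counts` and the default of 10,000 records are arbitrary choices for illustration.

```python
from collections import Counter
from itertools import islice


def read_length_counts(path, max_reads=10000):
    """Tally sequence lengths over the first `max_reads` FASTQ records."""
    counts = Counter()
    with open(path, "r") as handle:
        # In 4-line FASTQ the sequence is the 2nd line of each record,
        # i.e. line indices 1, 5, 9, ... (0-based).
        for seq in islice(handle, 1, max_reads * 4, 4):
            counts[len(seq.rstrip("\n"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name):
    #   python read_lengths.py SRR11181957_1.fastq SRR11181957_2.fastq
    for name in sys.argv[1:]:
        print(name, dict(read_length_counts(name)))
```

If the description above holds for this run, the _1 file should show lengths around 26 and the _2 file around 100. Only the head of each file is sampled, so this is a sanity check rather than a full validation.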


Thank you! The files worked fine in cellranger, and I cross-checked a few barcodes against the filtered_feature_bc_matrix.h5 provided by the paper's authors, and they correspond. It was odd, though, that I got 7.7k cells in my filtered_feature_bc_matrix.h5 while theirs contains only 3k cells... Anyway, good to know I handled the SRA file part correctly!