Question

Only one read per run - Trying to use CellRangerv7

8 months ago

Sky ▴ 10

I am trying to process a dataset (https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&page_size=10&acc=SRR16053948&display=data-access) using CellRanger v7. The problem is that each run has only 1 read per spot, and from my understanding CellRanger requires two reads. From what I can see, no BAM file was uploaded, only the FASTQ. Can I still process this data with CellRanger?

Fastq CellRanger • 1.4k views

ADD COMMENT • link updated 7 months ago by Ram 44k • written 8 months ago by Sky ▴ 10

This is the experiment you're looking for. It has 4 FASTQ files.

ADD REPLY • link 8 months ago by Ram 44k

I understand that there are 4 Fastq files but each fastq file only has one read. CellRanger seemingly requests two reads (https://kb.10xgenomics.com/hc/en-us/articles/115003802691-How-do-I-prepare-Sequence-Read-Archive-SRA-data-from-NCBI-for-Cell-Ranger).

For example, this dataset has one (https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&page_size=10&acc=SRR16053948&display=metadata)

While this one has two and an index (https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR11772848&display=metadata)

I hope this helps to clarify my question

ADD REPLY • link 8 months ago by Sky ▴ 10

I'm not going to download files, so maybe you can show me previews here. When you say "each fastq file only has one read", do you mean each file has only 4 lines in it? That seems impossible. SRA says they have 1.2G bases per file, so "only one read" does not make any sense to me.

ADD REPLY • link 8 months ago by Ram 44k

That's why I attached the links as an example. You don't have to download the data to look at the metadata and see that it says "one read per spot".

ADD REPLY • link 8 months ago by Sky ▴ 10

I don't think "one read per spot" means one read in total. It cannot be - the gzipped file is ~1GB in size.
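A quick way to check is to count the records: a FASTQ record is normally four lines, so the number of reads is the line count divided by four. A minimal sketch, assuming unwrapped 4-line records; `count_reads` is just a hypothetical helper name, not part of any tool:

```shell
# count_reads: hypothetical helper that prints the number of records in a FASTQ file.
# Assumes standard 4-line records (header, sequence, "+", qualities).
count_reads() {
    case "$1" in
        *.gz) lines=$(gzip -dc "$1" | wc -l) ;;   # gzipped input: decompress to stdout
        *)    lines=$(wc -l < "$1") ;;            # plain-text input
    esac
    echo $(( $lines / 4 ))
}
```

For example, `count_reads SRR16053948.fastq` on a run with one read per spot should print the same number as the spot count for that run, which would be in the millions here rather than 1.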

ADD REPLY • link 8 months ago by Ram 44k

Then I guess the question changes to how do I process only one file with CellRanger. When you unzip the FASTQ, there is still only one file. Normally it will unzip into two or three files so you can rename them R1, R2, I1, etc.

ADD REPLY • link 8 months ago by Sky ▴ 10

I think there's some serious communication gap - unzipping FQ does not yield multiple files. Can you download all 4 files from https://www.ncbi.nlm.nih.gov/sra/SRX12340615 and paste the first 12 lines of each file please?

ADD REPLY • link 8 months ago by Ram 44k

Sorry for the delayed reply. I have included my code and the output from downloading the four files. Each run resulted in only one FASTQ file, whereas I was expecting -R1 and -R2 files to feed into CellRanger, since CellRanger requires two inputs, not one (unless there is a way to get around that).

I did want to note that SRR16053948 and 49 are the exact same size (992.4Mb) while 50 and 51 are both 1,018.8Mb.

fastq-dump SRR16053948 --split-3 --skip-technical
Read 12317666 spots for SRR16053948
Written 12317666 spots for SRR16053948

fastq-dump SRR16053949 --split-3 --skip-technical
Read 12239309 spots for SRR16053949
Written 12239309 spots for SRR16053949

fastq-dump SRR16053950 --split-3 --skip-technical
Read 12594485 spots for SRR16053950
Written 12594485 spots for SRR16053950

fastq-dump SRR16053951 --split-3 --skip-technical
Read 12418758 spots for SRR16053951
Written 12418758 spots for SRR16053951

Output:

SRR16053948.fastq
SRR16053949.fastq
SRR16053950.fastq
SRR16053951.fastq

Normally I am used to the output being something along the lines of SRR16053948-R1 and SRR16053948-R2. The files normally split automatically so I am not sure why they are not splitting now.
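For context on the naming: as far as I understand the fastq-dump documentation, `--split-3` writes `<accession>_1.fastq` and `<accession>_2.fastq` when a spot has two biological reads, and a bare `<accession>.fastq` when a spot has only one. The sketch below reports which of those layouts is present in a directory. `describe_split3_output` is a hypothetical helper, and the three labels it prints are my own, not anything fastq-dump emits:

```shell
# describe_split3_output: hypothetical helper that reports which files
# `fastq-dump --split-3` left behind for an accession.
#   $1 = accession (e.g. SRR16053948), $2 = directory (defaults to the current one)
describe_split3_output() {
    acc="$1"
    dir="${2:-.}"
    if [ -f "$dir/${acc}_1.fastq" ] && [ -f "$dir/${acc}_2.fastq" ]; then
        echo "paired"    # two biological reads per spot
    elif [ -f "$dir/${acc}.fastq" ]; then
        echo "single"    # only one biological read per spot
    else
        echo "missing"   # nothing found for this accession
    fi
}
```

With the files listed above, `describe_split3_output SRR16053948` would print `single`, which is consistent with the archive holding one biological read per spot rather than with the split having failed.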

ADD REPLY • link updated 7 months ago by Ram 44k • written 7 months ago by Sky ▴ 10

This is a comment, not an answer. Please don't post it as an answer. :)
This is sci-RNA-seq, not 10x RNA-seq, so CellRanger would not be the correct tool to use. It looks like the authors used STAR.
They only supply one FASTQ file -- that means they probably didn't upload the other FASTQ file containing the UMIs and barcodes. It's very possible that they decided to encode that information in the header of the FASTQ file via a custom pipeline. In which case, you'll have to dig around yourself to figure out what they did or you'll have to contact them. I'm afraid processing this dataset won't be straightforward.
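A first step in that digging could be to look at the header lines alone and see whether anything resembling a barcode or UMI is attached to the read names. This is only a sketch: `show_headers` is a hypothetical helper, it assumes 4-line FASTQ records, and I do not know what these particular headers actually contain.

```shell
# show_headers: hypothetical helper that prints the first N header lines
# (default 5) of an uncompressed FASTQ file, assuming 4-line records.
#   $1 = FASTQ file, $2 = number of headers to show
show_headers() {
    n="${2:-5}"
    # Every 4th line starting at line 1 is a record header.
    awk 'NR % 4 == 1' "$1" | head -n "$n"
}
```

Running `show_headers SRR16053948.fastq` and comparing the read names against the protocol description may hint at whether the indices were moved into the header. If the headers carry only the SRA accession and spot number, the barcodes are probably not recoverable from these files.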

ADD REPLY • link 7 months ago by dsull ★ 6.9k

https://www.ncbi.nlm.nih.gov/sra/?term=SRR16053948

Metadata indicates that this is:

Single-cell combinatorial-indexing RNA-sequencing (sci-RNA-seq) protocol is described previously and more details can be found in this link (https://github.com/bbi-lab/). sci-RNA-seq relies on the following steps, (i) thawed nuclei were permeabilized with 0.2% TritonX-100 (Sigma, #T9284) (in nuclei wash buffer) for 3 min on ice, and briefly sonicated to reduce nuclei clumping; (ii) nuclei distributed across 96-well plates; (iii) A first molecular index is introduced to the mRNA of cells within each well, with in situ reverse transcription (RT) incorporating the unique molecular identifiers (UMIs); (iv) All cells were pooled and redistributed to multiple 96-well plates in limiting numbers (e.g., 10 to 100 per well) and a second molecular index is introduced by hairpin ligation;(v) Second strand synthesis, tagmentation, purification and indexed PCR; (vi) Library purification and sequencing is performed.

ADD REPLY • link 7 months ago by GenoMax 147k