Question: How to append R1(Barcode+UMI) to header of R2(Read data)
8 weeks ago by
sambunga0940 wrote:


I am new to bioinformatics and have the 10x Mouse 1k single-cell brain dataset. How do I move the barcode (16 bp) + UMI (10 bp) from R1 to the header of R2? Also, is this the right way to format a FASTQ file?

PS: I'm just learning how to process single-cell data. Please let me know if there are any tools available.

Thank you so much!!

written 8 weeks ago by sambunga0940

Why are you trying to format a FASTQ file? For standard 10x workflow, you should not have to do that.

written 8 weeks ago by igor8.3k

Thanks for your reply. I want to process it without using cellranger.

written 8 weeks ago by sambunga0940

I strongly recommend against using non-standard tools or custom scripts for a task as non-trivial as UMI deduplication and quantification of single-cell data. To my knowledge, all well-tested tools, such as Cell Ranger, alevin, or the recent kallisto/bustools, work directly on FASTQ files. Do yourself a favor and use them. Single-cell data are not trivial to analyze, especially if you are new, and (no offense) if you are already stuck at file header manipulation, things will get very tricky downstream. I suggest you look at alevin for the low-level processing.

modified 7 weeks ago • written 7 weeks ago by ATpoint23k

I want to process it without using cellranger.


The reason I ask is that I frequently see people invent a complex protocol to solve a problem that already has a relatively simple solution.

written 7 weeks ago by igor8.3k

Cell Ranger is not a "simple solution" in the sense that it requires large amounts of RAM and temporary disk space and takes a very long time to process standard datasets. For example, in benchmarks we performed recently, we found that on the 10x hgmm10k_v3 dataset Cell Ranger required 28 GB of RAM and 1.3 TB of disk and took 21.5 hours to run. In comparison, kallisto | bustools required 11 GB of RAM, 15 GB of disk, and 27 minutes. The differences have real implications in terms of cost (e.g. if one is processing on AWS). Furthermore, the speed of kallisto | bustools makes it possible to rerun analyses (e.g. with updated transcriptomes), making a workflow reproducible in practice and not just in theory.

written 4 weeks ago by Lior Pachter370

Cell Ranger is not a "simple solution"

Depends on how you define the Kolmogorov complexity of the solution.

written 27 days ago by igor8.3k
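
For the header manipulation asked about in the question itself, here is a minimal Python sketch. It is an illustration under stated assumptions, not a tested pipeline: it assumes the barcode is the first 16 bases of R1 and the UMI the next 10 (the lengths given in the question), that R1 and R2 list reads in the same order, and that the header should look like `@readid_BARCODE_UMI`. That header layout and all function names are my own choices for the example. As the replies above point out, the established tools read the R1/R2 FASTQ files directly, so this step is only needed for a custom workflow.

```python
import gzip
from itertools import islice, zip_longest

BARCODE_LEN = 16  # cell barcode length stated in the question
UMI_LEN = 10      # UMI length stated in the question


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed text file, chosen by extension."""
    return gzip.open(path, mode) if str(path).endswith(".gz") else open(path, mode)


def fastq_records(handle):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ handle."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def tag_header(r2_header, r1_seq):
    """Insert _BARCODE_UMI after the read ID, keeping any comment field."""
    if len(r1_seq) < BARCODE_LEN + UMI_LEN:
        raise ValueError("R1 is shorter than barcode + UMI")
    barcode = r1_seq[:BARCODE_LEN]
    umi = r1_seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    read_id, sep, comment = r2_header.partition(" ")
    return f"{read_id}_{barcode}_{umi}{sep}{comment}"


def append_barcode_umi(r1_path, r2_path, out_path):
    """Write R2 to out_path with R1's barcode and UMI added to each header.

    Returns the number of records written.
    """
    written = 0
    with open_text(r1_path) as r1, open_text(r2_path) as r2, \
            open_text(out_path, "wt") as out:
        for rec1, rec2 in zip_longest(fastq_records(r1), fastq_records(r2)):
            if rec1 is None or rec2 is None:
                raise ValueError("R1 and R2 contain different numbers of reads")
            # Assumes headers of the form "@id comment"; IDs ending in
            # /1 and /2 would need to be normalised before this check.
            if rec1[0].split(" ")[0] != rec2[0].split(" ")[0]:
                raise ValueError(f"read IDs differ: {rec1[0]} vs {rec2[0]}")
            header = tag_header(rec2[0], rec1[1])
            out.write("\n".join((header,) + rec2[1:]) + "\n")
            written += 1
    return written
```

With hypothetical file names, a call would look like `append_barcode_umi("sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample_R2.tagged.fastq.gz")`. Sequence and quality lines of R2 are copied unchanged; only the header line is rewritten. If you would rather not maintain your own script, as far as I know the `extract` command of UMI-tools performs a similar move of barcode and UMI into the read name.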