Question

How to append R1 (barcode + UMI) to the header of R2 (read data)

5.9 years ago

sambunga094 • 0

Hi,

I am new to bioinformatics and have the 10x 1k mouse brain single-cell dataset. How do I move the barcode (16 bp) + UMI (10 bp) from R1 into the header of R2? Also, is this the right way to format a FASTQ file?

PS: I'm just learning how to process single-cell data. Please let me know if there are any tools available.

Thank you so much!!

RNA-Seq sequencing Assembly genome alignment • 3.0k views

ADD COMMENT • link 5.9 years ago by sambunga094 • 0
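
For readers who still want to see what the header manipulation in the question looks like, here is a minimal Python sketch. It is not taken from any reply in this thread and is not a substitute for a tested pipeline. It assumes that R1 and R2 are gzipped FASTQ files with reads in the same order, that the first 16 bases of R1 are the cell barcode and the next 10 are the UMI (as stated in the question), and that the barcode and UMI should be joined to the read ID with underscores. That naming is only one possible convention, so whether it is the "right" format depends on what the downstream tool expects. The function names and file names are hypothetical.

```python
import gzip
from itertools import zip_longest

BARCODE_LEN = 16  # cell barcode length, as stated in the question
UMI_LEN = 10      # UMI length, as stated in the question


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line) ends the stream
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def tag_header(r2_header, r1_seq):
    """Return the R2 header with barcode and UMI from R1 appended to the read ID."""
    if len(r1_seq) < BARCODE_LEN + UMI_LEN:
        raise ValueError("R1 read is shorter than barcode + UMI")
    barcode = r1_seq[:BARCODE_LEN]
    umi = r1_seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    # Keep any comment after the first space (e.g. "2:N:0:...") unchanged.
    read_id, sep, comment = r2_header.partition(" ")
    return f"{read_id}_{barcode}_{umi}{sep}{comment}"


def append_barcodes(r1_path, r2_path, out_path):
    """Write a copy of R2 whose headers carry the barcode and UMI taken from R1.

    Assumption: both files list the same reads in the same order.
    """
    with gzip.open(r1_path, "rt") as r1, \
         gzip.open(r2_path, "rt") as r2, \
         gzip.open(out_path, "wt") as out:
        for rec1, rec2 in zip_longest(read_fastq(r1), read_fastq(r2)):
            if rec1 is None or rec2 is None:
                raise ValueError("R1 and R2 contain different numbers of reads")
            header = tag_header(rec2[0], rec1[1])
            out.write(f"{header}\n{rec2[1]}\n{rec2[2]}\n{rec2[3]}\n")
```

A call such as append_barcodes("R1.fastq.gz", "R2.fastq.gz", "R2_tagged.fastq.gz") would write a new R2 file and leave the inputs untouched. The sketch does no barcode whitelisting, error correction, or UMI deduplication, which is where most of the real difficulty lies.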

Why are you trying to format a FASTQ file? For standard 10x workflow, you should not have to do that.

ADD REPLY • link 5.9 years ago by igor 13k

Thanks for your reply. I want to process it without using cellranger.

ADD REPLY • link 5.9 years ago by sambunga094 • 0

I strongly recommend against using non-standard tools or custom scripts for a task as non-trivial as UMI deduplication and quantification of single-cell data. To my knowledge, all tested tools, such as Cell Ranger, alevin, or the recent kallisto/bustools, work directly on FASTQ files. Do yourself a favor and use them. Single-cell data are not trivial to analyze, especially if you are new, and (no offense) if you are already stuck at file header manipulation, things will get very tricky downstream. I suggest you look at alevin for the low-level processing: https://salmon.readthedocs.io/en/latest/alevin.html

ADD REPLY • link 5.9 years ago by ATpoint 88k

I want to process it without using cellranger.

Why?

The reason I ask is because I frequently see people invent a complex protocol to solve a problem that already has a relatively simple solution.

ADD REPLY • link 5.9 years ago by igor 13k

Cell Ranger is not a "simple solution" in the sense that it requires large amounts of RAM and temporary disk space, and takes a very long time to process standard datasets. For example, in benchmarks we performed recently (https://www.biorxiv.org/content/10.1101/673285v2) we found that on the 10x hgmm10k_v3 dataset Cell Ranger required 28 GB of RAM and 1.3 TB of disk, and took 21.5 hours to run. In comparison, kallisto | bustools required 11 GB of RAM, 15 GB of disk, and 27 minutes. These differences have real implications in terms of cost (e.g. if one is processing on AWS). Furthermore, the speed of kallisto | bustools makes it possible to rerun analyses (e.g. with updated transcriptomes), making the workflow reproducible in practice and not just in theory.

ADD REPLY • link 5.9 years ago by Lior Pachter ▴ 720

Cell Ranger is not a "simple solution"

Depends on how you define the Kolmogorov complexity of the solution.

ADD REPLY • link 5.9 years ago by igor 13k