How to append R1 (barcode + UMI) to the header of R2 (read data)
4.7 years ago

Hi,

I am new to bioinformatics. I have the 10x Mouse 1k single-cell brain dataset. How do I move the barcode (16 bp) + UMI (10 bp) from R1 to the header of R2? And is this the right way to format a FASTQ file?
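For reference, here is a minimal Python sketch of what the question asks for. It assumes unwrapped 4-line FASTQ records, R1 and R2 files in the same read order, and R1 reads laid out as a 16 bp barcode followed by a 10 bp UMI (the lengths stated in the question). The `_BARCODE_UMI` suffix on the read name is an arbitrary convention chosen for illustration, and the function names are hypothetical. As the replies point out, standard tools read R1 and R2 as separate files, so this step is usually unnecessary.

```python
import gzip
import sys
from itertools import islice, zip_longest
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, plus line, quality)


def open_fastq(path: str, mode: str = "rt") -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes no sequence line wrapping)."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return
        if len(lines) != 4:
            raise ValueError("truncated FASTQ record")
        yield (lines[0], lines[1], lines[2], lines[3])


def read_id(header: str) -> str:
    """Read name without '@', the comment field, or a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def tag_read(r1: Record, r2: Record, bc_len: int = 16, umi_len: int = 10) -> Record:
    """Return the R2 record with R1's barcode and UMI appended to its name."""
    if read_id(r1[0]) != read_id(r2[0]):
        raise ValueError(f"R1/R2 out of sync: {r1[0]} vs {r2[0]}")
    barcode = r1[1][:bc_len]                 # first bc_len bases of R1
    umi = r1[1][bc_len:bc_len + umi_len]     # the next umi_len bases of R1
    name, sep, comment = r2[0].partition(" ")
    # Hypothetical convention: @NAME_BARCODE_UMI, original comment kept as-is.
    return (f"{name}_{barcode}_{umi}{sep}{comment}", r2[1], r2[2], r2[3])


def tag_fastq_pair(r1_path: str, r2_path: str, out_path: str,
                   bc_len: int = 16, umi_len: int = 10) -> int:
    """Write tagged R2 records to out_path; return the number of reads written."""
    n = 0
    with open_fastq(r1_path) as f1, open_fastq(r2_path) as f2, \
            open_fastq(out_path, "wt") as out:
        for r1, r2 in zip_longest(read_fastq(f1), read_fastq(f2)):
            if r1 is None or r2 is None:
                raise ValueError("R1 and R2 contain different numbers of reads")
            out.write("\n".join(tag_read(r1, r2, bc_len, umi_len)) + "\n")
            n += 1
    return n


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: python tag_r2.py R1.fastq.gz R2.fastq.gz R2.tagged.fastq.gz
    count = tag_fastq_pair(sys.argv[1], sys.argv[2], sys.argv[3])
    print(f"wrote {count} tagged reads", file=sys.stderr)
```

The script streams both files record by record, so memory use stays constant regardless of file size, and it stops with an error if the two files fall out of sync rather than silently mislabeling reads.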

PS: I'm just learning how to process single-cell data. Please let me know if there are any tools available for this.

Thank you so much!!

Why are you trying to reformat a FASTQ file? For the standard 10x workflow, you should not have to do that.

Thanks for your reply. I want to process it without using Cell Ranger.

I strongly recommend against using non-standard tools, or even custom scripts, for a task as non-trivial as UMI deduplication and quantification of single-cell data. To my knowledge, all well-tested tools, such as Cell Ranger, alevin, and the recent kallisto | bustools, work directly on the FASTQ files. Do yourself a favor and use them. Especially if you are new, single-cell data are not trivial to analyze, and (no offense) if you are already stuck at file header manipulation, things will get very tricky downstream. I suggest you look at alevin for the low-level processing: https://salmon.readthedocs.io/en/latest/alevin.html

I want to process it without using cellranger.

Why?

The reason I ask is that I frequently see people invent a complex protocol to solve a problem that already has a relatively simple solution.

Cell Ranger is not a "simple solution" in the sense that it requires large amounts of RAM and temporary disk space, and it takes a very long time to process standard datasets. For example, in benchmarks we performed recently (https://www.biorxiv.org/content/10.1101/673285v2), we found that on the 10x hgmm10k_v3 dataset Cell Ranger required 28 GB of RAM and 1.3 TB of disk, and took 21.5 hours to run. In comparison, kallisto | bustools required 11 GB of RAM and 15 GB of disk, and took 27 minutes. These differences have real implications for cost (e.g. when processing on AWS). Furthermore, the speed of kallisto | bustools makes it possible to rerun analyses (e.g. with updated transcriptomes), which makes the workflow reproducible in practice and not just in theory.
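To put those figures in proportion, the ratios can be computed directly from the numbers quoted in the comment above. This small sketch assumes decimal units (1.3 TB = 1300 GB) and uses no data beyond what the comment states.

```python
# Figures quoted above for the 10x hgmm10k_v3 dataset.
# Assumption: 1.3 TB is converted to 1300 GB (decimal units).
cellranger = {"ram_gb": 28, "disk_gb": 1300, "runtime_min": 21.5 * 60}
kallisto_bustools = {"ram_gb": 11, "disk_gb": 15, "runtime_min": 27}

# How many times more of each resource Cell Ranger used.
ratios = {key: cellranger[key] / kallisto_bustools[key] for key in cellranger}

for key, ratio in ratios.items():
    print(f"{key}: {ratio:.1f}x")
```

By these numbers Cell Ranger used roughly 2.5x the RAM, about 87x the disk, and ran about 48x longer on this dataset.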

Cell Ranger is not a "simple solution"

It depends on how you define the Kolmogorov complexity of the solution.
