Question

Why is the R1 read only 26 bp long with 10X library preparation?

1

9 months ago

yuw926 ▴ 10

I often get FASTQ files at the beginning of my workflow, and I'm wondering why R1 is only 26 bp long rather than 98 or 150 bp like R2.

I understand that cell barcode + UMI = 26/28 bp in the 10X library prep. However, in a paired-end setting, shouldn't R1 be the same length as R2?

I guess R1 is trimmed early in the pipeline, but why? I have read that paired-end reads have a higher chance of mapping to the reference genome, so doesn't the trimming cause some loss of power?

Could someone provide some insight into this?

Also, would someone really set the number of sequencing cycles to 98 instead of 100? Am I missing something here?

Thanks

Illumina 10X • 974 views

ADD COMMENT • link updated 9 months ago by dsull ★ 5.9k • written 9 months ago by yuw926 ▴ 10

1

I will address the points that were not directly covered in the answer below.

However, in a paired-end setting, should the length of R1 be the same as R2?

No. There is no requirement that the lengths of the main reads (or, for that matter, the index reads) be identical. You just need to make sure that the total number of cycles (sequence + index) does not exceed the rated capacity of the sequencing kit being used.
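To make the cycle budget concrete, here is a minimal sketch of that check. The function names, the 8 bp single index, and the 150-cycle kit capacity are assumptions for illustration only; check the actual read layout and kit specification for your run.

```python
def total_cycles(read1, index1, index2, read2):
    """Sum of all cycles a run would consume (main reads plus index reads)."""
    return read1 + index1 + index2 + read2


def fits_kit(read1, index1, index2, read2, kit_capacity):
    """True if the requested run layout fits within the kit's rated capacity."""
    return total_cycles(read1, index1, index2, read2) <= kit_capacity


# Illustrative layout (assumed): 26 bp R1, 8 bp i7 index, no i5 index, 98 bp R2.
layout = dict(read1=26, index1=8, index2=0, read2=98)

print(total_cycles(**layout))                 # 132
print(fits_kit(**layout, kit_capacity=150))   # True (150 is a hypothetical capacity)
```

Nothing in the check requires R1 and R2 to be equal; only the sum matters.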

I guess the R1 is trimmed from the early pipeline,

No, not if the run was set up with a 26 bp read 1.
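If you want to see which lengths are actually present in an R1 file, you can tabulate them. This is only a sketch: it assumes an uncompressed FASTQ with exactly four lines per record, and `read_length_counts` is a hypothetical helper name. A single length for every read is what a fixed 26-cycle run setup would produce, although fixed-length hard trimming would look the same, so treat it as a hint rather than proof.

```python
import io
from collections import Counter


def read_length_counts(handle):
    """Count sequence lengths in a 4-line-per-record FASTQ stream."""
    counts = Counter()
    for i, line in enumerate(handle):
        # The sequence is the second line of each four-line record.
        if i % 4 == 1:
            counts[len(line.rstrip("\n"))] += 1
    return counts


# Two made-up 26 bp records standing in for a real R1 file.
fastq_text = (
    "@read1\n" + "A" * 26 + "\n+\n" + "I" * 26 + "\n"
    "@read2\n" + "C" * 26 + "\n+\n" + "I" * 26 + "\n"
)
print(read_length_counts(io.StringIO(fastq_text)))
```

For a real file, pass an open file handle instead of the `io.StringIO` object.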

I read the paired-end reads have a higher chance to map to the ref genome, doesn't the trimming induce some power loss?

Not necessarily. Paired-end reads give us an anchor on the genome so the insert size can be discerned.

is it really someone would set the sequence cycle to 98 instead of 100?

Yes, one can. You do not have to use all of the sequencing cycles provided by a sequencing kit.

ADD REPLY • link 9 months ago by GenoMax 141k

0

Wow, these comments are very insightful and helpful.

No not if you set the run up as a 26 bp read 1.

Oh, I see. Do you know if this is done in practice by chance?

Not necessarily. Paired-end reads give us an anchor on the genome so the insert size can be discerned.

I agree the insert size is definitely a concern. Here is the figure from the article I was referring to.

ADD REPLY • link 9 months ago by yuw926 ▴ 10

1

Do you know if this is done in practice by chance?

This is done deliberately for 10x samples, since 10x recommends that the samples be sequenced like this.

Since you are capturing a relatively small fragment in single-cell technologies, there is no precedent for doing paired-end sequencing of the insert. So the insert size does not come into play the way it does in bulk RNA-seq.

ADD REPLY • link 9 months ago by GenoMax 141k

1

As a general aside, there are many benefits to paired-end sequencing aside from insert size (which isn't really used by many RNA-seq mapping tools).

Paired-end sequencing captures more information, which can help with alignment and with transcript isoform resolution. It's better than making a single-end read longer because you get more coverage over the length of a gene. We see these benefits in the paired-end design of certain single-cell UMI protocols like Smart-seq3.

ADD REPLY • link 9 months ago by dsull ★ 5.9k

score 2 · Answer 1 · 2023-07-20

2

9 months ago

colindaven 6.4k

R1 is only the cell barcode + UMI, as you state. There is no genomic information in it; that is present in R2. So you can only map single-end reads with this library design.
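To illustrate what R1 holds, here is a small sketch that splits an R1 sequence into its two parts. It assumes the cell barcode is the first 16 bases and the UMI is whatever follows (10 bases for a 26 bp R1, 12 for a 28 bp R1); that layout and the `split_r1` name are assumptions for illustration, so confirm them against the chemistry version you are using.

```python
def split_r1(r1_seq, barcode_len=16):
    """Split an R1 sequence into (cell_barcode, umi).

    Assumes the barcode comes first and the UMI is the remainder.
    """
    if len(r1_seq) <= barcode_len:
        raise ValueError("R1 is too short to contain a barcode and a UMI")
    return r1_seq[:barcode_len], r1_seq[barcode_len:]


# Made-up 26 bp R1: 16 bp barcode followed by a 10 bp UMI.
r1 = "ACGTACGTACGTACGT" + "TTGCATTGCA"
barcode, umi = split_r1(r1)
print(barcode, umi, len(umi))
```

Neither piece comes from the transcript, which is why only R2 is aligned to the reference.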

ADD COMMENT • link 9 months ago by colindaven 6.4k