RNA-seq: How to know which paired reads come from the same original fragment?
6.4 years ago
psm ▴ 130

In paired-end RNA sequencing, can it definitively be known which two paired reads come from the same fragment, or is it inferred based on the distance between the reads?

RNA-Seq Assembly • 6.3k views
6.4 years ago
Tm ★ 1.1k

When you prepare a paired-end library from RNA fragments and sequence it, you typically get two files: one with the reads from forward sequencing (usually denoted R1) and one with the reads from reverse sequencing (usually denoted R2).

That means that for each fragment generated during library preparation, there is one forward read in R1 and its corresponding reverse read in R2. The reads come off the sequencer already paired, and the two reads of a pair are identified by their shared read name/header.

[Two images: screenshots of the example reads from the R1 and R2 files]

Here you can see two reads, one from the R1 file and one from the R2 file. They represent the same RNA fragment, so they have the same read name except for the parts highlighted in red and blue, which start with 1 and 2 respectively, indicating that the first read is from the R1 file and the second is from the R2 file.
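If you want to check this on your own files, here is a minimal sketch of comparing the shared part of two FASTQ headers. It assumes the common Illumina header layout, where the read number sits in a comment after a space (`1:N:0:...` vs `2:N:0:...`), and also handles the older `/1` and `/2` suffix. The function names and the example headers are made up for illustration.

```python
def read_id(header: str) -> str:
    """Return the part of a FASTQ header that both mates of a pair share."""
    # Drop the leading '@' and anything after the first whitespace
    # (the '1:N:0:...' or '2:N:0:...' comment that differs between mates).
    name = header.lstrip("@").split()[0]
    # Older-style headers mark the mate with a '/1' or '/2' suffix instead.
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def same_fragment(r1_header: str, r2_header: str) -> bool:
    """True if the two headers name the same cluster, i.e. the same fragment."""
    return read_id(r1_header) == read_id(r2_header)


# Hypothetical headers, not taken from a real run.
r1_header = "@SIM:1:FCX:1:15:6329:1045 1:N:0:2"
r2_header = "@SIM:1:FCX:1:15:6329:1045 2:N:0:2"
other_header = "@SIM:1:FCX:1:15:6330:1045 2:N:0:2"

print(same_fragment(r1_header, r2_header))     # headers differ only in the read number
print(same_fragment(r1_header, other_header))  # a different cluster
```

In properly paired R1/R2 files the n-th record of each file should pass this check, so it is a cheap sanity test if you suspect the files have gone out of sync.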


In addition to that, from the technical side: both the forward and the reverse read are detected at the exact same spot on the flow cell, which is why they are assigned the same name. Check this video for details on the Illumina process.


Thank you - this is exactly what I wanted to know.


You're very welcome ;-)


Hey, may I ask a follow-up question to your answer? Should these two reads be reverse complements of each other? I don't see that they are. Could you help me with this?
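A small simulation may help with this one. In standard Illumina paired-end sequencing, R1 is read from one end of the fragment and R2 from the opposite end on the other strand, so R2 is the reverse complement of the far end of the fragment, not of R1. The two reads are reverse complements of each other only where they overlap, which happens when the fragment is shorter than the two read lengths combined. The sketch below uses a made-up 40-base fragment and hypothetical helper names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercase A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_pair(fragment: str, read_len: int) -> tuple[str, str]:
    """Idealised, error-free paired-end reads from one fragment."""
    r1 = fragment[:read_len]           # read inward from the 5' end of one strand
    r2 = revcomp(fragment)[:read_len]  # read inward from the 5' end of the other strand
    return r1, r2


# Made-up 40-base fragment, for illustration only.
fragment = "ACGTTGCAAGGCTTAACCGGATCTAGCTAGGATCCATGCA"

# Reads much shorter than the fragment: no overlap between the mates.
r1, r2 = simulate_pair(fragment, 10)
print(r1, r2)
print(r2 == revcomp(r1))              # R2 is not the reverse complement of R1
print(revcomp(r2) == fragment[-10:])  # it matches the far end of the fragment

# Reads as long as the fragment: the mates overlap completely.
full_r1, full_r2 = simulate_pair(fragment, len(fragment))
print(full_r2 == revcomp(full_r1))
```

So not seeing reverse complementarity between R1 and R2 is expected whenever the insert is longer than the reads.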

6.4 years ago

You tell from the shared read name.


Thanks for the answer. To clarify my (perhaps poorly worded) question: how is it determined which reads are paired? Through a bioinformatics algorithm, or through some feature of the adaptor/flowcell? Is it possible that two reads are incorrectly paired? Apologies for my ignorance - I had trouble finding the answer elsewhere.


The instrument knows the xy coordinates of every cluster. Reads from the same xy coordinates are paired. If two clusters overlap such that the software can't distinguish them, it will throw them out. So no, the software can't mix up read pairs. And of course, the software knows the reads are paired long before any mapping coordinates are known.
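You can see those coordinates in the read names themselves. As a rough sketch, assuming the usual seven colon-separated fields of an Illumina header (instrument, run, flowcell, lane, tile, x, y; headers with extra fields such as a UMI would need adjusting), the cluster position can be pulled out like this. The `Cluster` type, the function name and the example headers are hypothetical.

```python
from typing import NamedTuple


class Cluster(NamedTuple):
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int


def cluster_of(header: str) -> Cluster:
    """Extract the flow cell position encoded in an Illumina-style read name."""
    # Keep only the name itself: drop '@' and the trailing read-number comment.
    fields = header.lstrip("@").split()[0].split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 colon-separated fields, got {len(fields)}")
    _instrument, _run, flowcell, lane, tile, x, y = fields
    return Cluster(flowcell, int(lane), int(tile), int(x), int(y))


# Hypothetical headers, not taken from a real run.
r1_header = "@SIM:1:FCX:1:15:6329:1045 1:N:0:2"
r2_header = "@SIM:1:FCX:1:15:6329:1045 2:N:0:2"

print(cluster_of(r1_header))
print(cluster_of(r1_header) == cluster_of(r2_header))  # both mates come from one cluster
```

Both mates report the same flowcell, lane, tile, x and y, which is exactly the information the instrument used to pair them.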


Thank you - this also helped clarify things for me.
