Mate pairs and paired reads confusion
2.1 years ago

Hello!

I find myself confused about insert size and pairing of reads. How are read pairs paired? As in, how does the aligning software know that they belong together? How does the sequencing machine? And, will the aligner know the exact distance between reads in a pair, so as to build a scaffold?

Or can only mate pairs do the latter? Are mate pairs still a thing? How does the aligner know the distance between two mates?

I realize these are quite basic questions, and apologize in advance.

Sincerely, Joel

sequencing • 2.2k views
2.1 years ago

Mated pairs are a type of paired-end read where the distance between the two reads and their orientation are different.

Paired-end reads came to describe the Illumina sequencing protocol where the reads point towards one another, the read lengths are about 150 bp, and the distance between the outer ends is a few hundred base pairs:

==150==>     <==150==

|-------  400 ------|

Mated pair libraries used to mean some sort of circularization method during library preparation, where, after sequencing, the reads point in the same direction and the distance is a few thousand base pairs (the exact orientation depends on the platform and protocol):

 ==150==>                           ==150==>

|-------            2000              ------|

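Note how, once both reads of a pair have been aligned, the aligner can immediately tell the distance and orientation of the read pair from where the two reads landed, and thus identify the protocol. The distance is not known exactly in advance; it is inferred from the alignments.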
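As a rough sketch of that idea in Python: given where each read of a pair landed on the reference, the distance and relative orientation follow directly from the coordinates and strands. The function name `describe_pair`, its arguments, and the orientation labels are hypothetical and chosen only for illustration; real aligners report comparable information through the SAM FLAG and TLEN fields.

```python
def describe_pair(start1, end1, strand1, start2, end2, strand2):
    """Return (outer_distance, orientation) for two reads on the same reference.

    Coordinates are 0-based and half-open; strands are '+' or '-'.
    """
    # Distance from the leftmost aligned base to the rightmost aligned base
    outer_distance = max(end1, end2) - min(start1, start2)

    # Order the two reads by their leftmost position, keeping their strands
    (_, left_strand), (_, right_strand) = sorted(
        [(start1, strand1), (start2, strand2)]
    )

    if left_strand == "+" and right_strand == "-":
        orientation = "inward"    # ==>   <==
    elif left_strand == "-" and right_strand == "+":
        orientation = "outward"   # <==   ==>
    else:
        orientation = "same"      # ==>   ==>  (or both on the reverse strand)
    return outer_distance, orientation


# The two layouts drawn above, placed at an arbitrary position 1000
print(describe_pair(1000, 1150, "+", 1250, 1400, "-"))  # (400, 'inward')
print(describe_pair(1000, 1150, "+", 2850, 3000, "+"))  # (2000, 'same')
```

In practice an aligner estimates the typical distance from many pairs, since each fragment has a slightly different length.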

Mated pairs are typically used for assembly, as they allow ordering more distant pieces of DNA even when the intermediate sequences are missing.


Thank you Istvan for your reply! You say "Mated pair libraries used to mean...", are you implying this is no longer the case? Are they still being used?


I have not seen data produced with this technology for many years now, hence I am not quite sure whether it is still in use and whether the terminology is still the same.

I suspect that long-read technologies like PacBio have turned mated pairs into a somewhat obsolete technology.


Thank you very much Istvan. Have a lovely week!

2.1 years ago

See @IstvanAlbert's answer for the difference between paired-end and mate-pair.

For your other question:

The sequencing machine knows pairs belong together because they reside at identical locations on the flow cell. Basically, the two ends of a fragment of DNA have different primers on them. A run of the machine is done using the read1 primer first, with the fluorescence at each coordinate on the flow cell recorded at each base cycle, and then the newly synthesized strand is stripped off. The process is then repeated using the read2 primer. As read1 and read2 are reads from the same physical piece of DNA, they will be in the same location on the flow cell.
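To make this concrete: in Illumina FASTQ files that use the Casava 1.8+ header style, the read name itself carries the lane, tile, and x/y cluster coordinates, so both reads of a pair share the same name up to the read-number field. A minimal Python sketch follows; the headers are made up and the helper `cluster_id` is hypothetical, and other header layouts (older `/1` and `/2` suffixes, or names with extra fields) would need different parsing.

```python
def cluster_id(header):
    """Extract (flowcell, lane, tile, x, y) from a Casava 1.8+ style header.

    Assumed layout:
    @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
    """
    # Drop the leading '@' and everything after the first space
    name = header.lstrip("@").split()[0]
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    return flowcell, int(lane), int(tile), int(x), int(y)


# Made-up headers for read 1 and read 2 of the same cluster
r1 = "@SIM:1:FCX:1:15:6329:1045 1:N:0:2"
r2 = "@SIM:1:FCX:1:15:6329:1045 2:N:0:2"

print(cluster_id(r1))                    # ('FCX', 1, 15, 6329, 1045)
print(cluster_id(r1) == cluster_id(r2))  # True: same spot on the flow cell
```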

The aligner knows that two reads belong together because of the order in the fastq file. The first read in the read1 fastq is the pair of the first read in the read2 fastq, and the 600th read in the read1 fastq is the pair of the 600th read in the read2 fastq. This is why it is important not to change the order of reads in fastq files without taking account of pairing.
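A small Python sketch of pairing by position, assuming plain four-line FASTQ records and read names that are identical up to the first space. The function names `read_fastq` and `paired_reads` are hypothetical, and the name check is only a sanity test that the two files are still in sync.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ handle."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return
        header, seq, _, qual = record
        yield header[1:].split()[0], seq, qual


def paired_reads(handle1, handle2):
    """Pair the n-th record of the read1 file with the n-th record of the read2 file."""
    # Note: zip stops at the shorter file, so a truncated file goes unnoticed here
    for (name1, seq1, _), (name2, seq2, _) in zip(read_fastq(handle1), read_fastq(handle2)):
        if name1 != name2:
            raise ValueError(f"files out of sync: {name1} vs {name2}")
        yield name1, seq1, seq2


# Tiny in-memory stand-ins for the read1 and read2 FASTQ files
r1 = io.StringIO("@frag1 1\nACGT\n+\nIIII\n@frag2 1\nTTGA\n+\nIIII\n")
r2 = io.StringIO("@frag1 2\nGGCA\n+\nIIII\n@frag2 2\nCCAT\n+\nIIII\n")

for pair in paired_reads(r1, r2):
    print(pair)
# ('frag1', 'ACGT', 'GGCA')
# ('frag2', 'TTGA', 'CCAT')
```

Sorting or filtering only one of the two files would make the name check fail, which is exactly the situation the warning above is about.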


Thank you for your helpful reply! I understand now, that was a great explanation. Thank you so much :-] Do you happen to also know how the analogous process works for mate pairs?


I believe that if you reverse complement the second read (before aligning) the mated-pairs will be in the same orientation as a "regular" paired-end would.

Thus workflows that need that orientation would work with it.
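A minimal Python sketch of that preprocessing step, with hypothetical helper names `reverse_complement` and `flip_record`. Whether the second read is the one that needs flipping depends on the library protocol, so treat this as an illustration rather than a recipe. The quality string has to be reversed along with the bases so that each score stays attached to its base.

```python
# Translation table for DNA bases, keeping case and leaving N as N
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_record(seq, qual):
    """Reverse-complement the bases and reverse the per-base qualities."""
    return reverse_complement(seq), qual[::-1]


print(flip_record("AACGT", "IIIH#"))  # ('ACGTT', '#HIII')
```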


I'm afraid I don't. I've not handled mate-pair reads before, and my impression is that they have mostly been replaced by long read sequencing, but I might be wrong.


I suspect you are correct. Big thanks regardless, have a merry weekend!
