Orientation in paired-end sequencing?
7.6 years ago
John Smith ▴ 290

I am new to bioinformatics and currently learning how to use Bowtie 2. As written in the manual:

A pair that aligns with the expected relative mate orientation and with the expected range of distances between mates is said to align "concordantly". If both mates have unique alignments, but the alignments do not match paired-end expectations (i.e. the mates aren't in the expected relative orientation, or aren't within the expected distance range, or both), the pair is said to align "discordantly".

I have read about the basics of paired-end sequencing and orientation (in the molecular biology sense). In summary, my understanding is that in paired-end sequencing we sequence both ends of a DNA fragment at the 5' end and the 3' end (we call them mate 1 and mate 2) and by knowing the expected length between the mates we can better align the fragment to a reference genome.

My question is: what is meant by two mates having an expected relative orientation? If I am using Bowtie 2 and giving it one file with all the first mates and another with all the corresponding second mates, how can two mates align without having the expected orientation in the first place?

RNA-Seq orientation paired-end Bowtie2 • 14k views
9
7.6 years ago
cts ★ 1.7k

In Illumina paired-end sequencing the two mates are sequenced in opposite directions (see my ASCII art below), so you expect the reads to point towards each other; that is the expected orientation. The reads can end up in other orientations if:

1. You were mapping to a reference from a related organism that has an inversion in its genome relative to what you sequenced.
2. The reads mapped to a palindromic region and the read mapper could not determine the correct orientation.
3. There is a problem with the assembly that you are mapping to (for example, errors in a de novo assembly).

------->                     Read 1
=========================    DNA fragment
                 <-------    Read 2

Also note that other library types produce reads in different orientations. For example, Illumina mate-pair libraries give reads that point away from each other rather than towards each other. I don't know how Bowtie handles these cases where other orientations are expected.
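To make "expected orientation" concrete, here is a minimal Python sketch of how a pair could be classified from the two alignments. It is not Bowtie 2's actual code: the `Aln` class, the function names, and the simplified handling of overlapping mates are all assumptions made for illustration. The default length range of 0 to 500 is chosen to resemble Bowtie 2's `-I`/`-X` defaults.

```python
from dataclasses import dataclass


@dataclass
class Aln:
    """Hypothetical minimal alignment record (not a real library type)."""
    start: int     # 0-based leftmost reference coordinate
    end: int       # exclusive rightmost reference coordinate
    reverse: bool  # True if the read aligned as its reverse complement


def pair_orientation(m1: Aln, m2: Aln) -> str:
    """Return 'FR' (pointing inward), 'RF' (pointing outward) or 'FF' (same strand).

    Simplified: overlapping or dovetailing mates are not treated specially.
    """
    if m1.reverse == m2.reverse:
        return "FF"
    fwd, rev = (m1, m2) if not m1.reverse else (m2, m1)
    # Inward-facing means the forward mate begins upstream of where the
    # reverse mate ends.
    return "FR" if fwd.start < rev.end else "RF"


def is_concordant(m1: Aln, m2: Aln, expected: str = "FR",
                  min_len: int = 0, max_len: int = 500) -> bool:
    """Concordant = expected orientation AND fragment length within range."""
    fragment_len = max(m1.end, m2.end) - min(m1.start, m2.start)
    return pair_orientation(m1, m2) == expected and min_len <= fragment_len <= max_len


if __name__ == "__main__":
    read1 = Aln(start=100, end=150, reverse=False)
    read2 = Aln(start=300, end=350, reverse=True)
    print(pair_orientation(read1, read2), is_concordant(read1, read2))
```

A pair with the right orientation can still be discordant if the mates are too far apart, which is why both conditions are checked.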

1

Are the orientations determined with regard to the reads' 3' and 5' ends? Does the mapper then assume that, for the given platform, the reads should align in a particular orientation, and issue a warning or error if they map differently?

Does the mapper (bwa, for instance) then check all mapping combinations (-><-, ->->, <-<-, <-->), or does it fix the first sequence of the pair and rotate the other one?

1

Since the sequence in a fastq file is always 5'->3', the orientation is determined by the relative positions of the alignments and which, if either, is reverse complemented. If they don't align with the proper orientation then bit 0x2 in the flag field won't be set.

How the search is performed will vary by aligner. Bowtie2, for example, will search for concordant pairs first (at least if memory serves).
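As a rough illustration of the flag field mentioned above, this Python sketch decodes the pairing-related bits of a SAM flag. The bit values come from the SAM format; the dictionary and function names are made up for this example.

```python
# Pairing-related bits of the SAM FLAG field (values from the SAM format).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",    # aligned with the expected orientation and distance
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",       # this read aligned as its reverse complement
    0x20: "mate_reverse",  # its mate aligned as its reverse complement
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def decode_flag(flag: int) -> list[str]:
    """Return the names of all known bits that are set in a SAM flag."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    # 99 and 147 are the typical flags of a properly paired, inward-facing pair.
    for flag in (99, 147, 65):
        print(flag, decode_flag(flag))
```

Flag 65 has 0x1 and 0x40 set but not 0x2, so that read is paired but not part of a proper pair.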

1

I don't think that this is the question. Mate-Pair and Paired-End are COMPLETELY different. One (Mate-Pair) is a library construction method. The other (Paired-End) is a method of sequencing. The confusion arises from Pyro/454/Roche terminology wherein they used the term "mate-pair" to describe what Illumina now calls "paired-end" sequencing. The distinction was made (by Illumina) when the technique of constructing "MATE-PAIR" libraries was introduced and when Illumina introduced sequencing from both ends "PAIRED-END".

To clarify further: on Illumina platforms, paired-end reads are sequenced as depicted in the illustration. The reads come in TWO files (R1 and R2). The ORIENTATION of the reads is (R1) FORWARD, (R2) REVERSE. However, both files will have the reads written in 5'->3' orientation.

For example, if you make an interleaved file, the R1 read would be FORWARD and the R2 read would be REVERSE.

0

To clarify what you mean by reads "pointing" to each other, take the following example:

ACAAGATGCCATTGTCCC      Read 1

ACAAGATGCCATTGTCCCCCGGCCTCCTGGCTCTCCGGGGCCACGGCCACCGCTGCCC DNA fragment

CCCGTCGCCACCGGCACC      Read 2

Are these two reads pointing to each other?

1

The second read wouldn't align to the given stretch of DNA; read1 would map as unpaired.

0

How then would two reads be pointing to each other? Would it be like in the following example?

ACAAGATGCCATTGTCCC      Read 1
ACAAGATGCCATTGTCCCCCGGCCTCCTGGCTCTCCGGGGCCACGGCCACCGCTGCCC DNA fragment
                                        CCACGGCCACCGCTGCCC Read 2

I am just having trouble understanding what "pointing to each other" means in this context.

0

If the original sequence of read1 was ACAAGATGCCATTGTCCC and that for read2 was GGGCAGCGGTGGCCGTGG, then yes.

0

How did you come up with read 2?

0

It's the reverse complement of the sequence you posted. I'm guessing that your background isn't biology :)

0

You guessed right! I just have one more question: why the complement and not the sequence itself?

1

DNA is double stranded, with the opposite strand being the complement. So the two strands of your fragment look like this:

ACAAGATGCCATTGTCCCCCGGCCTCCTGGCTCTCCGGGGCCACGGCCACCGCTGCCC + strand
TGTTCTACGGTAACAGGGGGCCGGAGGACCGAGAGGCCCCGGTGCCGGTGGCGACGGG - strand

Since sequence is always presented 5' to 3', you need the reverse complement. You'd be well served to take a few biology classes; your life would be easier.
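For anyone who wants to check this themselves, here is a small Python sketch that takes the reverse complement of the last 18 bases of the fragment from this thread. The function name is just illustrative, and only plain A/C/G/T bases are handled.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse, so the result reads 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


fragment = "ACAAGATGCCATTGTCCCCCGGCCTCCTGGCTCTCCGGGGCCACGGCCACCGCTGCCC"

read1 = fragment[:18]                       # read 1 comes straight off the + strand
read2 = reverse_complement(fragment[-18:])  # read 2 comes off the - strand

if __name__ == "__main__":
    print(read1)  # ACAAGATGCCATTGTCCC
    print(read2)  # GGGCAGCGGTGGCCGTGG
```

This is the read 2 sequence you would see in the FASTQ file; the aligner reverse-complements it again to place it on the + strand of the reference.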

0

Hi, I have a small question about this. I have a dataset of Illumina paired-end, strand-specific reads. I assembled it de novo using Trinity, and now I'm assessing the assembly's quality using several metrics. One of them is to align the initial reads to the new contigs. For that I'm using Bowtie2, but I wonder: given a forward read and a contig, does Bowtie2 try to align the read to both strands? I always assumed that the answer is yes, but I want to be sure.