Paired-end amplicon sequencing data
0
0
Entering edit mode
6.4 years ago
bioplanet ▴ 60

Hi all,

this question is addressed to those of you that have worked with paired-end amplicon sequencing data from Illumina. So, what I have is a construct of say 400bp that has been sequencing from both sides, with reads of 250 bp each. My question is, when I analyze the data I get, do I regard each pair of reads as one? For example, in the following mapping (created using STAR aligner with the paired-end option):

TEST_ID     163     reference        1       255     1S249M  =       178      
TEST_ID     83      reference        178     255     250M    =       1

you can see that the first read's left-most mapping position was 1 (the "left" side) and the other read of the pair starts mapping at position 178. Obviously these reads overlap in some regions. So, for example, if I wanted to retrieve the sequence of the read TEST_ID for positions 200-220, should I consider both of them separately (i.e. treat them as belonging to distinct reads) if they differ or only use the whole pair whatsoever if the two sequences I get agree with each other?

sequencing paired-end • 2.6k views
ADD COMMENT
0
Entering edit mode

when I analyze the data I get, do I regard each pair of reads as one?

it's hard to answer if we don't know the type of analysis you're trying to do...

ADD REPLY
0
Entering edit mode

I wonder if STAR is an appropriate choice for aligning amplicons. Unless you are doing cDNA sequencing or something non-standard I guess bwa mem would be the most suitable aligner for this job.

As Pierre remarked you haven't told us what you want to do with those reads - so we can't answer your question.

ADD REPLY
0
Entering edit mode

Hi to both and thanks for answering!

Basically, what I want your opinion about is the following scenario:

>read_ID
ACATCACCTCCCACAACGAGGACTACACCATCGTGGAACAGTACGAACGCGCCG.GGGCCGCCACTCCACCGGCGGCTTGGACGCGCTGAACAAGGAATGAATTTATTAAGAATGAGGTTTGCAGGAGCGGATTGCTTCGAACCGCTTGACTTGATTGTCCTCCGCGAATTAATAGCTCCTGTTGGGTCATTACGATATCTCTGGCCCCGTACTCACCAGTGGCTTTCCCCGGAGCGTGCCGGAGTTTTT.................................................................................................................................................................................
.................................................................................................................................................................................TCCTGTTGGGTCATAACGATATCTCTGGCCTCGTACTCACAAGTGGCTTTCCCCGGAGCGTGCCGTAGTTTTTGGCCATTTTTCGGCGTACAAAGGACCGAGAAGGGACGCCAAGTAGCCTGCAGGGCTTCGTACGCGAAACTAGCGTAACTAAACTTGTTTATTGCAGCTTATAATGGTTACAAATAAAGCAATAGCATCACAAATTTCACAAATAAAGCATTTTTTTCACTGCATTCTAGTTGTGGTT

Here, I have aligned both my reads to my reference and, as expected, one of the reads matches the left-most part and one the right-most part. However, since they are 250nts each, there is a part where they overlap (since my reference constract is ~430nts). So my question is, if the sequence in the overlapping region is not identical, as in this example:

TCCTGTTGGGTCAT**T**A
TCCTGTTGGGTCAT**A**A

I guess we can't be sure if in this position we have T or A. So, should I use both sequences/possibilities separately although the belong to the same read?

ADD REPLY
0
Entering edit mode

So, should I use both sequences/possibilities separately although the belong to the same read?

Again, it may depend on what is your goal. Variant Calling? A variant caller should consider that these reads are a pair and vote it in some way that there is a mismatch between them. Or the way I prefer is, to merge overlapping paired reads before alignment using bbmerge. This produces longer reads, which is better for mapping, and a base quality/error correction is made in the overlapping part.

fin swimmer

ADD REPLY
0
Entering edit mode

Thank you! I was not aware of bbmerge, I will give it a try ASAP!

ADD REPLY
0
Entering edit mode

Your example on discusses one read pair, but in reality (I hope) your region of interest will be covered by more reads and then it doesn't really matter that for one of the pairs there is a disagreement. The variant caller will figure it out.

ADD REPLY

Login before adding your answer.

Traffic: 2673 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6