Question

NGS hybrid reads

4

8.2 years ago

Max Ivon ▴ 130

Hello!
I am trying to analyze data from Illumina sequencing and I observe an enormous number (up to 50%) of hybrid reads, i.e. reads that align (with BWA) to two different positions in the genome. I suspect this is caused by improper library preparation, specifically skipped or failed dephosphorylation (so that fragments stuck together). My question is: should I take any specific additional steps during data analysis when handling such data? Removing all of these reads would dramatically reduce coverage, and I realize that although the reads are hybrid, their parts derive from specific places in the genome and can still be used to obtain information about it. Still, I am concerned that keeping hybrid reads may introduce errors. Any thoughts?

next-gen sequencing • 1.8k views

ADD COMMENT • link updated 21 months ago by Ram 43k • written 8.2 years ago by Max Ivon ▴ 130

0

You mean that part of the read (say, the first 50 nt) aligns to one part of the genome and another part of the read (say, the last 50 nt) aligns to another part of the genome?

ADD REPLY • link 8.2 years ago by 5heikki 11k

0

Are these paired-end reads or single-end reads?

ADD REPLY • link updated 21 months ago by Ram 43k • written 8.2 years ago by Sean Davis 26k

0

Yes, reads are paired.

ADD REPLY • link 8.2 years ago by Max Ivon ▴ 130

0

Yes, you got it right

ADD REPLY • link updated 21 months ago by Ram 43k • written 8.2 years ago by Max Ivon ▴ 130

0

I have not seen this happen before. Can you share a few lines of SAM output, just to be sure we are all talking about the same thing?

ADD REPLY • link updated 21 months ago by Ram 43k • written 8.2 years ago by Sean Davis 26k

Ram · Answer 1 · 2016-02-10

3

8.2 years ago

Sean Davis 26k

Actually, this is most likely caused by a mismatch in read pairing somewhere in the fastq files. Check that the two fastq files have the same number of reads and that the reads are in the same order. I suspect you'll find that one or both of those is not true. If so, some data forensics will be needed to work out where the fastq files became corrupted.
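
The check described above (same number of reads, same order in both files) could be sketched roughly as follows. This is only an illustration, not a tool mentioned in the thread: the function names `read_names` and `check_pairing` are hypothetical, and the sketch assumes plain 4-line FASTQ records (no wrapped sequence lines) with mate names that are identical apart from an optional `/1` or `/2` suffix.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name of each 4-line FASTQ record in `path` (assumed layout)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:  # only the first line of each record is a header
                continue
            fields = line[1:].split()  # drop the leading '@' and any comment
            if not fields:  # e.g. a trailing blank line
                continue
            name = fields[0]
            # Strip old-style /1 and /2 mate suffixes so mates compare equal.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def check_pairing(fastq_r1, fastq_r2):
    """Return (pairs_matched, first_mismatch); first_mismatch is None if all agree."""
    matched = 0
    for n1, n2 in zip_longest(read_names(fastq_r1), read_names(fastq_r2)):
        if n1 != n2:
            # Names differ, or one file ended early (the missing side is None).
            return matched, (n1, n2)
        matched += 1
    return matched, None
```

Calling `check_pairing("reads_1.fastq.gz", "reads_2.fastq.gz")` (file names are placeholders) returns the number of leading pairs whose names agree and, if any, the first pair of names that disagree, which gives a starting point for locating where the files went out of sync.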

ADD COMMENT • link updated 21 months ago by Ram 43k • written 8.2 years ago by Sean Davis 26k

0

The fastq files are completely correct as far as the read count in the two files and the read ordering are concerned.

ADD REPLY • link 8.2 years ago by Max Ivon ▴ 130

2

I agree with Sean here that an order mess-up is far more likely than the biological alternative.

The first question really needs to be: do you see whole reads mapping to different places (i.e. the first and second mates of a pair map to different locations/chromosomes), or do you actually see true hybrids (within a single read, the first half maps to one location and the second half to another)? If the latter, what program did you use to map?

If the former, I would treat the data as single-end (SE) data, map it, then find a quiet spot in the genome and see whether you can spot two single-end reads facing each other (which are presumably a true pair) but which, in your paired-end data, apparently have different mates miles away.
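
To tell the two cases apart in an existing alignment, one could tally SAM records roughly as below. This is a sketch under assumptions, not something posted in the thread: the function names and category labels are made up for illustration, and it assumes the aligner reports split alignments through the standard `SA:Z:` optional tag (as BWA-MEM does). Mates that land far apart on the same reference are not separated out here and fall into `other`.

```python
from collections import Counter

# Standard SAM FLAG bits.
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def classify_record(line):
    """Return an illustrative label for one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rnext = fields[6]
    if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
        return "non_primary"  # the primary record carries the SA tag instead
    if flag & FLAG_UNMAPPED:
        return "unmapped"
    # An SA:Z: tag on a primary record means part of this read aligned elsewhere.
    if any(tag.startswith("SA:Z:") for tag in fields[11:]):
        return "split_read"
    # RNEXT is '=' when the mate maps to the same reference sequence.
    mate_mapped = (flag & FLAG_PAIRED) and not (flag & FLAG_MATE_UNMAPPED)
    if mate_mapped and rnext not in ("=", "*"):
        return "mate_on_other_reference"
    return "other"


def summarize(sam_lines):
    """Count labels over an iterable of SAM lines, skipping '@' header lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        counts[classify_record(line)] += 1
    return counts
```

With a SAM file opened as text, `summarize(open("aln.sam"))` (placeholder file name) gives a count per label. A large `split_read` count would point to true hybrids within single reads, while a large `mate_on_other_reference` count would be more consistent with whole mates mapping to different places.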

Also, please keep us updated, because whatever your issue is, it's interesting :)

ADD REPLY • link updated 21 months ago by Ram 43k • written 8.2 years ago by John 13k