NGS hybrid reads
8.2 years ago
Max Ivon ▴ 130

Hello!
I am trying to analyze data from Illumina sequencing and observe an enormous fraction (up to 50%) of hybrid reads, i.e. reads that align (with BWA) to two different positions on the genome. I suspect this is caused by improper library preparation, specifically a skipped or failed dephosphorylation step, so that reads stuck together. My question: should I take any specific additional steps during data analysis when handling such data? Removing all of these reads would dramatically reduce coverage, and I realize that although the reads are hybrid, their parts derive from specific places on the genome and can still be used to obtain information about it. Still, I am concerned that keeping hybrid reads may introduce errors. Any thoughts?

next-gen sequencing • 1.8k views

You mean part of the read (say, the first 50 nt) aligns to one part of the genome and another part of the read (say, the last 50 nt) aligns to another part of the genome?


Are these paired-end reads or single-end reads?


Yes, reads are paired.


Yes, you got it right.


I have not seen this happen before. Can you share a few lines of SAM output, just to be sure we are all talking about the same thing?

8.2 years ago

Actually, this is most likely caused by a mismatch in read pairing somewhere in the fastq files. Check that the two fastq files have the same number of reads and that the reads are in the same order. I suspect you'll find that one or both of those is not true. If so, some data forensics to sort out where the fastq files became corrupt is needed.
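The check suggested above (same number of reads, same order) can be sketched in a few lines of Python. This is only a minimal sketch under some assumptions: plain 4-line FASTQ records, mates named identically apart from an optional `/1` or `/2` suffix, and hypothetical file names `reads_R1.fastq.gz` / `reads_R2.fastq.gz` in the usage note. The function names are made up for illustration.

```python
import gzip
from itertools import zip_longest


def read_names(lines):
    """Yield the read name of each 4-line FASTQ record.

    Assumes strict 4-line records; blank lines (e.g. at end of file) are skipped.
    The name is the first whitespace-delimited token of the header line,
    without the leading '@' and without a trailing /1 or /2.
    """
    non_blank = (line for line in lines if line.strip())
    for i, line in enumerate(non_blank):
        if i % 4 != 0:
            continue  # sequence, '+', or quality line
        name = line.split()[0]
        if name.startswith("@"):
            name = name[1:]
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]
        yield name


def first_mismatch(names1, names2):
    """Return (pairs_in_agreement, mismatch).

    mismatch is None if both files have the same names in the same order;
    otherwise it is the first (name1, name2) that disagree, where None in
    either position means that file ran out of reads first.
    """
    n = 0
    for n1, n2 in zip_longest(names1, names2):
        if n1 != n2:
            return n, (n1, n2)
        n += 1
    return n, None


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def check_pairing(r1_path, r2_path):
    """Compare read names of two FASTQ files record by record."""
    with open_fastq(r1_path) as f1, open_fastq(r2_path) as f2:
        return first_mismatch(read_names(f1), read_names(f2))
```

Calling `check_pairing("reads_R1.fastq.gz", "reads_R2.fastq.gz")` returns the number of leading pairs that agree and, if there is a problem, the first pair of names that differ, which gives a starting point for working out where the files diverged.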


The fastq files are completely correct as far as the read count in the two files and the read ordering are concerned.


I agree with Sean here that an order mess-up is far more likely than the biological alternative.

The first question really needs to be, do you see whole reads mapping (i.e. first and second to different places/chromosomes) or do you actually see true hybrids (in a single read, the first half maps to one location and the second to another)? If the latter, what program did you use to map?
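To tell the two cases apart, one rough option is to tally SAM records by what their flags and tags say. The sketch below is an illustration, not a definitive method: it assumes the aligner writes chimeric (split) alignments the way the SAM specification describes, i.e. with the supplementary flag (0x800) and/or an `SA:Z:` tag, as BWA-MEM does, and it only looks at whether the mate maps to a different reference sequence. The function names and labels are made up for this example.

```python
from collections import Counter


def classify_sam_line(line):
    """Return a set of labels for one SAM alignment line.

    'split_read'              : the read itself is chimeric (supplementary
                                flag 0x800 set, or an SA:Z: tag present)
    'mate_on_other_reference' : read is paired, its mate is mapped, and the
                                mate lies on a different reference sequence
    'unmapped'                : the read is unmapped (flag 0x4)
    An empty set means none of the above applies.
    """
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, rnext = fields[2], fields[6]
    tags = fields[11:]

    if flag & 0x4:
        return {"unmapped"}

    labels = set()
    if flag & 0x800 or any(tag.startswith("SA:Z:") for tag in tags):
        labels.add("split_read")
    paired = bool(flag & 0x1)
    mate_unmapped = bool(flag & 0x8)
    # RNEXT '=' means "same reference as RNAME"; '*' means unavailable.
    if paired and not mate_unmapped and rnext not in ("=", "*", rname):
        labels.add("mate_on_other_reference")
    return labels


def summarize(sam_lines):
    """Count labels over an iterable of SAM lines, skipping '@' header lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        labels = classify_sam_line(line)
        if not labels:
            counts["ordinary"] += 1
        for label in labels:
            counts[label] += 1
    return counts
```

Running `summarize(open("aln.sam"))` on a (hypothetical) SAM file gives a quick picture: a large `split_read` count points to true hybrids within single reads, while a large `mate_on_other_reference` count with few split reads points to whole reads whose mates map elsewhere.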

If the former, I would treat the data as single-end (SE), map it, then find a quiet spot in the genome and see whether you can spot two single-end reads facing each other (presumably a true pair) that, in your paired-end data, apparently have different mates miles away.

Also, please keep us updated, because whatever your issue is, it's interesting :)

