Question: What to do when the two reads of a pair are almost the same, e.g. they overlap each other by 90-100%?
2.5 years ago, crivenster (India) wrote:

I have HLA-based NGS data from a MiSeq. How should I deal with overlap when read 1 and read 2 of a pair (PE) overlap by more than 90%, or even contain exactly the same sequence? I am working on a pre-processing script that goes with an existing pipeline.

modified 2.5 years ago by dariober • written 2.5 years ago by crivenster

Why do you want to do something with them? Can't you treat them as normal PE data?

modified 2.5 years ago • written 2.5 years ago by geek_y

But won't they affect the coverage calculation, when some regions have more reads (because the read pairs are the same) and other regions have fewer reads (because the read pairs in those regions have little or no overlap)?

written 2.5 years ago by crivenster

No, it doesn't matter.  The insert size should generally be independent of the genome, but regardless, the coverage is not really affected by the insert size (other than +-1).

The most important thing to do in this case is to adapter-trim reads, as inserts shorter than read length will have adapter sequence that will cause poor mapping.

written 2.5 years ago by Brian Bushnell
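
To illustrate the adapter point above: when the insert is shorter than the read length, both reads run past the end of the insert into adapter sequence, and the first n bases of read 1 are the reverse complement of the first n bases of read 2 (n being the insert length). The following is only a minimal sketch of that idea, assuming exact matches and reads of equal length; the function names are made up for illustration, and real trimmers tolerate mismatches and use base qualities.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def insert_length(r1, r2, min_len=10):
    """Return the insert length n if r1[:n] equals revcomp(r2[:n]), else None.

    Hypothetical helper: exact matching only, longest candidate tried first.
    """
    max_len = min(len(r1), len(r2))
    for n in range(max_len, min_len - 1, -1):
        if r1[:n] == revcomp(r2[:n]):
            return n
    return None


def trim_read_through(r1, r2, min_len=10):
    """Cut both reads back to the insert; return them unchanged if no
    short insert is detected."""
    n = insert_length(r1, r2, min_len)
    if n is None:
        return r1, r2
    return r1[:n], r2[:n]
```

Anything past position n in either read is adapter, which is why untrimmed short-insert pairs tend to map poorly.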

If read pairs overlap, some tools might double count coverage in the overlapping portion, which is incorrect as you are just sequencing the same fragment twice (I'm not sure if this is what crivenster meant, though).

written 2.5 years ago by dariober

I have no idea what I was thinking when I said +-1, you're correct, the difference can be a factor of 2.  The point I was trying to make was that this will be evenly distributed everywhere so it shouldn't really affect a coverage analysis much.  Once you have sufficient coverage at some location, the insert size will not matter very much.

BBMap has a "physcov" flag that will allow calculation of physical coverage, meaning that with large inserts the unsequenced bases in the middle will be counted, and with short inserts the double-covered bases will only be counted once, rather than twice.  But I think if this analysis was done with physical coverage enabled versus disabled the conclusion would be the same.

written 2.5 years ago by Brian Bushnell
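
The difference between per-read and physical coverage discussed above can be shown with a toy calculation. This is a sketch only, not how BBMap computes it: mates are assumed to be ungapped and given as 0-based, half-open reference intervals, and `pair_coverage` is a hypothetical name.

```python
def pair_coverage(pairs, length, physical=False):
    """Per-base coverage over a reference of the given length.

    pairs: list of ((start1, end1), (start2, end2)) mate intervals.
    physical=False: every read adds 1, so overlapping mates count twice.
    physical=True: each pair adds 1 across the whole fragment, so the
    overlap counts once and an unsequenced middle is counted as well.
    """
    cov = [0] * length
    for (s1, e1), (s2, e2) in pairs:
        if physical:
            for pos in range(min(s1, s2), max(e1, e2)):
                cov[pos] += 1
        else:
            for start, end in ((s1, e1), (s2, e2)):
                for pos in range(start, end):
                    cov[pos] += 1
    return cov


# A fully overlapping pair: counted twice per read, once per fragment.
print(pair_coverage([((0, 5), (0, 5))], 6))                 # [2, 2, 2, 2, 2, 0]
print(pair_coverage([((0, 5), (0, 5))], 6, physical=True))  # [1, 1, 1, 1, 1, 0]
```

With 100% overlap the per-read figure is exactly twice the physical one, which is the factor of 2 mentioned above.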
2.5 years ago, Alvaro Sebastian (Poland) wrote:

Merge them; the merging program will take the nucleotide with the better quality at each position:

http://thegenomefactory.blogspot.com/2012/11/tools-to-merge-overlapping-paired-end.html

Is this amplicon sequencing data (sequencing of PCR products)? I think this problem is less likely with fragmented DNA.

written 2.5 years ago by Alvaro Sebastian
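
For readers unfamiliar with merging, here is a rough sketch of the "take the better-quality nucleotide at each position" idea from the answer above. It assumes the overlap length is already known, that the insert is at least as long as each read, and that qualities are plain lists of integers; `merge_pair` is a hypothetical name, and real merging tools also have to find the overlap and handle mismatches more carefully.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(r1, q1, r2, q2, overlap):
    """Merge a read pair whose 3' ends overlap by `overlap` bases.

    Read 2 is reverse-complemented onto the strand of read 1; within the
    overlap the base with the higher quality wins (ties go to read 1).
    Returns (merged_sequence, merged_qualities).
    """
    rc2, rq2 = revcomp(r2), q2[::-1]
    cut = len(r1) - overlap          # first overlapping position in read 1
    seq, qual = list(r1[:cut]), list(q1[:cut])
    for b1, x1, b2, x2 in zip(r1[cut:], q1[cut:], rc2[:overlap], rq2[:overlap]):
        if x1 >= x2:
            seq.append(b1)
            qual.append(x1)
        else:
            seq.append(b2)
            qual.append(x2)
    seq.extend(rc2[overlap:])
    qual.extend(rq2[overlap:])
    return "".join(seq), qual
```

With 100% overlap, `overlap` equals the read length and the merged read is simply the per-position best of the two mates.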

There have been better tools developed for merging reads via overlap since 2012, but merging is not really necessary in this case.

written 2.5 years ago by Brian Bushnell
2.5 years ago, dariober (Glasgow - UK) wrote:

After mapping PE reads, I usually soft clip the overlapping part of one of the two reads. There is a nice program for this: clipOverlap. I think it is better to clip after mapping rather than merging reads as Alvaro suggests. Also take care that if read pairs overlap by 100%, some aligners might not mark them as "mapped in proper pair", whereas I think they should be.

modified 2.5 years ago • written 2.5 years ago by dariober
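
As a rough picture of what clipping the overlap after mapping means, here is a coordinate-only sketch. It is not clipOverlap's actual algorithm: it assumes ungapped alignments given as 0-based, half-open reference intervals, always clips the forward-strand mate (an arbitrary choice made here), and `clip_forward_mate` is a hypothetical name.

```python
def clip_forward_mate(fwd, rev):
    """Clip the 3' end of the forward mate so it no longer overlaps its mate.

    fwd, rev: (start, end) reference intervals of the forward-strand and
    reverse-strand mates.
    Returns (new_fwd_interval, clipped_bases); clipped_bases is how many
    aligned bases would be turned into soft clips.
    """
    f_start, f_end = fwd
    r_start, r_end = rev
    overlap = min(f_end, r_end) - max(f_start, r_start)
    if overlap <= 0:
        return fwd, 0                 # mates do not overlap, nothing to clip
    # The forward mate now ends where the reverse mate begins.
    new_end = max(f_start, r_start)
    return (f_start, new_end), f_end - new_end


print(clip_forward_mate((100, 200), (150, 250)))  # ((100, 150), 50)
print(clip_forward_mate((100, 200), (100, 200)))  # ((100, 100), 100)
```

In the 100% overlap case the forward mate ends up with no aligned bases at all, so each position is covered by exactly one read of the pair.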

So is it better to merge reads, or to soft clip as you mentioned, when reads overlap 100%, before alignment? That way I could avoid aligning the same region twice and maybe get better mapping results. The HLA data I use has fragment lengths between 200 and 500 bp, as some of the sequenced regions are only 250 bp long. So overlap is certain, and 100% overlap has been common in the data generated here.

written 2.5 years ago by crivenster

As I said, I prefer to soft clip after mapping rather than merging. I don't take into consideration how much overlap there is between pairs (100% or just 1 base); I just clip whatever is overlapping.

written 2.5 years ago by dariober