Question

How to quantify the overlapping reads in paired-end DNA sequencing to check the sequencing efficiency

1

6.6 years ago

cjgunase ▴ 50

Hi All, I am a newbie to sequencing technologies.

Is there a way to quantify the overlap between the two reads of a pair in paired-end sequencing? Completely overlapping read pairs would be a waste of sequencing resources. A bit of overlap can be useful when doing the alignment, but a small gap is optimal to maximize coverage.

Is there a way to check sequencing efficiency using the alignment files? If there is a lot of overlap, we want to improve this.

Any help is appreciated.

Thank you.

sequencing • 8.0k views

updated 6.6 years ago by h.mon 35k • written 6.6 years ago by cjgunase ▴ 50

0

but a small gap is optimal to maximize coverage.

I don't agree: if the two reads overlap, it usually means that the sequenced fragment was too short.

In paired-end sequencing, you should instead consider the sequencing depth to "maximize coverage".

6.6 years ago by Pierre Lindenbaum 161k

2

6.6 years ago

h.mon 35k

In addition to the mapping statistics from BAM files suggested above, you can calculate a (biased) estimate of the overlap from the FASTQ files by simply merging R1 and R2, e.g., with BBMerge:

bbmerge.sh in1=r1.fq in2=r2.fq ihist=ihist.txt
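
Once the insert-size histogram exists, it can be turned into an overlap-length histogram. The Python below is only a sketch under assumptions: it assumes `ihist.txt` is a tab-separated table of insert size and count with `#`-prefixed header lines, that both reads have the same fixed length, and the function names and example numbers are made up for illustration. Since only merged (overlapping) pairs contribute to the histogram, the result describes those pairs, not the whole library.

```python
def read_insert_histogram(lines):
    """Parse 'insert_size<TAB>count' lines into {insert_size: count}.

    Lines starting with '#' are treated as headers and skipped
    (assumed layout of the histogram file).
    """
    hist = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        size, count = line.split("\t")[:2]
        hist[int(size)] = int(count)
    return hist


def overlap_histogram(insert_hist, read_len):
    """Convert insert sizes to overlap lengths: overlap = 2 * read_len - insert_size."""
    return {2 * read_len - size: count for size, count in insert_hist.items()}


def mean_overlap(overlap_hist):
    """Count-weighted mean overlap length (0.0 for an empty histogram)."""
    total = sum(overlap_hist.values())
    if total == 0:
        return 0.0
    return sum(ov * n for ov, n in overlap_hist.items()) / total


# Made-up example data, 100 bp reads assumed.
example = ["#InsertSize\tCount", "120\t10", "180\t30"]
overlaps = overlap_histogram(read_insert_histogram(example), read_len=100)
print(overlaps)                # {80: 10, 20: 30}
print(mean_overlap(overlaps))  # 35.0
```

With a real file, the lines could be read with `open("ihist.txt")` and passed to `read_insert_histogram` in the same way.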

1

6.6 years ago

Renesh ★ 2.2k

https://drive.google.com/file/d/0B3SqUxkB3WxnRWtZbldlOHhZNHM/view?usp=sharing

0

6.6 years ago

igor 13k

You can calculate the overlap by comparing the read length to the fragment/insert size. There are a few ways to do that. See this previous discussion: Is It Possible To Get Fragment Length, Read Length And Number Of Fragments From A Bam/Sam File

score 3 · Accepted Answer · 2017-10-04

3

6.6 years ago

Renesh ★ 2.2k

Yes, you can do this with the alignment BAM/SAM file. First extract the records for concordant alignments (tagged YT:Z:CP by Bowtie 2) from the SAM/BAM file. Then look at field 9 (the 9th column, TLEN), which gives the fragment length spanned by the paired-end reads. TLEN is signed (negative for the rightmost read of the pair), so use its absolute value.

From here, you can get the fragment length distribution. Based on your sequencing protocol, you should know the expected insert size for the paired-end library, and you can compare the fragment length with it. If the fragment length is less than the sum of the two read lengths, the two reads of the pair overlap. You can also plot a histogram of the fragment length distribution.

Note: concordant alignment records give you only the alignments that fall within the given insert size limits.
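
As a rough sketch of this approach, the Python below reads SAM text (for example the output of `samtools view`), keeps properly paired primary alignments, and reports r1 + r2 - |TLEN| for each pair. It is an illustration under assumptions, not a validated tool: the names `pair_overlaps` and `sam_record` are made up here, the proper-pair flag 0x2 is used as an aligner-independent stand-in for the YT:Z:CP tag, the length of SEQ is taken as the read length (soft clipping is ignored), and the example records are invented.

```python
def pair_overlaps(sam_lines):
    """Yield r1 + r2 - |TLEN| for each properly paired read pair.

    Positive values are overlapping bases; negative values are the
    size of the gap between the two reads.
    """
    pending = {}  # read name -> length of the first mate seen
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        tlen, seq = int(fields[8]), fields[9]
        # Keep paired (0x1) and properly paired (0x2) records only;
        # skip secondary (0x100) and supplementary (0x800) alignments.
        if not (flag & 0x1 and flag & 0x2) or flag & 0x900:
            continue
        if seq == "*" or tlen == 0:  # no sequence or no template length
            continue
        if name in pending:
            yield pending.pop(name) + len(seq) - abs(tlen)
        else:
            pending[name] = len(seq)


def sam_record(name, flag, pos, mate_pos, tlen, read_len=50):
    """Build a minimal, made-up SAM record for the demo below."""
    fields = [name, str(flag), "chr1", str(pos), "60", f"{read_len}M", "=",
              str(mate_pos), str(tlen), "A" * read_len, "I" * read_len]
    return "\t".join(fields)


demo = [
    "@HD\tVN:1.6\tSO:unsorted",
    sam_record("pairA", 99, 100, 130, 80),     # reads at 100-149 and 130-179
    sam_record("pairA", 147, 130, 100, -80),
    sam_record("pairB", 99, 100, 300, 250),    # reads at 100-149 and 300-349
    sam_record("pairB", 147, 300, 100, -250),
]
print(list(pair_overlaps(demo)))  # [20, -150]
```

In the demo, pairA overlaps by 20 bases and pairB has a 150-base gap between the reads.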

0

Sorry, I am confused. You said to compare the fragment length with the insert size. Then how do I check whether it is less than the sum of the two reads? I am new to this, so sorry if this is a very simple thing that I am missing.

6.6 years ago by cjgunase ▴ 50

1

FRAGMENT ========================================
READ1    ====>
READ2                                        <===

The BAM file contains the genomic position of each read as well as its sequence (which gives the length of the read).

6.6 years ago by Pierre Lindenbaum 161k

0

So, given the insert length (N) and the lengths of read 1 (r1) and read 2 (r2), (r1 + r2 - N) should give the number of overlapping bases; positive values indicate overlap. We can then plot a histogram of the overlap lengths. For high quality, this should be a right-skewed distribution.
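
The r1 + r2 - N idea can be sketched in Python roughly as follows, to get both the histogram and a simple efficiency figure. This is only a sketch: `overlap_summary` is a hypothetical helper, the input tuples are invented, and "redundant" here simply means bases counted twice because the reads overlap.

```python
from collections import Counter


def overlap_summary(pairs):
    """Summarize overlap for (r1, r2, n) tuples.

    r1, r2: lengths of read 1 and read 2; n: insert length N.
    Returns the histogram of r1 + r2 - n, the fraction of pairs that
    overlap, and the fraction of sequenced bases that are redundant.
    """
    histogram = Counter()
    n_pairs = n_overlapping = 0
    sequenced = redundant = 0
    for r1, r2, n in pairs:
        overlap = r1 + r2 - n  # > 0: overlap, < 0: gap, 0: reads abut
        histogram[overlap] += 1
        n_pairs += 1
        sequenced += r1 + r2
        if overlap > 0:
            n_overlapping += 1
            redundant += overlap
    return {
        "histogram": histogram,
        "fraction_pairs_overlapping": n_overlapping / n_pairs if n_pairs else 0.0,
        "fraction_bases_redundant": redundant / sequenced if sequenced else 0.0,
    }


# Made-up pairs: 50 bp reads with insert lengths of 80, 250, 100 and 60.
summary = overlap_summary([(50, 50, 80), (50, 50, 250), (50, 50, 100), (50, 50, 60)])
print(sorted(summary["histogram"].items()))   # [(-150, 1), (0, 1), (20, 1), (40, 1)]
print(summary["fraction_pairs_overlapping"])  # 0.5
print(summary["fraction_bases_redundant"])    # 0.15
```

The histogram can be passed to any plotting tool; the two fractions give a quick single-number view of how much sequencing is spent on overlap.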

6.6 years ago by cjgunase ▴ 50

0

https://drive.google.com/file/d/0B3SqUxkB3WxnRWtZbldlOHhZNHM/view?usp=sharing

6.6 years ago by Renesh ★ 2.2k

1

[Image not preserved]

6.6 years ago by Pierre Lindenbaum 161k

0

So, based on the diagram above, I should consider the insert length (which comes from the BAM file), NOT the fragment length?

6.6 years ago by cjgunase ▴ 50

0

The fragment length reported in the BAM file corresponds to the total size covered by the paired-end reads, which may or may not be equal to the insert size.

6.6 years ago by Renesh ★ 2.2k