Question

getFragmentWidths (alpine) reports fragments shorter than the mate read length

5.6 years ago

olga.bio

Dear all,

I am running into a strange situation while trying to use the alpine GC bias correction tool on my own data. In particular, I am following this tutorial: https://www.bioconductor.org/packages/devel/bioc/vignettes/alpine/inst/doc/alpine.html

The data are paired-end RNA-Seq with a read length of 100 bp. Applying getFragmentWidths to the BAM file aligned with STAR returns the following summary:

c(summary(w), Number=length(w))

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.   Number
 68.0000 122.0000 144.0000 160.6159 184.0000 571.0000 578.0000
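To see how many fragments actually fall below the read length, one could count them directly from the width vector. This is a minimal sketch: the vector `w` below is a hypothetical stand-in for the output of getFragmentWidths, and `readlength` is assumed to be 100.

```r
# Hypothetical stand-in for the widths returned by getFragmentWidths()
w <- c(68L, 88L, 95L, 122L, 144L, 184L, 571L)
readlength <- 100L

# Same summary as in the post, plus the number of fragments
stats <- c(summary(w), Number = length(w))
print(stats)

# How many (and what fraction of) fragments are shorter than one mate?
n_short <- sum(w < readlength)
frac_short <- mean(w < readlength)
cat("fragments shorter than read length:", n_short,
    sprintf("(%.1f%%)", 100 * frac_short), "\n")
```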

Why would fragments be shorter than the read length? Overlapping mates? Unmapped mates? Soft-clipping by the STAR aligner? Or am I misunderstanding what getFragmentWidths is supposed to compute?
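One plausible mechanism (an assumption, not something confirmed by the alpine documentation): when the insert is shorter than 100 bp, each mate reads through into the adapter. After adapter trimming, or soft-clipping of the unalignable tail by STAR, both mates align to the same short stretch of genome, so a width measured from the outermost aligned coordinates of the pair comes out smaller than the nominal read length. The helper `fragment_span` below is hypothetical and simply computes that outer span from 1-based, inclusive aligned coordinates.

```r
# Hypothetical helper: span of a read pair from its aligned (post-clipping)
# 1-based, inclusive coordinates, i.e. leftmost start to rightmost end.
fragment_span <- function(start1, end1, start2, end2) {
  max(end1, end2) - min(start1, start2) + 1L
}

readlength <- 100L

# Normal pair: 100 bp mates from a 250 bp insert, no overlap
normal <- fragment_span(1001L, 1100L, 1151L, 1250L)

# Short insert (70 bp): both mates read into the adapter; only the
# 70 bp of genomic sequence aligns, the rest is trimmed or soft-clipped
short <- fragment_span(1001L, 1070L, 1001L, 1070L)

cat("normal pair span:", normal, "\n")  # 250
cat("short-insert pair span:", short,
    "| shorter than read length:", short < readlength, "\n")
```

Under this assumption, checking whether the reads were adapter-trimmed, or inspecting soft-clipped bases in the CIGAR strings of the short pairs, would help confirm the explanation.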

As a consequence, if I continue following the tutorial and use the 2.5% quantile (~88) as minsize, that value is shorter than the read length, which causes problems downstream: the table returned by buildFragtypes contains NA values in the gread1end and gread2start columns. I understand why the NAs are introduced, but I am unsure how best to deal with them.
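Two possible workarounds are sketched below. They are guesses, not alpine's documented recommendation: either clamp minsize so it never drops below the read length, or discard widths shorter than one mate before taking quantiles. The vector `w` and `readlength` are hypothetical placeholders; the resulting values would then be passed as minsize/maxsize to buildFragtypes as in the tutorial.

```r
readlength <- 100L
# Hypothetical widths; in practice use the output of getFragmentWidths()
w <- c(68L, 88L, 95L, 110L, 122L, 130L, 144L, 160L, 184L, 571L)

# Tutorial-style bounds: 2.5% and 97.5% quantiles of the widths
q <- quantile(w, c(0.025, 0.975))
maxsize <- unname(q[2])

# Option 1: keep the tutorial's quantile, but never let minsize fall
# below the read length
minsize_clamped <- max(unname(q[1]), readlength)

# Option 2: drop fragments shorter than one mate, then recompute the
# lower quantile on what remains
w_kept <- w[w >= readlength]
minsize_filtered <- unname(quantile(w_kept, 0.025))
```

Clamping leaves the width distribution untouched and only moves the lower bound; filtering shifts the quantile upward as well. Whether excluding short fragments biases the downstream GC fit has not been checked here.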

Any suggestions or thoughts are much appreciated!

alpine GC correction RNA-Seq