Dear all,
I am running into a strange situation while trying to use the alpine GC bias correction tool on my own data. In particular, I am following this tutorial: https://www.bioconductor.org/packages/devel/bioc/vignettes/alpine/inst/doc/alpine.html
The data are paired-end RNA-Seq with a read length of 100. Applying `getFragmentWidths` to the BAM file aligned with STAR returns the following summary:
```r
c(summary(w), Number=length(w))
#     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.   Number
#  68.0000 122.0000 144.0000 160.6159 184.0000 571.0000 578.0000
```
Why are there fragments shorter than the read length? Overlapping mates? Unmapped mates? Soft-clipping performed by the STAR aligner? Or am I misunderstanding what this function is supposed to compute?
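To make the candidate explanations concrete, here is a minimal base-R sketch, assuming the fragment width is measured from the leftmost to the rightmost aligned base of a properly paired read pair. The helper `fragment_width` and the coordinates are hypothetical and not part of alpine. When the insert is shorter than 100 bp, both mates read through into the adapter; if STAR soft-clips those adapter bases, the aligned span, and therefore the computed width, ends up below the read length.

```r
# Hypothetical helper (not an alpine function): width of a properly paired
# fragment, from the leftmost aligned base of either mate to the rightmost.
fragment_width <- function(start1, end1, start2, end2) {
  max(end1, end2) - min(start1, start2) + 1
}

# Two non-overlapping 100 bp mates: the width exceeds the read length.
fragment_width(1001, 1100, 1201, 1300)  # 300

# Insert shorter than 100 bp: both mates read into the adapter, the adapter
# bases are soft-clipped, and only 70 bp of each mate align (fully overlapping).
fragment_width(1001, 1070, 1001, 1070)  # 70
```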
Consequently, if I continue following the tutorial and use the 2.5% quantile (~88) as `minsize`, it is shorter than the read length, which causes problems downstream: the table returned by `buildFragtypes` contains `NA` values in the `gread1end` and `gread2start` columns. While I understand why the NAs are introduced, I am unsure how best to deal with them.
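One possible workaround, offered here as an assumption rather than an established fix, is to keep the quantile-based choice of `minsize` and `maxsize` but never let `minsize` fall below the read length, so that a full-length read from each end fits inside every modeled fragment. The helper `choose_size_range` and the toy widths below are hypothetical; the toy vector is not the data summarized above.

```r
# Hypothetical helper: derive minsize/maxsize for buildFragtypes() from
# observed fragment widths `w`, clamping minsize to at least the read length.
choose_size_range <- function(w, readlength, probs = c(0.025, 0.975)) {
  stopifnot(is.numeric(w), length(w) > 0)
  q <- unname(quantile(w, probs))  # default type-7 quantiles
  c(minsize = max(ceiling(q[1]), readlength),
    maxsize = floor(q[2]))
}

# Toy widths only, with readlength = 100 as in the post.
w_toy <- c(68, 90, 110, 122, 144, 150, 184, 200, 300, 571)
choose_size_range(w_toy, readlength = 100)  # minsize = 100, maxsize = 510
```

On real widths, `mean(w < 100)` would show what fraction of fragments this clamping leaves outside the modeled range.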
Any suggestions or thoughts are much appreciated!