getFragmentWidths (alpine) reports fragments shorter than mate read length
5.6 years ago
olga.bio • 0

Dear all,

I am running into an odd situation while trying to use the alpine GC bias correction tool on my own data. I am following this tutorial in particular: https://www.bioconductor.org/packages/devel/bioc/vignettes/alpine/inst/doc/alpine.html

The data I am using is paired-end RNA-Seq with a read length of 100. Applying getFragmentWidths to the BAM file aligned with STAR returns the following summary:

c(summary(w), Number=length(w))

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.   Number
 68.0000 122.0000 144.0000 160.6159 184.0000 571.0000 578.0000

What is the reason for fragments shorter than the read length? Overlapping mates? Unmapped mates? Soft-clipping performed by the STAR aligner? Or am I misunderstanding what should happen here?

If I follow the tutorial and use the 2.5% quantile (~88) as minsize, it is shorter than the read length, which causes problems downstream: the table returned by buildFragtypes contains NA values in the gread1end and gread2start columns. While I understand why the NAs are introduced, I am unsure how best to deal with them.
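To make the arithmetic concrete, here is a minimal base R sketch of one possible workaround: take the central 95% of the fragment widths as in the tutorial, but clamp the lower bound to the read length. The vector w below is simulated and only stands in for the output of getFragmentWidths; the clamping rule itself is an assumption, not something the alpine documentation prescribes, and whether it is statistically appropriate is part of the question.

```r
# Hypothetical stand-in for the vector returned by getFragmentWidths();
# these values are simulated, not real data.
set.seed(1)
w <- round(rlnorm(578, meanlog = log(150), sdlog = 0.3))

readlength <- 100  # mate read length from the post

# Central 95% of the observed fragment widths, as in the tutorial
q <- quantile(w, c(0.025, 0.975))

# Assumption: never let minsize drop below the read length
minsize <- max(readlength, floor(q[[1]]))
maxsize <- ceiling(q[[2]])

# Fraction of fragments that fall below the clamped lower bound
frac_short <- mean(w < readlength)
```

The resulting minsize and maxsize would then be passed on in place of the raw quantiles, and frac_short gives a rough idea of how many fragments the clamped range leaves out.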

Any suggestions and thoughts are much appreciated!

alpine GC correction RNA-Seq • 766 views