2.2 years ago
Michael ▴ 270
How do you explain the following insert-size peak? The data are NovaSeq PE 150 bp, and the insert size was estimated by fastp. I assume it is an artifact of fastp's insert size estimation, but I am not sure how it happens.
It is exactly at the read length of 151 bp (fastp runs combined with MultiQC):
I am not a fastp user, so I don't know how it calculates insert size. My assumption would be by overlapping R1/R2 reads.
So that peak may represent reads that overlap by just one bp.
If you are interested in calculating insert sizes, you could also try the BBMap suite; see C: Target fragment size versus final insert size
I don't think it works as you described. fastp calculates overlaps from R1/R2 pairs, but I think it only goes down to an overlap of about 30 bp. With fewer bases, the risk of an overlap occurring by chance goes up, especially in repetitive regions.
So I think the peak is where R1 and R2 align almost exactly start to end. But I still do not see where the peak comes from.
EDIT: given that we have low percentages on the Y-axis, the peak is not that extreme. I still want to understand how this happens...
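To make the geometry concrete, here is a minimal sketch of overlap-based insert size estimation. It is not fastp's actual implementation: the function names, the 30 bp minimum overlap (taken from the guess above), and the 10% mismatch tolerance are all assumptions for illustration. It slides the reverse complement of R2 along R1 and reports the fragment length implied by the first acceptable overlap.

```python
import random

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_pair(fragment, read_len):
    """Hypothetical helper: error-free R1/R2 reads from one fragment."""
    r1 = fragment[:read_len]
    r2 = reverse_complement(fragment)[:read_len]
    return r1, r2


def estimate_insert_size(r1, r2, min_overlap=30, max_mismatch_frac=0.1):
    """Estimate insert size from the overlap of R1 and revcomp(R2).

    min_overlap and max_mismatch_frac are assumed values, not fastp's.
    Returns None when no overlap of at least min_overlap bases is found.
    """
    rc2 = reverse_complement(r2)
    # offset = where rc2 starts relative to the start of r1
    for offset in range(-(len(rc2) - min_overlap), len(r1) - min_overlap + 1):
        start1 = max(offset, 0)
        start2 = max(-offset, 0)
        overlap = min(len(r1) - start1, len(rc2) - start2)
        if overlap < min_overlap:
            continue
        mismatches = sum(
            a != b
            for a, b in zip(r1[start1:start1 + overlap],
                            rc2[start2:start2 + overlap])
        )
        if mismatches <= max_mismatch_frac * overlap:
            # The fragment runs from the start of r1 to the end of rc2.
            return offset + len(rc2)
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(400))
    for true_insert in (150, 250, 290):
        r1, r2 = simulate_pair(genome[:true_insert], 150)
        print(true_insert, estimate_insert_size(r1, r2))
```

With 150 bp reads, an insert of exactly 150 bp corresponds to offset 0, i.e. R1 and the reverse complement of R2 cover the same bases end to end. An insert of 290 bp leaves only 10 bp of overlap, which is below the assumed threshold, so the pair would be counted as unknown rather than given a size.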
Thinking about this again, that makes sense.
The fastp example report page says this:
Even in the fastp example report there appears to be a peak at the same location as yours.
MultiQC seems to be exaggerating the Y-scale a bit.
Thanks! I saw that on the fastp example page. But I still do not get how this is happening... :(
Likely an artifact, as you originally said. If you have a dataset with a different read length, see if the peak shifts accordingly.
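One way to run that check is to measure how far the histogram bin at the read length stands out from its neighbours, then repeat on a run with a different read length. The sketch below is only an illustration: `spike_ratio` and `load_histogram` are hypothetical helpers, and `load_histogram` assumes the fastp JSON report stores the counts as a list under `insert_size` -> `histogram`, indexed by insert size, which should be verified against your own report.

```python
import json


def load_histogram(json_path):
    """Read the insert size histogram from a fastp JSON report.

    Assumption: counts live under report["insert_size"]["histogram"].
    """
    with open(json_path) as handle:
        report = json.load(handle)
    return report["insert_size"]["histogram"]


def spike_ratio(histogram, position, flank=2):
    """Ratio of the count at `position` to the mean of its neighbours."""
    left = histogram[max(position - flank, 0):position]
    right = histogram[position + 1:position + 1 + flank]
    neighbours = left + right
    if not neighbours:
        raise ValueError("no neighbouring bins to compare against")
    baseline = sum(neighbours) / len(neighbours)
    if baseline == 0:
        return float("inf")
    return histogram[position] / baseline
```

A ratio well above 1 at the read length in each dataset, and close to 1 at the other dataset's read length, would support the idea that the spike follows the read length rather than the library.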