For my Illumina data, FastQC shows Ns at positions 13, 14 and 15 in 101 bp reads. If I crop the first 15 bases with Trimmomatic, that solves the problem, but I lose a lot of data. If I retain the Ns, what sort of problems would they cause during alignment (BWA + Stampy) or variant calling (UnifiedGenotyper), and how can I handle those problems?
If anybody has faced a similar problem, how did you handle it?
Similar questions have been asked on other forums, but none has been answered.
I could not find a resource on how variant calling programs handle Ns. Do they ignore them, or treat them as a variation with a low confidence score?
Below is the per base N content plot from FastQC:
http://i43.tinypic.com/sfyz5z.jpg
Shouldn't you first investigate why you got those weird Ns at these positions?
These are possibly due to machine read errors during sequencing. They occur in only 1 of the 3 runs. I am looking for a way of handling them without losing a lot of sequence data.
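Before choosing between cropping and keeping the Ns, it may help to measure how many reads are actually affected, since cropping 15 of 101 bases discards roughly 15% of every read whether or not it carries an N. The following is only a minimal sketch of that check, not part of any of the tools mentioned above. It assumes plain 4-line FASTQ records (no wrapped sequence lines), and the function names `iter_fastq_seqs` and `n_stats` are hypothetical helpers made up for illustration.

```python
from collections import Counter


def iter_fastq_seqs(handle):
    """Yield the sequence line of each record, assuming 4 lines per record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of every record holds the bases
            yield line.strip()


def n_stats(seqs):
    """Return (per-position N counts, reads containing any N, total reads).

    Positions are 1-based, to match the x-axis of the FastQC plot.
    """
    per_pos = Counter()
    reads_with_n = 0
    total = 0
    for seq in seqs:
        total += 1
        positions = [i + 1 for i, base in enumerate(seq) if base in "Nn"]
        if positions:
            reads_with_n += 1
            per_pos.update(positions)
    return per_pos, reads_with_n, total
```

To use it, open the FASTQ file of the affected run and pass the handle through both helpers, for example `n_stats(iter_fastq_seqs(fh))`, then compare the counts at positions 13 to 15 with the total number of reads. If only a small fraction of reads carry Ns there, that is an argument for looking at options other than cropping every read.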