Hi, I am very new to NGS, so this question might be silly. I recently got whole-exome sequencing data for tumors and matched normal tissues. After running FastQC on the raw FASTQ files, I found that the read lengths are 50-125 bp for some of the normal tissues, but 50-75 bp for the other normal tissues and for all of the tumors. I think this could affect the variant and CNV results, but I am not sure. Could anyone tell me what the consequences would be, and why? Thanks
For variant and CNV results, the effect of read length differences may not be substantial. However, if you are comparing libraries that were prepared differently, spend some time looking at other metrics such as sequencing depth, differential coverage (different exome kits, and even different batches of the same kit, can have an impact), and duplication metrics. None of these will disqualify a library if it cannot be replaced, but some may be more problematic than others.
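If it helps, here is a rough Python sketch for getting a first look at a couple of these numbers straight from a FASTQ file: read count, read length range, and the fraction of exact-duplicate sequences. The function names `read_sequences` and `summarize` are made up for this example, and it assumes plain 4-line FASTQ records. Duplication estimated from raw sequences is only a crude proxy; duplicate marking on the aligned BAM and coverage over the capture targets are what you would normally rely on.

```python
import gzip
from collections import Counter


def read_sequences(path):
    """Yield the sequence line of each record in a 4-line-per-record FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of every record is the sequence
                yield line.strip()


def summarize(sequences):
    """Return read count, min/max read length, and exact-duplicate fraction."""
    counts = Counter(sequences)      # distinct sequence -> number of occurrences
    n = sum(counts.values())         # total number of reads seen
    if n == 0:
        raise ValueError("no reads found")
    lengths = [len(seq) for seq in counts]
    return {
        "reads": n,
        "min_len": min(lengths),
        "max_len": max(lengths),
        # share of reads that repeat a sequence already seen
        "dup_fraction": 1 - len(counts) / n,
    }
```

This keeps every distinct sequence in memory, so for a full exome it is better to pass a subsample, for example `summarize(itertools.islice(read_sequences("sample_R1.fastq.gz"), 200000))`, where the file name is a placeholder.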
Yes, differences in mappability will likely cause a host of false positives when comparing samples with different read lengths. Having 125 bp reads opens up regions of the genome where 50 bp reads are not uniquely mappable. Comparing read depth between a tumor and a normal that were sequenced at different read lengths will likely cause problems.
That said, as long as the tumor and normal from the same patient have the same read length, I wouldn't be overly concerned about comparing across patients.
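To find out which patients are affected, you could collect the read length range FastQC reports for each sample and flag the pairs where tumor and normal disagree. A minimal sketch, where the function name `mismatched_pairs`, the patient IDs, and the layout of the input dictionary are all invented for illustration (the length ranges are the ones from the question):

```python
def mismatched_pairs(read_lengths):
    """Return patients whose tumor and normal read length ranges differ.

    read_lengths maps patient ID -> {"tumor": (min, max), "normal": (min, max)}.
    """
    return sorted(
        patient
        for patient, ranges in read_lengths.items()
        if ranges["tumor"] != ranges["normal"]
    )


# Hypothetical patients; ranges entered by hand from the FastQC reports.
example = {
    "P01": {"tumor": (50, 75), "normal": (50, 75)},
    "P02": {"tumor": (50, 75), "normal": (50, 125)},
}
print(mismatched_pairs(example))  # ['P02']
```

Pairs flagged this way are the ones where I would look most carefully at the somatic calls and the CNV depth ratios.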