Question: Does sequence length matter for CNV and variants?
Hi, I am very new to NGS, so this question might be silly. I recently got whole exome sequencing data for tumors and matched normal tissues. After running FastQC on my raw FASTQ files, I found that the sequence lengths of some normal tissues are 50-125, while the other normal tissues and all the tumors are 50-75. I think this may affect the variant and CNV results, but I am not sure. Could anyone tell me what the consequences would be, and why? Thanks

Sean Davis
National Institutes of Health, Bethesda, MD
For variant and CNV results, the read-length differences alone may not be substantial. However, if you have libraries that were prepared differently and are comparing them, spend some time looking at other metrics such as sequencing depth, differential coverage (different exome kits, and even different batches of the same kit, can have an impact), and duplication metrics. None of these will disqualify a library if it cannot be replaced, but some may be more problematic than others.

The libraries from different patients were prepared together, so they should be OK, but it's true that my samples were sequenced in two different batches. I don't know whether there were any other differences between the two batches; I guess I have to contact the sequencing department and ask them. Thanks for your comments.

Chris Miller
Washington University in St. Louis, MO
Yes, differences in mappability will likely cause a host of false positives when comparing samples with different read lengths. With 125 bp reads, a good part of the genome becomes uniquely mappable where 50 bp reads are not. Comparing read depth between a tumor and a normal sequenced at different read lengths will therefore likely cause problems.
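To illustrate the mappability point, here is a toy sketch (not a real mappability tool) that builds a random "genome" containing a 100 bp segment repeated twice, then asks what fraction of start positions give a k-mer that occurs exactly once. It assumes exact matching on the forward strand only, with no sequencing errors; real aligners and mappability tracks are more involved. The function name `unique_fraction` and the genome layout are hypothetical.

```python
import random
from collections import Counter


def unique_fraction(genome: str, k: int) -> float:
    """Fraction of start positions whose k-mer occurs exactly once in genome.

    Simplified proxy for mappability: exact matches, forward strand only.
    """
    n = len(genome) - k + 1
    counts = Counter(genome[i:i + k] for i in range(n))
    unique = sum(1 for i in range(n) if counts[genome[i:i + k]] == 1)
    return unique / n


rng = random.Random(0)


def random_seq(length: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(length))


# Hypothetical toy genome: unique flanks with one 100 bp repeat present twice.
repeat = random_seq(100)
genome = random_seq(300) + repeat + random_seq(300) + repeat + random_seq(300)

f50 = unique_fraction(genome, 50)    # 50-mers lying inside a repeat copy are ambiguous
f125 = unique_fraction(genome, 125)  # 125-mers always reach unique flanking sequence
print(f"k=50: {f50:.3f}  k=125: {f125:.3f}")
```

Because a 125 bp read is longer than the repeat, it always includes some unique flanking sequence, whereas a 50 bp read can fall entirely inside one repeat copy. A sample with longer reads therefore gets uniquely mapped reads (and read depth) in regions where a shorter-read sample does not, and a tumor/normal depth comparison can show spurious differences there.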

That said, as long as the tumor and normal from the same patient have the same read length, I wouldn't be overly concerned about comparing across patients.
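One way to check this for each patient is to tabulate read lengths per FASTQ file and flag tumor/normal pairs whose maximum read length differs. The sketch below assumes standard unwrapped 4-line FASTQ records and uses the maximum length because FastQC here reports ranges such as 50-125. The sample IDs, the `pairs` mapping, and both helper functions are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def read_length_counts(fastq_lines: Iterable[str]) -> Counter:
    """Count read lengths from unwrapped 4-line FASTQ records."""
    counts: Counter = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # line 2 of each record holds the sequence
            counts[len(line.strip())] += 1
    return counts


def mismatched_pairs(max_lengths: Dict[str, int],
                     pairs: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return patient IDs whose tumor and normal max read lengths differ."""
    return [patient for patient, (tumor, normal) in pairs.items()
            if max_lengths[tumor] != max_lengths[normal]]


# Hypothetical usage: max lengths would come from read_length_counts(open(path)).
max_lengths = {"P1_tumor": 75, "P1_normal": 125, "P2_tumor": 75, "P2_normal": 75}
pairs = {"P1": ("P1_tumor", "P1_normal"), "P2": ("P2_tumor", "P2_normal")}
print(mismatched_pairs(max_lengths, pairs))
```

Patients flagged this way are the ones where the mappability concern above applies to the tumor versus normal comparison itself.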

