I am trying to solve a problem in my analysis, and I want to make sure I understand the sequencing procedure used in NGS.
Suppose we want to do whole exome sequencing. The steps would be:
- Take the DNA
- Fragment the DNA
- Add adapters to the ends of the fragments
- Use a cDNA-based filter to keep only the fragments from coding regions
- Sequence ...
So here is my question: how do we know the cDNA sequence before sequencing? Is it based on a reference genome such as GRCh37 (hg19) or GRCh38?
For highly polymorphic regions, such as the HLA or KIR regions, is the cDNA sequence also based on the reference genome (and is it therefore possible that some reads are not captured)?
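
To make my concern concrete, here is a toy sketch of how I picture the filtering step. It is only my mental model, not how any real capture kit works: the bait sequence, the fragments, and the `max_mismatches` threshold are all made up, and `is_captured` is a hypothetical helper. It treats the filter as "keep a fragment if it is close enough to a sequence designed from the reference".

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_captured(fragment: str, bait: str, max_mismatches: int) -> bool:
    """True if any bait-length window of the fragment is within
    max_mismatches of the bait (a crude stand-in for the filter)."""
    k = len(bait)
    return any(
        hamming(fragment[i:i + k], bait) <= max_mismatches
        for i in range(len(fragment) - k + 1)
    )


# Hypothetical bait designed from the reference sequence (made-up bases).
bait = "ACGTACGTACGT"

# Made-up fragments: one identical to the reference, one with a single
# variant, and one highly divergent (what I imagine for HLA or KIR).
fragments = {
    "reference_like": "TTACGTACGTACGTGG",
    "one_variant": "TTACGTACCTACGTGG",
    "divergent": "TTAGGTCCCTAAGTGG",
}

# Assumed tolerance of 2 mismatches; the real tolerance is what I am unsure about.
captured = {
    name: is_captured(seq, bait, max_mismatches=2)
    for name, seq in fragments.items()
}
print(captured)
```

In this toy model the divergent fragment is dropped before sequencing, which is exactly the scenario I am asking about for real data.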