When working with whole exome sequencing data, how many basepairs should I expect as padding for a given gene: 0, 100, 1000? For example, would the upstream transcription factor binding site for each gene be included?
A great question, Kermit.
IMHO, it all depends on the exome capture kit and its selection chemistry. For example, the Agilent SureSelect V8 plus NCV plus 5' UTR design has less bias and could pad more bases around each target.
Are you being asked for a padding option while running a pipeline?
Hi Prash. I am the receiver of the BAM files from an old biobank, so I am not generating the data myself.
OK Kermit. Please ask your service provider which capture chemistry was used. You could also estimate the padding from the sorted BAM itself, but how the reads were aligned to the reference matters as well.
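If the provider cannot tell you, here is a rough sketch of how the padding could be estimated from the data. It assumes you have already exported per-base depth for a region with something like `samtools depth -a -r chr:start-end your.bam` (tab-separated chromosome, position, depth) and that you know the exon coordinates from an annotation. The function names, the `min_depth` threshold of 10 and the `max_flank` cap of 1000 are my own illustrative choices, not something from a kit manual.

```python
def parse_depth_lines(lines):
    """Parse 'chrom<TAB>pos<TAB>depth' lines for a single chromosome
    into a {position: depth} dictionary (positions are 1-based)."""
    depth = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _chrom, pos, d = line.split("\t")[:3]
        depth[int(pos)] = int(d)
    return depth


def flank_extent(depth_by_pos, exon_start, exon_end, min_depth=10, max_flank=1000):
    """Count contiguous bases just outside the exon [exon_start, exon_end]
    (1-based, inclusive) whose depth is at least min_depth.

    Returns (left, right) in genomic orientation, not gene orientation.
    min_depth and max_flank are hypothetical defaults; tune them to your data.
    """
    left = 0
    pos = exon_start - 1
    while left < max_flank and depth_by_pos.get(pos, 0) >= min_depth:
        left += 1
        pos -= 1

    right = 0
    pos = exon_end + 1
    while right < max_flank and depth_by_pos.get(pos, 0) >= min_depth:
        right += 1
        pos += 1

    return left, right


# Toy data only: an exon at 100-105 with coverage tapering off on both sides.
toy_lines = [
    "chr1\t95\t2", "chr1\t96\t5", "chr1\t97\t12", "chr1\t98\t20", "chr1\t99\t30",
    "chr1\t100\t50", "chr1\t101\t50", "chr1\t102\t50",
    "chr1\t103\t50", "chr1\t104\t50", "chr1\t105\t50",
    "chr1\t106\t25", "chr1\t107\t11", "chr1\t108\t4", "chr1\t109\t0",
]
toy_depth = parse_depth_lines(toy_lines)
print(flank_extent(toy_depth, 100, 105))
```

Running this over many exons and looking at the typical left and right values should give a feel for the effective padding. Keep in mind that it measures how far reads extend past the exon, which depends on fragment and read length as well as the bait design, so treat it as an estimate rather than the kit's specification.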