Just a general query regarding best practice:
What is the BEST way to align WGS data to an individual gene or fragment?
Example: I have several libraries of high-coverage shotgun sequence from my favourite organism. I have previously aligned this data to the latest high-quality draft genome. I can use the annotations and locations to easily extract sequences, variants or whatever from the genome. If a gene is not annotated, I can also easily locate it within the draft genome using homology searches and then extract whatever information I am interested in.
But what if the gene I am interested in is present in the organism and has previously been sequenced, but is missing from the draft genome? What is the best way to use my WGS sequences to inspect this gene?
I can think of a couple of strategies, but neither is fully satisfying.
Map my libraries directly to the single gene. With this I end up with crazy high coverage and some weird calls, and I do not entirely trust the method.
Or do I add the gene as a mock chromosome to the reference sequence and re-align to this new genome, removing high coverage issues and hopefully only recruiting the correct reads to the gene of interest? I would still miss reads that overlap the ends of the mock chromosome.
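To make the second option concrete, here is a minimal sketch of how the augmented reference could be built. It assumes plain-text (uncompressed) FASTA files and a gene file containing exactly one record; the function names and the contig name `mock_gene` are made up for illustration and are not from any existing tool.

```python
def read_fasta_names(path):
    """Return the set of contig names (first word of each header line)."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def append_mock_contig(reference_fasta, gene_fasta, out_fasta,
                       contig_name="mock_gene"):
    """Write the reference plus the gene (as one extra contig) to out_fasta.

    Assumption: gene_fasta holds exactly one record. Its original header
    is replaced by contig_name so the new contig is easy to find later.
    """
    if contig_name in read_fasta_names(reference_fasta):
        raise ValueError(f"contig name already in reference: {contig_name}")

    with open(gene_fasta) as gene:
        gene_lines = [line.strip() for line in gene if line.strip()]
    headers = [line for line in gene_lines if line.startswith(">")]
    if len(headers) != 1:
        raise ValueError("expected exactly one record in the gene FASTA")

    with open(out_fasta, "w") as out:
        last = "\n"
        with open(reference_fasta) as ref:
            for line in ref:
                out.write(line)
                last = line
        # Make sure the new header starts on its own line.
        if not last.endswith("\n"):
            out.write("\n")
        out.write(f">{contig_name}\n")
        for line in gene_lines:
            if not line.startswith(">"):
                out.write(line + "\n")
    return out_fasta
```

The combined FASTA would then need to be re-indexed with whatever aligner was used for the original alignment before all libraries are re-aligned against it. Reads could afterwards be pulled out by the new contig name.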