Aligning high-depth data to a short target sequence
7.9 years ago
c.v.oflynn ▴ 100

Hi All,

Just a general query regarding best practice:

What is the BEST way to align WGS data to an individual gene or fragment?

Example: I have several libraries of high coverage shotgun sequence from my favourite organism. I have previously aligned this data to the latest high quality draft genome. I can use the annotations and locations to easily extract sequences, variants or whatever from the genome. If a gene is not annotated I can also easily locate it within the draft genome using homology searches and then extract whatever information I am interested in.

But what if the gene I am interested in is present in the organism and has previously been sequenced, but is missing from the draft genome? What is the best way to use my WGS sequences to inspect this gene?

I can think of a couple of strategies, but neither is fully satisfying.

Map my libraries directly to the single gene. But then I end up with crazy high coverage and some weird calls, and I do not entirely trust this method.

Or do I add the gene as a mock chromosome to the reference sequence and re-align to this new genome? That should remove the high coverage issues and hopefully recruit only the correct reads to the gene of interest. I would still miss reads that overlap the ends of the mock chromosome.

Any thoughts?


whole-genome resequencing target alignment • 1.5k views
7.9 years ago

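Ideally you'd add the gene as a new contig to the multi-FASTA reference file and map against that. If you use local alignment, then reads that overhang the ends of the new contig should still align over the part that matches, so the ends of the segment should still get covered.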
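As a rough illustration of the "add the gene as a new contig" step, here is a minimal Python sketch. The function names (`fasta_names`, `append_contig`) and the file names are made up for this example and are not from any library. It only concatenates the two FASTA files and refuses to continue if the gene's name clashes with an existing reference sequence name.

```python
def fasta_names(path):
    """Return the sequence names (first word of each '>' header) in a FASTA file."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                names.append(fields[0] if fields else "")
    return names


def append_contig(reference_fa, gene_fa, combined_fa):
    """Write reference_fa followed by gene_fa into combined_fa.

    Raises ValueError if a gene sequence name already exists in the
    reference, because duplicate names confuse indexing and alignment tools.
    Returns the names of the contigs that were added.
    """
    ref_names = set(fasta_names(reference_fa))
    gene_names = fasta_names(gene_fa)
    clashes = ref_names.intersection(gene_names)
    if clashes:
        raise ValueError("name(s) already in reference: " + ", ".join(sorted(clashes)))

    with open(combined_fa, "w") as out:
        for path in (reference_fa, gene_fa):
            last = "\n"
            with open(path) as handle:
                for line in handle:
                    out.write(line)
                    last = line
            # Make sure the next '>' header starts on its own line.
            if not last.endswith("\n"):
                out.write("\n")
    return gene_names


# Hypothetical usage (file names are placeholders):
# added = append_contig("draft_genome.fa", "my_gene.fa", "draft_plus_gene.fa")
```

The combined file then needs to be re-indexed and the reads re-aligned with a local aligner of your choice. For example, with BWA that would be something like `bwa index draft_plus_gene.fa` followed by `bwa mem draft_plus_gene.fa reads_1.fq reads_2.fq`, although the thread does not name a specific tool.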


This is the right approach. Aligners try very hard to place every read, and if you map all of the reads to only your segment of interest, you're probably going to get lots of reads incorrectly mapped there that would otherwise map well to other portions of the genome.
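One way to check whether reads landing on the gene are placed confidently is to look at their mapping quality after aligning to the full reference plus the gene. The sketch below is only an illustration: the function name `reads_on_contig` and the default threshold of 20 are assumptions made for this example, and MAPQ scales differ between aligners. It reads plain-text SAM lines and relies only on the standard SAM column order (QNAME, FLAG, RNAME, POS, MAPQ, ...).

```python
def reads_on_contig(sam_lines, contig, min_mapq=20):
    """Return names of reads aligned to `contig` with MAPQ >= min_mapq.

    sam_lines: iterable of SAM-format text lines (header lines are skipped).
    min_mapq:  illustrative threshold, not a recommendation from the thread.
    """
    kept = []
    for line in sam_lines:
        if line.startswith("@"):      # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:          # not a complete alignment record
            continue
        flag = int(fields[1])
        rname = fields[2]
        mapq = int(fields[4])
        if flag & 0x4:                # read is unmapped
            continue
        if rname == contig and mapq >= min_mapq:
            kept.append(fields[0])
    return kept


# Hypothetical usage with a SAM file on disk:
# with open("aln.sam") as handle:
#     confident = reads_on_contig(handle, "my_gene")
```

Reads that align equally well to the gene and to another part of the genome typically get a low mapping quality, so comparing the counts with and without the threshold gives a rough idea of how much of the coverage is ambiguous.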

