Question: Aligning high-depth data to a short target sequence
c.v.oflynn (United Kingdom) wrote, 3.9 years ago:

Hi All, 

Just a general query regarding best practice:

What is the BEST way to align WGS data to an individual gene or fragment? 

Example: I have several libraries of high coverage shotgun sequence from my favourite organism. I have previously aligned this data to the latest high quality draft genome. I can use the annotations and locations to easily extract sequences, variants or whatever from the genome. If a gene is not annotated I can also easily locate it within the draft genome using homology searches and then extract whatever information I am interested in. 

But what if the gene I am interested in is present within the organism and previously sequenced but missing from the draft genome. What is the best way to use my WGS sequences to inspect this gene?

I can think of a couple of strategies, but neither is fully satisfying:

Map my libraries directly to the single gene. But then I end up with extremely high coverage and some odd variant calls, and I do not entirely trust this method.

Or do I add the gene as a mock chromosome to the reference sequence and re-align to this new genome, which should remove the high-coverage issue and hopefully recruit only the correct reads to the gene of interest? Even then, I would still miss reads that overlap the ends of the mock chromosome.
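A minimal sketch of the "mock chromosome" step in Python, assuming plain (uncompressed) FASTA files; the helper names `fasta_names` and `add_contig` are hypothetical. It streams the reference and then the gene record into a new multi-FASTA, and refuses to proceed if the gene's name collides with an existing contig, since two contigs sharing a name would be indistinguishable downstream.

```python
def fasta_names(path):
    """Return sequence names (first word after '>') from a FASTA file."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                names.append(fields[0] if fields else "")
    return names


def add_contig(reference_fa, gene_fa, out_fa):
    """Write the reference followed by the gene record(s) to out_fa.

    Raises ValueError if any gene record name already exists in the
    reference. Files are copied line by line, so a large draft genome
    is never held in memory all at once.
    """
    clashes = set(fasta_names(reference_fa)) & set(fasta_names(gene_fa))
    if clashes:
        raise ValueError(f"contig name(s) already in reference: {sorted(clashes)}")
    with open(out_fa, "w") as out:
        for src in (reference_fa, gene_fa):
            last = "\n"
            with open(src) as handle:
                for line in handle:
                    out.write(line)
                    last = line[-1]
            if last != "\n":
                out.write("\n")  # keep the next '>' header on its own line
```

For example, `add_contig("draft_genome.fa", "my_gene.fa", "draft_plus_gene.fa")` (hypothetical file names). The aligner index then has to be rebuilt on the combined file (e.g. `bwa index` and `samtools faidx`) before re-mapping.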

Any thoughts?

Ciaran

Devon Ryan (Freiburg, Germany) wrote, 3.9 years ago:

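Ideally you'd add the gene as a new contig to the multi-FASTA reference file and map against that. If you use local alignment, reads that overhang the ends of the contig can still align via their overlapping portion (the overhang is soft-clipped), so the ends of the segment should still get covered.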
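To illustrate why local alignment helps at the contig ends, here is a toy Smith-Waterman implementation in Python. The scoring values (match=2, mismatch=-1, linear gap=-2) are arbitrary assumptions, not any aligner's defaults. A read whose last bases hang off the end of the contig still gets a high-scoring alignment over the overlapping part; the unaligned tail is what a local aligner would report as soft-clipped.

```python
def local_align(read, ref, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (score, read_start, read_end, ref_start, ref_end), using
    half-open 0-based coordinates of the best-scoring local alignment.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            # Local alignment: scores are floored at 0, so an alignment
            # may start anywhere instead of being forced end to end.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until a zero cell is reached.
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, i, best_i, j, best_j


contig = "CCCCAGTCAGTC"        # toy end of an added gene contig
read = "AGTCAGTC" + "GGGG"     # last 4 bases overhang the contig end
score, rs, re_, cs, ce = local_align(read, contig)
soft_clipped = len(read) - re_  # bases a local aligner would soft-clip
```

Here the first 8 read bases align to the last 8 contig bases and the 4-base overhang is left unaligned. Real aligners such as BWA-MEM are far more sophisticated (seeding, affine gaps, clipping penalties), so treat this only as a picture of the principle.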


This is the right approach. Aligners try very hard to place every read, and if you map all of the reads to only your segment of interest, you're probably going to get lots of reads incorrectly mapped there that would otherwise map well to other portions of the genome.
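A toy model of this competitive-placement argument, in Python. It is deliberately simplistic: each read is placed at its ungapped position with the fewest mismatches (the `best_hit` and `count_on` helpers are hypothetical and are not how BWA or Bowtie actually search). The made-up sequences include a paralog-like stretch elsewhere in the genome that differs from the target gene at one base. Against the gene alone, a read from that paralog is forced onto the gene; once the rest of the genome is in the reference, it goes to its true origin.

```python
TARGET = "ACGTACGTTGCA"              # toy gene of interest
GENOME_CHUNK = "GGGACGTACCTTGCAGGG"  # toy genome region with a near-copy
READS = ["ACGTACCT",   # from the near-copy in the genome
         "CCTTGCAG",   # from the genome only
         "ACGTACGT"]   # genuinely from the target gene


def best_hit(read, contigs, max_mismatches=2):
    """Return (contig_name, pos, mismatches) for the ungapped placement
    with the fewest mismatches, or None if none is within the limit.
    Ties keep the first placement found."""
    best = None
    for name, seq in contigs.items():
        for pos in range(len(seq) - len(read) + 1):
            window = seq[pos:pos + len(read)]
            mm = sum(a != b for a, b in zip(read, window))
            if mm <= max_mismatches and (best is None or mm < best[2]):
                best = (name, pos, mm)
    return best


def count_on(name, reads, contigs):
    """Count reads whose best placement lands on contig `name`."""
    hits = [best_hit(r, contigs) for r in reads]
    return sum(1 for h in hits if h is not None and h[0] == name)


gene_only = {"gene": TARGET}
whole_ref = {"gene": TARGET, "chr1": GENOME_CHUNK}
```

With these toy sequences, `count_on("gene", READS, gene_only)` gives 2 while `count_on("gene", READS, whole_ref)` gives 1: the paralog read leaves the gene once its true source is available. Real aligners also give reads with equally good placements a low mapping quality, which is another reason to keep the whole genome in the reference.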

(Reply by Chris Miller, 3.9 years ago.)