Question: Genome Assembly vs Reference Genome for slicing around mutation location.
12 months ago by
prashant109910 wrote:

I have a BAM file and the corresponding VCF file from an external source. I am trying to slice the DNA sequence around each mutation location so that I can feed the slices into an algorithm.

I would like to know the right way to do this. I see two options:

  1. Create a genome assembly (contigs) from the BAM file and then slice it using the mutation information from the VCF file.
  2. Take a reference genome and slice it using the mutation information from the VCF file.

What are the pros and cons of either method?


Please tell us more about the goal of your "algorithms". Depending on that, one way or the other might be better.

In general: if your variants are not phased and you create a consensus sequence from which you slice a region, you cannot be sure that variants next to each other are on the same haplotype (the same copy of the chromosome). You need to know whether this matters for your "algorithms".
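To make the phasing point concrete, here is a minimal sketch in Python. It relies only on the VCF convention that a phased genotype uses `|` as the allele separator and an unphased one uses `/`. The function names `is_phased` and `same_haplotype` are hypothetical, and the sketch assumes both genotypes belong to the same phase set, which a real VCF may record in the `PS` field.

```python
from typing import Optional


def is_phased(gt: str) -> bool:
    """Return True if a VCF GT string such as '0|1' uses the phased separator."""
    return "|" in gt and "/" not in gt


def same_haplotype(gt_a: str, gt_b: str) -> Optional[bool]:
    """Tell whether two variants carry a non-reference allele on the same haplotype.

    Returns None when either genotype is unphased, because the question
    cannot be answered from the genotypes alone in that case.
    Assumption: both genotypes are in the same phase set.
    """
    if not (is_phased(gt_a) and is_phased(gt_b)):
        return None
    alleles_a = gt_a.split("|")
    alleles_b = gt_b.split("|")
    # Compare haplotype by haplotype: first allele with first, second with second.
    return any(
        a not in ("0", ".") and b not in ("0", ".")
        for a, b in zip(alleles_a, alleles_b)
    )


print(same_haplotype("0|1", "0|1"))  # both ALT alleles on the second haplotype
print(same_haplotype("0|1", "1|0"))  # ALT alleles on different haplotypes
print(same_haplotype("0/1", "0|1"))  # unphased: cannot tell
```

If the function returns `None` for most neighbouring variant pairs, any window that contains more than one variant mixes haplotypes in an unknown way.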

written 12 months ago by finswimmer13k

Specifically, I am trying to learn a distributed representation of variants using a strategy similar to word2vec. So I want to slice a DNA window of length 2*K+1 centered on each mutation. My dilemma is how to slice the DNA so that as much information as possible is preserved.

To state my question more clearly: is it wise to slice the reference genome using a patient-specific VCF file, or should one first create a genome assembly/contigs from the patient-specific BAM (since the read length is short and K > 200)?

How much information loss will occur in either case?
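For illustration, the reference-based option with a 2*K+1 window might look like the sketch below. It is written in plain Python on an in-memory string, with hypothetical helper names `slice_window` and `apply_snv`. It assumes 1-based VCF positions and handles only single-base substitutions, so coordinates do not shift. In practice the sequence would come from an indexed FASTA file, and indels would need extra handling.

```python
def slice_window(sequence: str, pos: int, k: int) -> str:
    """Return up to 2*k+1 bases of `sequence` centered on the 1-based `pos`.

    The window is clipped at the sequence ends, so it can be shorter
    than 2*k+1 near the start or end of a chromosome.
    """
    center = pos - 1  # convert the 1-based VCF position to a 0-based index
    start = max(0, center - k)
    end = min(len(sequence), center + k + 1)
    return sequence[start:end]


def apply_snv(sequence: str, pos: int, ref: str, alt: str) -> str:
    """Return a copy of `sequence` with the REF allele at `pos` replaced by ALT.

    Assumption: REF and ALT are single bases (an SNV), so the length
    of the sequence does not change.
    """
    center = pos - 1
    if sequence[center:center + len(ref)] != ref:
        raise ValueError("REF allele does not match the sequence at this position")
    return sequence[:center] + alt + sequence[center + len(ref):]


reference = "ACGTACGTAC"  # toy sequence, not real data
k = 2

# Window taken from the unmodified reference.
print(slice_window(reference, pos=5, k=k))  # GTACG

# Window taken after substituting the patient's ALT allele.
patient = apply_snv(reference, pos=5, ref="A", alt="T")
print(slice_window(patient, pos=5, k=k))  # GTTCG
```

The first window carries no patient-specific information at all. The second carries only the one variant that was applied, so any other variants the patient has inside the window are lost unless they are applied too.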

written 12 months ago by prashant109910