Question

Genome Assembly vs Reference Genome for slicing around mutation location.

5.3 years ago

prashant10991 • 0

I have a BAM file and its corresponding VCF file. I am trying to slice the DNA sequence around each mutation location so that I can feed the slices into an algorithm.

I would like to know the right way to do this. I see two options:

1. Create a genome assembly/contigs from the BAM file and then slice it using the mutation information from the VCF file.
2. Take a reference genome and slice it at the positions given in the VCF file.

What are the pros and cons of either method?

Assembly genome rna-seq mutation • 1.1k views

5.3 years ago by prashant10991 • 0

Please tell us more about the goal of your "algorithms". Depending on that, one approach or the other might be better.

In general: if your variants are not phased and you create a consensus sequence from which you then slice a region, you cannot be sure that neighbouring variants are on the same haplotype (chromosome copy). You need to know whether this matters for your "algorithms".
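To make the phasing point concrete, here is a minimal sketch in Python of how one might check whether two genotype calls can be placed on the same haplotype. It relies only on the VCF convention that `|` separates phased alleles and `/` separates unphased ones. The function names are hypothetical, and the sketch assumes both calls belong to the same phase set (it does not read the `PS` field).

```python
from typing import Optional


def is_phased(gt: str) -> bool:
    """Return True if a VCF GT string such as '0|1' is phased."""
    # VCF writes '|' between phased alleles and '/' between unphased ones.
    return "|" in gt and "/" not in gt


def alt_on_same_haplotype(gt_a: str, gt_b: str) -> Optional[bool]:
    """Tell whether two calls carry a non-reference allele on the same haplotype.

    Returns None when either call is unphased, because the question
    cannot be answered in that case.
    Assumption: both calls are in the same phase set.
    """
    if not (is_phased(gt_a) and is_phased(gt_b)):
        return None
    alleles_a = gt_a.split("|")
    alleles_b = gt_b.split("|")
    # Compare haplotype by haplotype; '0' is REF and '.' is a missing call.
    return any(
        a not in ("0", ".") and b not in ("0", ".")
        for a, b in zip(alleles_a, alleles_b)
    )


if __name__ == "__main__":
    print(alt_on_same_haplotype("0|1", "0|1"))
    print(alt_on_same_haplotype("0|1", "1|0"))
    print(alt_on_same_haplotype("0/1", "0|1"))
```

If this returns `None` for variants that fall inside the same window, a consensus sequence built from them may combine alleles that never occur together on one chromosome copy.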

5.3 years ago by finswimmer 16k

Specifically, I am trying to learn a distributed representation of variants using a strategy similar to word2vec. So I want to slice DNA windows of length 2*K+1 centered on a mutation. However, I am unsure what the correct way to slice the DNA is so that most of the information is preserved.
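As an illustration of the window described above, here is a minimal sketch in Python that cuts a 2*K+1 window centered on a variant out of a sequence held as a plain string. The function name `slice_window` and the toy sequence are hypothetical. The sketch assumes 1-based VCF positions and handles single-base substitutions only, so indels and windows that run past the sequence ends are rejected rather than handled.

```python
from typing import Optional


def slice_window(seq: str, pos: int, k: int, ref: str,
                 alt: Optional[str] = None) -> str:
    """Return the 2*k+1 window of `seq` centered on 1-based position `pos`.

    If `alt` is given, the central base is replaced by the ALT allele,
    so the window carries the sample's variant instead of the REF base.
    """
    center = pos - 1  # VCF POS is 1-based, Python strings are 0-based
    start, end = center - k, center + k + 1
    if start < 0 or end > len(seq):
        raise ValueError("window extends past the sequence ends")
    if len(ref) != 1 or (alt is not None and len(alt) != 1):
        raise ValueError("this sketch handles single-base substitutions only")
    if seq[center].upper() != ref.upper():
        raise ValueError("REF allele does not match the sequence at POS")
    window = seq[start:end]
    if alt is not None:
        # The variant sits at index k of the window by construction.
        window = window[:k] + alt + window[k + 1:]
    return window


if __name__ == "__main__":
    toy = "ACGTACGTAC"  # hypothetical stand-in for a chromosome sequence
    print(slice_window(toy, pos=5, k=2, ref="A"))           # GTACG
    print(slice_window(toy, pos=5, k=2, ref="A", alt="T"))  # GTTCG
```

The REF check is there because VCF coordinates only make sense against the sequence the variants were called on, so a mismatch usually signals a different reference build or a sequence that is not the reference at all.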

To state my question more clearly: is it wise to slice the reference genome at the positions from a patient-specific VCF file, or should one first create a genome assembly/contigs from the patient-specific BAM (since the reads are short and K > 200)?

How much information loss will occur in either case?

5.3 years ago by prashant10991 • 0