Is it possible to construct a full genome from whole-genome sequencing VCF files containing SNP, indel, SV, and CNV data?
8 weeks ago
adam • 0

How might one go about reconstructing a whole genome by combining VCF data with the GRCh37 reference? Are there any tools for this? Thank you in advance.

bioinformatics sequence • 571 views

What's your actual aim in doing this? What data do you have?


Trying to learn more about the data I have and what's possible. I have VCF, FASTA, and BAM data from a whole genome sequence.


You want to do whole genome assembly?


Yes, exactly. Thank you

8 weeks ago
William ★ 5.0k

You can use GATK FastaAlternateReferenceMaker https://gatk.broadinstitute.org/hc/en-us/articles/360037594571-FastaAlternateReferenceMaker

This only works for SNPs and simple indels. There are further limitations and options; see the GATK documentation.

It can't integrate CNV/SV into the existing reference genome.

A de novo assembly starting from the raw reads is needed to get a new reference genome with complex variation resolved.
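
To illustrate the reference-based approach described above, here is a minimal Python sketch of what "applying SNPs and simple indels to a reference" means conceptually. It is not GATK's implementation: the function name `apply_variants`, the tuple layout `(pos, ref, alt)`, and the toy sequence are all assumptions made for the example. It handles only non-overlapping variants on a single sequence, and it makes no attempt at SVs or CNVs.

```python
def apply_variants(reference, variants):
    """Apply non-overlapping SNPs and simple indels to a reference string.

    `variants` is an iterable of (pos, ref, alt) tuples, where `pos` is
    1-based as in the VCF POS column. This layout is an assumption of the
    sketch, not a parsed VCF record.
    """
    pieces = []
    cursor = 0  # 0-based index of the next reference base not yet copied
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError(f"overlapping variant at position {pos}")
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele does not match reference at position {pos}")
        pieces.append(reference[cursor:start])  # unchanged stretch before the variant
        pieces.append(alt)                      # substitute the ALT allele
        cursor = start + len(ref)               # skip the REF allele
    pieces.append(reference[cursor:])           # unchanged tail
    return "".join(pieces)


reference = "ACGTACGTAC"
variants = [
    (3, "G", "T"),    # SNP
    (5, "AC", "A"),   # 1-base deletion
    (8, "T", "TGG"),  # 2-base insertion
]
alternate = apply_variants(reference, variants)
print(alternate)  # ACTTAGTGGAC
```

The REF check is the important part: if the VCF was called against a different reference build than the FASTA you apply it to, the alleles will not line up and the result is meaningless.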


Extremely helpful, thank you. For learning purposes, would you happen to have an example of how to construct a de novo assembly from the raw reads? I have both FASTA and BAM.


For de novo assembly you need the raw reads in FASTQ files. De novo assembly is a difficult process and makes the most sense if you have modern long, accurate reads, or if you don't yet have a reference genome for the species you are working on. Otherwise I would stay with the reference-based approach. See this paper for all the effort that went into the latest "telomere-to-telomere" human reference genome: https://www.biorxiv.org/content/10.1101/2021.05.26.445798v1.full.pdf
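
Since only a BAM is at hand, the reads would first have to be turned back into FASTQ. In practice this is done with an existing tool such as `samtools fastq` or Picard `SamToFastq`. The Python sketch below only illustrates the idea on a single SAM text line. The function name `sam_line_to_fastq` and the example read are assumptions for the illustration. The key detail is that reads aligned to the reverse strand are stored reverse-complemented, so they must be flipped back to recover the original read.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Convert one SAM alignment line to a FASTQ record, or None if skipped."""
    fields = line.rstrip("\n").split("\t")
    name, flag = fields[0], int(fields[1])
    seq, qual = fields[9], fields[10]

    # Secondary (0x100) and supplementary (0x800) alignments repeat a read
    # that already has a primary record, so skip them.
    if flag & 0x100 or flag & 0x800:
        return None

    # Reverse-strand alignments (0x10) store the reverse complement of the
    # read; undo that, and reverse the qualities to match.
    if flag & 0x10:
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]

    # Mark mate 1 (0x40) and mate 2 (0x80) of a pair.
    if flag & 0x40:
        name += "/1"
    elif flag & 0x80:
        name += "/2"

    return f"@{name}\n{seq}\n+\n{qual}\n"


# Hypothetical read aligned to the reverse strand (flag 16).
line = "read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tAACG\tIIH#"
print(sam_line_to_fastq(line), end="")
```

Note that a BAM only yields the complete raw read set if nothing was filtered out or hard-clipped when it was produced, so the original FASTQ files from the sequencing provider are preferable when available.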
