Question: Making a haploid genome
gravatar for Jautis
5.6 years ago by
United States
Jautis290 wrote:

Hi, I have a reference genome (fasta) and list of genotypes (vcf) for one individual. I want to make a haploid genome file by randomly selecting one of the individual's alleles at each variable site. I would use GATK's alternate reference maker, but I do not want to bias my choice to either the reference or non-reference allele. Anybody know how I can do this? 

Thank you! 

variant haploid genome vcf • 1.5k views
ADD COMMENTlink modified 5.6 years ago by JC12k • written 5.6 years ago by Jautis290
gravatar for JC
5.6 years ago by
JC12k wrote:

use strict;
use warnings;
use autodie;

$ARGV[2] or die "use <VCF> <FASTA> <OUTPUT>\n";

my $vcf = shift @ARGV;
my $fas = shift @ARGV;
my $out = shift @ARGV;

# load sequences
my %seq;
$/ = "\n>";
open (my $fh, "<", $fas);
while (<$fh>) {
    my ($id, @seq) = split (/\n/, $_);
    $seq{$id} = join "", @seq;
close $fh;

# parse VCF
$/ = "\n";
open (my $vh, "<", $vcf);
while (<$vh>) {
    next if (m/^#/);
    my ($chr, $pos, $id, $ref, $alt, $qual, $fil, $info) = split (/\t/, $_);
    next unless ($alt =~ m/^[ACGT]$/); # only valid SNPs
    my $afrq = 0.5; # in case of missing values use a 50% chance
    if ($info =~ m/AF=(0\.\d+)/) {
        $afrq = $1;
    my $prob = rand();
    if ($prob < $afrq) {
        substr ($seq{$chr}, $pos - 1, 1) = $alt);
        warn "modified $chr:$pos $ref -> $alt\n";
close $vh;

# Write modified sequences
open (my $oh, ">", $out);
while (my ($id, $seq) = each %seq) {
    print $oh ">$id\n";
    while ($seq) {
        print $oh substr ($seq, 0, 80), "\n";
        substr ($seq, 0, 80) = '';
close $oh;


* Cold night, procastinating work, perl lover

ADD COMMENTlink written 5.6 years ago by JC12k

Very useful! Will you teach me the ways of the force?

ADD REPLYlink written 5.6 years ago by apelin20480

You don't need a teacher, just start coding to solve problems like this. 

ADD REPLYlink written 5.6 years ago by JC12k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1810 users visited in the last hour