Question: Making a haploid genome
gravatar for Jautis
2.2 years ago by
United States
Jautis240 wrote:

Hi, I have a reference genome (fasta) and list of genotypes (vcf) for one individual. I want to make a haploid genome file by randomly selecting one of the individual's alleles at each variable site. I would use GATK's alternate reference maker, but I do not want to bias my choice to either the reference or non-reference allele. Anybody know how I can do this? 

Thank you! 

variant haploid genome vcf • 698 views
ADD COMMENTlink modified 2.2 years ago by JC6.2k • written 2.2 years ago by Jautis240
gravatar for JC
2.2 years ago by
JC6.2k wrote:

use strict;
use warnings;
use autodie;

$ARGV[2] or die "use <VCF> <FASTA> <OUTPUT>\n";

my $vcf = shift @ARGV;
my $fas = shift @ARGV;
my $out = shift @ARGV;

# load sequences
my %seq;
$/ = "\n>";
open (my $fh, "<", $fas);
while (<$fh>) {
    my ($id, @seq) = split (/\n/, $_);
    $seq{$id} = join "", @seq;
close $fh;

# parse VCF
$/ = "\n";
open (my $vh, "<", $vcf);
while (<$vh>) {
    next if (m/^#/);
    my ($chr, $pos, $id, $ref, $alt, $qual, $fil, $info) = split (/\t/, $_);
    next unless ($alt =~ m/^[ACGT]$/); # only valid SNPs
    my $afrq = 0.5; # in case of missing values use a 50% chance
    if ($info =~ m/AF=(0\.\d+)/) {
        $afrq = $1;
    my $prob = rand();
    if ($prob < $afrq) {
        substr ($seq{$chr}, $pos - 1, 1) = $alt);
        warn "modified $chr:$pos $ref -> $alt\n";
close $vh;

# Write modified sequences
open (my $oh, ">", $out);
while (my ($id, $seq) = each %seq) {
    print $oh ">$id\n";
    while ($seq) {
        print $oh substr ($seq, 0, 80), "\n";
        substr ($seq, 0, 80) = '';
close $oh;


* Cold night, procastinating work, perl lover

ADD COMMENTlink written 2.2 years ago by JC6.2k

Very useful! Will you teach me the ways of the force?

ADD REPLYlink written 2.2 years ago by apelin20460

You don't need a teacher, just start coding to solve problems like this. 

ADD REPLYlink written 2.2 years ago by JC6.2k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1474 users visited in the last hour