I want to build a consensus sequence for a small region of the genome from an individual in 1000genomes. Here's what I did:
to extract a part of sequence:
samtools view -b ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data/HG00096/alignment/HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam 17:7512445-7512545 -o HG00096_1.bam
use mpileup and bcftools to generate vcf file:
samtools mpileup -uf ../../1kg/human_g1k_v37.fasta HG00096_1.bam | bcftools view - -o HG00096_1_w_rg.vcf
[mpileup] 1 samples in 1 input files <mpileup> Set max per-file depth to 8000
and convert to fastq with vcftuils
vcfutils.pl vcf2fq HG00096_1_w_rg.vcf
now I have errors:
substr outside of string at /usr/local/bin/vcfutils.pl line 557, <> line 393. Use of uninitialized value in lc at /usr/local/bin/vcfutils.pl line 557, <> line 393. substr outside of string at /usr/local/bin/vcfutils.pl line 557, <> line 393.
So I had a look at the vcf file, and found that values for the alt column were always <X>, the info column having no quality, where vcfutils treated as indels. The line of code generated the errors, line 557, looked like:
substr($$seq, $beg, $end - $beg) = lc(substr($$seq, $beg, $end - $beg));
As all rows in the vcf file were treated as indels, the length of
$seq didn't increase while
$beg took the values of the starting position of all input lines till end of the file. So in such case vcftuils will always crash on this error.
Is the problem with mpileup parameter choices, or si it the way vcfutils designed to behave? Is there any fix to this? It's quite possible that I was missing something very basic because I'm new to this field. Any suggestion would be helpful.