Question: VCF Tools converting
0
gravatar for storm1907
7 weeks ago by
storm19070
storm19070 wrote:

Hello, I am in need to convert vcf to fasta. Previous topics did not seem to resolve this type of problem

I have this code, but when trying to do bgzip and tabix commands, i afterwards get error like "the input file is not BGZF" and "Broken VCF header, no column names?" Any help will be appreciated

#!bin/bash
module load bio/vcftools/0.1.17
module load bio/samtools/1.10

export inpath=/dir

export outpath=/dir
mkdir -p $outpath
chmod u+rwx $outpath
export reference=/ref
$reference
for file in $inpath/* ;
do
        bname=$(basename $file)
        echo "base name is $bname"
        bfile=$outpath/$bname".bgz"
        tabfile=$outpath/$bname".tabix"
        outfile=$outpath/$bname".fasta"
        bgzip $file $bfile
        cd $outpath
        tabix -f $bfile > $tabfile
        cat $reference | vcf-consensus $tabfile > $outfile
done
software error • 120 views
ADD COMMENTlink written 7 weeks ago by storm19070

You're using the bgzip command wrong. You're also not giving tabix the preset to use, plus using the command itself wrong. Fix those and you should be fine. Read the bgzip/tabix manuals.

ADD REPLYlink written 7 weeks ago by _r_am31k

Thank, you, fixed

for file in $inpath/*.filtered ;
do
        echo $file
        bname=$(basename $file)
        echo "base name is $bname"
        bfile=$inpath/$bname".gz"
        outfile=$outpath/$bname".fasta"
        bgzip -c $file > $bfile
        tabix -p vcf $bfile
        cat $reference | vcf-consensus $bfile > $outfile
done

but the next problem is: I get output fasta file with correct sample name in file's title, and reference sequence name inside the file:

<nc_xxx> <sequence>

Any suggestions, why is that? :o

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by storm19070

Is that output not how you expect it to be? What do you expect and what are you getting?

ADD REPLYlink written 7 weeks ago by _r_am31k

I would like to see sample's name in the very begginning of file. Not reference sequence's name

ADD REPLYlink written 7 weeks ago by storm19070

That seems to be a custom requirement, which may not be addressed by vcf-consensus. You should be able to use awk or bioawk to add a line containing the FILENAME.

ADD REPLYlink written 7 weeks ago by _r_am31k

Thank you! I also had an idea, that vcf consensus might be supposed for generating multi-fasta. Because at the moment I am obtaining only single fasta files, which should be concatenated

ADD REPLYlink written 7 weeks ago by storm19070

Try switching to bcftools consensus. vcftools is obsolete, I think.

ADD REPLYlink written 7 weeks ago by _r_am31k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1900 users visited in the last hour