Write header of contigs in a vcf file
1
0
Entering edit mode
2.2 years ago
angelaparody ▴ 80

Hi,

I am having the following issue. I used STACKS to produce a .vcf file and it does not contain header for contigs, and I need this in order to edit the ID column which is all 0s at the moment (see below, well, it is a bit mixed up). This is how the vcf file looks like (header and a few more rows):

##fileformat=VCFv4.2
##fileDate=20191104
##source="Stacks v2.41"
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Genotype Likelihood">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=loc_strand,Number=1,Type=Character,Description="Genomic strand the corresponding Stacks locus aligns on">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Lib1_3A Lib1_3B Lib1_3C Lib1_3D Lib1_3E Lib1_3F Lib1_4A Lib1_4B Lib1_4C Lib1_4D Lib1_4E Lib1_4F Lib1_7A Lib1_7A2    Lib1_7D Lib1_7D2    Lib1_7E Lib1_7F
scaffold_45 10  0   A   C   0


And when I use this option to edit the #IDs (I want IDs to be like scaffold_45_10, which is scaffold _CHROM_POS):

bcftools annotate --set-id +'%CHROM\_%POS' myfilename.vcf > new.vcf.


I get this error:

[W::vcf_parse] Contig 'scaffold_45' is not defined in the header. (Quick workaround: index the file with tabix.)
Encountered error, cannot proceed. Please check the error output above.


Therefore I would need to define contigs in the header (that is what the error says). To do so, I tried what is explain in the last message in this post: bcftools_Issue#766 Following exactly what it is said there I got a vcf file that looks like this:

    ##contig=<ID=scaffold_45, length=120>
##contig=<ID=scaffold_62, length=125>
##contig=<ID=scaffold_69, length=132>
##contig=<ID=scaffold_98, length=172>
##contig=<ID=scaffold_154, length=149>
##contig=<ID=scaffold_210, length=184>
(there are more rows)
**VIM - Vi IMproved 8.0 (2016 Sep 12, compiled Jun 21 2019 04:10:35)**
scaffold_45 10  0   A   C   0   PASS


Which is wrong. The header that was there has been replaced by the list of contigs IDs and lengths (which is what I want to include in the header but without losing the header that was there). There is also one extra row (VIM - Vi IMproved 8.0 (2016 Sep 12, compiled Jun 21 2019 04:10:35) that I don't know how it appeared and should not be there.

In short, I need to be able to edit the IDs of the vcf file, and for that I need to add the contig info in the header of the vcf file. The reason I want the IDs in the vcf file is to filter SNPs by IDs with vcftools. Does someone know how to solve this issue?

'Angela

PS: I may tray to include the header of my initial vcf file into the header.txt and see what happens...

vcf edit header • 2.2k views
3
Entering edit mode
2.2 years ago
angelaparody ▴ 80

I found a solution! Maybe not the best, probably it could be done in a different -and faster- way, but I am not a bioinformatician and this worked for me :)

An output of STACKS is catalog.fa.gz and I obtained the .fai doing this:

gzip -cd ./catalog.fa.gz > catalog.fa
samtools faidx catalog.fa > catalog.fa.fai


Then I took this code from Biostars: Question: Add contig lenght to VCF header in a robust way

awk '{printf("##contig=<ID=%s,length=%d>\\n",$1,$2);}' ref.fai > ToEditHeader.txt


I have edited this txt file and used it to edit what is between ** (see below).

awk '/^#CHROM/ { printf("**##contig=<ID=1,length=195471971>\n##contig=<ID=2,length=182113224>\n**");} {print;}' inputfile.vcf > contigsInHeader.vcf


Then I used this (see below) from: Biostars: Question: How edit VCF file (specifically #CHROM and ID columns), which finally set IDs on my contigsInHeader.vcf file:

bcftools annotate --set-id +'%CHROM\_%POS' contigsInHeader.vcf > SnpsWithIDs.vcf


I hope this help to anyone having the same issue!

Cheers,

'Angela