HapCUT2 using GATK results
7.3 years ago
jyu429 ▴ 120

Hi,

I'm trying to generate haplotypes using both WGS reads in a BAM file and unphased variants called by GATK in a VCF. I was considering HapCUT2, but the VCF format it requires as input (as detailed here) seems to differ significantly from the VCF that GATK outputs. Is there a standard way to convert the GATK output to the appropriate HapCUT2 input format?

Alternatively, is there a better means of generating haplotypes from short reads?

I appreciate any help!

phasing hapcut2 gatk vcf • 2.5k views
6.8 years ago
lloydlow ▴ 20

Hi jyu429,

As described at https://github.com/vibansal/HapCUT2, you need at least two inputs to run HapCUT2: a BAM file of aligned reads and a VCF file of variants. My answer below addresses your last question and should help you generate a VCF file that works. I'm not sure what type of sequencing reads you used to generate the BAM file, but if you are working with short reads, Hi-C is based on short reads and seems to be a good option. You can generate the VCF file from WGS Illumina paired-end reads at roughly 30X coverage. If you have such reads for calling your SNPs and indels, align them to your reference using BWA and then sort the alignments with SAMtools. Assuming you have the Illumina reads I mentioned, you can use the commands below to get the VCF file. Hope it helps.

Versions used: BWA 0.7.12, SAMtools 0.1.18 (the commands below follow the SAMtools 0.1.x syntax).

samtools mpileup -uf reference.fasta reads.sorted.bam | bcftools view -vcg - > file.vcf
samtools mpileup -uf reference.fasta reads.sorted.bam | bcftools view -vcgI - > file_noIndel.vcf

The second command writes a VCF without indels. If you wish to use indels, you also have to provide the reference FASTA on the command line when running HapCUT2.
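
For what it's worth, here is a rough sketch of how the two HapCUT2 steps from the README could be chained once you have the VCF. The file names (reads.sorted.bam, file.vcf, reference.fasta, fragment_file, haplotype_output_file) are placeholders, and the `run` and `phase_with_hapcut2` helpers are hypothetical wrappers written for this example, not part of HapCUT2. It assumes `extractHAIRS` and `HAPCUT2` are on your PATH. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
# Print each command; execute it only when DRY_RUN=0 (hypothetical helper).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Usage: phase_with_hapcut2 BAM VCF [REFERENCE_FASTA]
# Passing a reference FASTA switches on indel handling in extractHAIRS.
phase_with_hapcut2() {
  bam=$1
  vcf=$2
  ref=${3:-}
  # Step 1: extract haplotype-informative reads into a fragment file.
  if [ -n "$ref" ]; then
    run extractHAIRS --indels 1 --ref "$ref" --bam "$bam" --VCF "$vcf" --out fragment_file
  else
    run extractHAIRS --bam "$bam" --VCF "$vcf" --out fragment_file
  fi
  # Step 2: assemble haplotypes from the fragment file.
  run HAPCUT2 --fragments fragment_file --VCF "$vcf" --output haplotype_output_file
}

# SNV-only example using the VCF from the second command above.
phase_with_hapcut2 reads.sorted.bam file_noIndel.vcf
```

To include indels, pass the reference as a third argument, e.g. `phase_with_hapcut2 reads.sorted.bam file.vcf reference.fasta`. The README lists further options (for Hi-C reads, for example) that this sketch leaves out.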
