Entering edit mode
6.8 years ago
Pierre Lindenbaum
166k
I've been asked to find the status of some variants (= overlaping a bed file) in a '1000 genomes' data set, including the HOM_REF genotypes (0/0).
For now I'm downloading each bam from 1000G and I call bcftools, (on sample per bam) , I played with --gvcf but I cannot find the correct way to get the 0/0 genotypes.
here a snippet of my current nextflow (it doesn't work: there is no '0/0' in my vcf)
${samtools} view -u -b "${sample}.bam" "${C}" |\
${bcftools} mpileup --output-type b --targets "${C}.bed" --gvcf 0 -a 'FORMAT/DP' -a 'FORMAT/AD' -o "tmp.bcf" --fasta-ref "${REF}" -
${bcftools} index -f "tmp.bcf"
${bcftools} call --output-type z --ploidy GRCh37 --regions-file "${C}.bed" -o "tmp.${C}.vcf.gz" --gvcf 0 --multiallelic-caller "tmp.bcf"
${bcftools} index -t -f "tmp.${C}.vcf.gz"
Hey Pierre,
is there a reason why you use
samtools viewat the beginning? Also saving the mpileup result and indexing it is usually unnecessary.Until now I wasn't able to reproduce your problem. I was able to get 0/0 genotype with the
--gvcfoption and without. Then you get a record for each covered position.I've tried the long way like you did. And a short one (see below). The only difference to your code is, that I haven't use a
bedfile for the region.Result is:
splitting the analysis by chromosome. It's not clear to me how to do this: When using
bcftools --gvcfI think one need to use the streaming filter to get the BED (NOT random-access)Thanks ! I'm away from the lab, I'll try tomorrow. About the BED: I'd like to avoid to extract the variants for the whole BAM...
If I understand your code correct, you have a
bedfile for each chromosome? If so, you can omit thesamtools viewand use the--regions-fileparameter forbcftools mpileupinstead of--targets.bcftoolshave then random access to the regions.If you want to query remote files (like I did in my test example) it might be useful to use
samtools viewto speed up things. Then apply the-Lparameter along with yourbedfile and use-Mto use the multi-region iterator.But that's all about optimization and not resolving your problem ...
Hi @finswimmer, I have a question regarding the command line to create
gvcffile usingbcftools. I used:I want to keep the calls with read depth more than
5. But the gvcf shows some calls with DP lower than 5. I am not sure why is this error happening. Thank you!