Question: getting the HOM_REF genotypes using bcftools mpileup+call ?
1
gravatar for Pierre Lindenbaum
9 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum124k wrote:

I've been asked to find the status of some variants (= overlaping a bed file) in a '1000 genomes' data set, including the HOM_REF genotypes (0/0).

For now I'm downloading each bam from 1000G and I call bcftools, (on sample per bam) , I played with --gvcf but I cannot find the correct way to get the 0/0 genotypes.

here a snippet of my current nextflow (it doesn't work: there is no '0/0' in my vcf)

${samtools} view -u -b "${sample}.bam" "${C}" |\
${bcftools} mpileup --output-type b --targets "${C}.bed" --gvcf 0 -a 'FORMAT/DP'  -a 'FORMAT/AD'  -o "tmp.bcf" --fasta-ref "${REF}" -

${bcftools} index  -f "tmp.bcf"

${bcftools} call --output-type z --ploidy GRCh37 --regions-file "${C}.bed" -o "tmp.${C}.vcf.gz" --gvcf 0 --multiallelic-caller "tmp.bcf"

${bcftools} index -t -f "tmp.${C}.vcf.gz"
ADD COMMENTlink modified 9 months ago by RamRS24k • written 9 months ago by Pierre Lindenbaum124k
1

Hey Pierre,

is there a reason why you use samtools view at the beginning? Also saving the mpileup result and indexing it is usually unnecessary.

Until now I wasn't able to reproduce your problem. I was able to get 0/0 genotype with the --gvcf option and without. Then you get a record for each covered position.

I've tried the long way like you did. And a short one (see below). The only difference to your code is, that I haven't use a bedfile for the region.

$ bcftools mpileup -Ou --gvcf 0 -f Homo_sapiens.GRCh37.dna.primary_assembly.fa -r 1:977320-977340 ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/high_coverage_alignment/HG00096.wgs.ILLUMINA.bwa.GBR.high_cov_pcr_free.20140203.bam | bcftools call -m --gvcf 0 -Oz -o 1kg.vcf.gz

Result is:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  HG00096
1   977320  .   C   .   .   .   END=977329;MinDP=14 GT:DP   0/0:14
1   977330  .   T   C   214 .   DP=28;VDB=0.652362;SGB=-0.688148;MQSB=1;MQ0F=0;AC=2;AN=2;DP4=0,0,11,4;MQ=60 GT:PL:DP    1/1:244,45,0:15
1   977331  .   C   .   .   .   END=977340;MinDP=15 GT:DP   0/0:15
ADD REPLYlink modified 9 months ago • written 9 months ago by finswimmer12k

is there a reason why you use samtools view at the beginning? Also saving the mpileup result and indexing it is usually unnecessary.

splitting the analysis by chromosome. It's not clear to me how to do this: When using bcftools --gvcf I think one need to use the streaming filter to get the BED (NOT random-access)

I've tried the long way like you did. And a short one (see below). The only difference to your code is, that I haven't use a bedfile for the region.

Thanks ! I'm away from the lab, I'll try tomorrow. About the BED: I'd like to avoid to extract the variants for the whole BAM...

ADD REPLYlink written 9 months ago by Pierre Lindenbaum124k

If I understand your code correct, you have a bed file for each chromosome? If so, you can omit the samtools view and use the --regions-file parameter for bcftools mpileup instead of --targets. bcftools have then random access to the regions.

If you want to query remote files (like I did in my test example) it might be useful to use samtools view to speed up things. Then apply the -L parameter along with your bed file and use -M to use the multi-region iterator.

But that's all about optimization and not resolving your problem ...

ADD REPLYlink written 9 months ago by finswimmer12k

Hi @finswimmer, I have a question regarding the command line to create gvcf file using bcftools. I used:

bcftools mpileup -Ov --gvcf 5 -f ref.fa example.sorted.bam | bcftools call -m --gvcf 5 -o example.vcf

I want to keep the calls with read depth more than 5. But the gvcf shows some calls with DP lower than 5. I am not sure why is this error happening. Thank you!

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by evelyn60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1715 users visited in the last hour