Question: Note: Neither --ploidy nor --ploidy-file given, assuming all sites are diploid [mpileup] 1 samples in 1 input files
0
gravatar for Aamiralizai
2.9 years ago by
Aamiralizai0 wrote:

Hello everyone

i am facing a problem while run a command to call variant, i am using sorted.bam file (generated by picard-tools-1.119)

$SOFTWARE/samtools mpileup -d8000 -ugf $DATABASE/hg19.fa $DATA/1.sorted.bam |$SOFTWARE1/bcftools call -m |$SOFTWARE1/vcfutils.pl varFilter -Q 20 - > samtool.vcf

i got this error every time Note: Neither --ploidy nor --ploidy-file given, assuming all sites are diploid [mpileup] 1 samples in 1 input files

anyone can help me how to solve this issue ?

Thanks

snp • 3.3k views
ADD COMMENTlink modified 2.1 years ago by Istvan Albert ♦♦ 80k • written 2.9 years ago by Aamiralizai0

Looks like a warning to me, does the command produce the expected output?

ADD REPLYlink written 2.9 years ago by WouterDeCoster40k

No it give a vcf file having 3k which having following info and then stopped

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##samtoolsVersion=1.3.1+htslib-1.3.1
##samtoolsCommand=samtools mpileup -d8000 -ugf /lustre/home/bioxsyy/DATABASE/hg19.fa /lustre/home/bioxsyy/Aamir/data/1.sorted.bam
##reference=file:///lustre/home/bioxsyy/DATABASE/hg19.fa
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of Read Position Bias (bigger is better)">
##INFO=<ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality Bias (bigger is better)">
##INFO=<ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of Base Quality Bias (bigger is better)">
##INFO=<ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)">
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric.">
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=ICB,Number=1,Type=Float,Description="Inbreeding Coefficient Binomial test (bigger is better)">
##INFO=<ID=HOB,Number=1,Type=Float,Description="Bias in the number of HOMs number (smaller is better)">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
##bcftools_callVersion=1.3.1+htslib-1.3.1
##bcftools_callCommand=call -m
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  /lustre/home/bioxsyy/Aamir/data/1.sorted.bam
ADD REPLYlink modified 2.9 years ago by Devon Ryan91k • written 2.9 years ago by Aamiralizai0
1

In other words, your real question is, "Why am I getting no called variants with samtools mpileup? I don't receive any error message." The answer to that is to first look at the alignments in IGV or something like that and see if you reasonably should be getting any.

ADD REPLYlink written 2.9 years ago by Devon Ryan91k

Hi Devon Ryan

i think there is no problem with my sorted.bam because i used this file for GATKs to call SNPs

ADD REPLYlink written 2.9 years ago by Aamiralizai0

Hello everyone,

I'm calling variants for the first time myself and am generating a bcf file very much like Aamiralizai's, in which (in my understanding) no variants were called. I know I have callable SNPs, however, and so I suspect I am making a simple error in usage. Was there a solution to the original poster's question?

Here's what I did:

1) Downloaded a bam file using sra-toolkit sam-dump for the accession number ERR854867.

sam-dump ERR854867 | samtools view -bS - > ERR854867.bam
samtools view -c ERR854867.bam

4046407

samtools view -h ERR854867.bam | head -30

@HD VN:1.4 SO:coordinate @SQ SN:gi|59800473|ref|NC_002946.2| LN:2153922
UR:/lustre/scratch109/srpipe/references/Neisseria_gonorrhoeae/FA_1090/all/fasta/NC_002946.f a AS:NC_002946.2 M5:00b0701b6ca8456c29436d89fd1ab2f9
SP:Neisseria gonorrhoeae FA 1090 @RG ID:1#1 PL:ILLUMINA
PU:150127_HS27_15335_A_C69LTACXX_2#1 LB:12711112 DS:ERP008891: Study to further define the pattern of emerge nce and network of spread of gonococcus with reduced susceptibility to cefixime and azithromycin in the US and to define the genes that confer resi stance to cefixime and azithromycin and the dynamics of resistance determinants These data are part of a pre-publication release. For information o n the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/
DT:2015-01-27T00:00:00+0000 SM:ERS621380 PG:BamIndexDecoder
CN:SC @PG ID:SCS PN:HiSeq Control Software DS:Controlling software on instrument VN:2.0.12.0 @PG ID:basecalling PN:RTA PP:SCS DS:Basecalling Package VN:1.17.21.3

...

@PG ID:scramble PN:scramble PP:BamTagStripper
VN:1.13.8 CL:/software/solexa/bin/scramble -I bam -O cram -r /lustre/scratch109/srpipe/references/Neisseria_gonorrhoeae/FA_1090/all/fasta/NC_002946.fa 2152368 16 gi|59800473|ref|NC_002946.2| 1 37 100M
* 0 0 2152368 16 gi|59800473|ref|NC_002946.2| 1 37 100M * 0 0
ATAAATTTTTGCACGGGTTGTGGATAAAATATCGGCGAGTCGGTATAATCGGTTCGCTGCGTTTTGAACCGACGCGTATTCAACAGATTTGTTTTCTTTT EDDDDDDDDDDDDDDDB<ddddddeeeedddddddddddddddeddddbadffhhjjjijjjjjjjjjjjjjjjjjjjjjijjjjjjhhhhhfffffccc rg:z:1#1="" nh:i:1="" nm:i:0="" 2007699="" 147="" <br=""/> gi|59800473|ref|NC_002946.2| 1 37 1S99M =
2153783 2153882 GATAAATTTTTGCACGGGTTGTGGATAAAATATCGGCGAGTCGGTATAATCGGTTCGCTGCGTTTTGAACCGACGCGTATTCAACAGATTTGTTTTCTTT DEDDDDDDDDDDDDDDDDDDDDCDDEEEEDDDDDDDDDDDDDCDEDDDBDDDFHGIJIGGJJIJIEIIGJIHGJJJJJJIJJJJJJJHHHHHFFFFFCCC RG:Z:1#1 NH:i:1 NM:i:0

2) Manually download the reference genome fasta file (reference.fasta) against which the reads were aligned (N. gonorrhoeae FA 1090), which is here: https://www.ncbi.nlm.nih.gov/nuccore/59800473. I manually checked and found that the position (field #4) in the bam reads indeed matches which ASCII char in this fasta file the read aligns to. The reads are also have the same ID (SAM field #1) and position as those displayed in the NCBI's genome browser (https://www.ncbi.nlm.nih.gov/projects/sviewer/?id=NC_002946.2&srz=ERR854867).

3) Change the name in the fasta header (>NC_002946.2 to >gi|59800473|ref|NC_002946.2|) because the bam file has the latter name in its @SQ header, and bcftools will throw an error otherwise.

vim reference.fasta

4a) Run bcftools mpileup. I used --ploidy 1 because bacteria are haploid. This gives an empty VCF file like Aamiralizai's file.

bcftools mpileup -Ou -f reference.fasta ERR854867.bam | bcftools call -vm -Oz --ploidy 1 -o ERR854867.vcf.gz

4b) This also gives an empty BCF file.

bcftools mpileup -Ob -o ERR854867.mpileuponly.bcf -f reference.fasta ERR854867.bam

4c) Using two bam files gives a BCF file containing a line for every 1 bp in the reference genome... I don't know if the BCF is correct.

bcftools mpileup -Ob -o twosamples.bcf -f reference.fasta ERR854867.bam ERR854869.bam
ADD REPLYlink modified 19 months ago • written 19 months ago by liawericj10
1

And... I think I found the answer. I was using bcftools view -h foo.bcf instead of bcftools view foo.bcf. This led me to think my file was empty, but in reality bcftools was showing me only the headers. This behavior is different than samtools view -h foo.bam; in that software, the '-h' flag includes headers that would otherwise be hidden.

ADD REPLYlink modified 19 months ago • written 19 months ago by liawericj10
2
gravatar for Istvan Albert
2.1 years ago by
Istvan Albert ♦♦ 80k
University Park, USA
Istvan Albert ♦♦ 80k wrote:

This is a warning and not an error. It informs you that knowing the ploidy (the number of sets of chromosomes in a cell) is recommended to improve quality of the variant calls. The warning states that the tool will assume a diploid organism when the parameter is not set. See:

bcftools call --ploidy ?
ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by Istvan Albert ♦♦ 80k

thanks @Istvan, very useful!

ADD REPLYlink written 23 months ago by Stephane Plaisance400
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1632 users visited in the last hour