Note: Neither --ploidy nor --ploidy-file given, assuming all sites are diploid [mpileup] 1 samples in 1 input files
1
0
Entering edit mode
5.1 years ago

Hello everyone

i am facing a problem while run a command to call variant, i am using sorted.bam file (generated by picard-tools-1.119)

$SOFTWARE/samtools mpileup -d8000 -ugf $DATABASE/hg19.fa $DATA/1.sorted.bam |$SOFTWARE1/bcftools call -m |$SOFTWARE1/vcfutils.pl varFilter -Q 20 - > samtool.vcf

i got this error every time Note: Neither --ploidy nor --ploidy-file given, assuming all sites are diploid [mpileup] 1 samples in 1 input files

anyone can help me how to solve this issue ?

Thanks

SNP • 6.9k views
ADD COMMENT
0
Entering edit mode

Looks like a warning to me, does the command produce the expected output?

ADD REPLY
0
Entering edit mode

No it give a vcf file having 3k which having following info and then stopped

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##samtoolsVersion=1.3.1+htslib-1.3.1
##samtoolsCommand=samtools mpileup -d8000 -ugf /lustre/home/bioxsyy/DATABASE/hg19.fa /lustre/home/bioxsyy/Aamir/data/1.sorted.bam
##reference=file:///lustre/home/bioxsyy/DATABASE/hg19.fa
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=IDV,Number=1,Type=Integer,Description="Maximum number of reads supporting an indel">
##INFO=<ID=IMF,Number=1,Type=Float,Description="Maximum fraction of reads supporting an indel">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of Read Position Bias (bigger is better)">
##INFO=<ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality Bias (bigger is better)">
##INFO=<ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of Base Quality Bias (bigger is better)">
##INFO=<ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)">
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric.">
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=ICB,Number=1,Type=Float,Description="Inbreeding Coefficient Binomial test (bigger is better)">
##INFO=<ID=HOB,Number=1,Type=Float,Description="Bias in the number of HOMs number (smaller is better)">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">
##bcftools_callVersion=1.3.1+htslib-1.3.1
##bcftools_callCommand=call -m
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  /lustre/home/bioxsyy/Aamir/data/1.sorted.bam
ADD REPLY
1
Entering edit mode

In other words, your real question is, "Why am I getting no called variants with samtools mpileup? I don't receive any error message." The answer to that is to first look at the alignments in IGV or something like that and see if you reasonably should be getting any.

ADD REPLY
0
Entering edit mode

Hi Devon Ryan

i think there is no problem with my sorted.bam because i used this file for GATKs to call SNPs

ADD REPLY
0
Entering edit mode

Hello everyone,

I'm calling variants for the first time myself and am generating a bcf file very much like Aamiralizai's, in which (in my understanding) no variants were called. I know I have callable SNPs, however, and so I suspect I am making a simple error in usage. Was there a solution to the original poster's question?

Here's what I did:

1) Downloaded a bam file using sra-toolkit sam-dump for the accession number ERR854867.

sam-dump ERR854867 | samtools view -bS - > ERR854867.bam
samtools view -c ERR854867.bam

4046407

samtools view -h ERR854867.bam | head -30

@HD VN:1.4 SO:coordinate @SQ SN:gi|59800473|ref|NC_002946.2| LN:2153922
UR:/lustre/scratch109/srpipe/references/Neisseria_gonorrhoeae/FA_1090/all/fasta/NC_002946.f a AS:NC_002946.2 M5:00b0701b6ca8456c29436d89fd1ab2f9
SP:Neisseria gonorrhoeae FA 1090 @RG ID:1#1 PL:ILLUMINA
PU:150127_HS27_15335_A_C69LTACXX_2#1 LB:12711112 DS:ERP008891: Study to further define the pattern of emerge nce and network of spread of gonococcus with reduced susceptibility to cefixime and azithromycin in the US and to define the genes that confer resi stance to cefixime and azithromycin and the dynamics of resistance determinants These data are part of a pre-publication release. For information o n the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/
DT:2015-01-27T00:00:00+0000 SM:ERS621380 PG:BamIndexDecoder
CN:SC @PG ID:SCS PN:HiSeq Control Software DS:Controlling software on instrument VN:2.0.12.0 @PG ID:basecalling PN:RTA PP:SCS DS:Basecalling Package VN:1.17.21.3

...

@PG ID:scramble PN:scramble PP:BamTagStripper
VN:1.13.8 CL:/software/solexa/bin/scramble -I bam -O cram -r /lustre/scratch109/srpipe/references/Neisseria_gonorrhoeae/FA_1090/all/fasta/NC_002946.fa 2152368 16 gi|59800473|ref|NC_002946.2| 1 37 100M
* 0 0 2152368 16 gi|59800473|ref|NC_002946.2| 1 37 100M * 0 0
ATAAATTTTTGCACGGGTTGTGGATAAAATATCGGCGAGTCGGTATAATCGGTTCGCTGCGTTTTGAACCGACGCGTATTCAACAGATTTGTTTTCTTTT EDDDDDDDDDDDDDDDB<ddddddeeeedddddddddddddddeddddbadffhhjjjijjjjjjjjjjjjjjjjjjjjjijjjjjjhhhhhfffffccc rg:z:1#1="" nh:i:1="" nm:i:0="" 2007699="" 147="" <br=""/> gi|59800473|ref|NC_002946.2| 1 37 1S99M =
2153783 2153882 GATAAATTTTTGCACGGGTTGTGGATAAAATATCGGCGAGTCGGTATAATCGGTTCGCTGCGTTTTGAACCGACGCGTATTCAACAGATTTGTTTTCTTT DEDDDDDDDDDDDDDDDDDDDDCDDEEEEDDDDDDDDDDDDDCDEDDDBDDDFHGIJIGGJJIJIEIIGJIHGJJJJJJIJJJJJJJHHHHHFFFFFCCC RG:Z:1#1 NH:i:1 NM:i:0

2) Manually download the reference genome fasta file (reference.fasta) against which the reads were aligned (N. gonorrhoeae FA 1090), which is here: https://www.ncbi.nlm.nih.gov/nuccore/59800473. I manually checked and found that the position (field #4) in the bam reads indeed matches which ASCII char in this fasta file the read aligns to. The reads are also have the same ID (SAM field #1) and position as those displayed in the NCBI's genome browser (https://www.ncbi.nlm.nih.gov/projects/sviewer/?id=NC_002946.2&srz=ERR854867).

3) Change the name in the fasta header (>NC_002946.2 to >gi|59800473|ref|NC_002946.2|) because the bam file has the latter name in its @SQ header, and bcftools will throw an error otherwise.

vim reference.fasta

4a) Run bcftools mpileup. I used --ploidy 1 because bacteria are haploid. This gives an empty VCF file like Aamiralizai's file.

bcftools mpileup -Ou -f reference.fasta ERR854867.bam | bcftools call -vm -Oz --ploidy 1 -o ERR854867.vcf.gz

4b) This also gives an empty BCF file.

bcftools mpileup -Ob -o ERR854867.mpileuponly.bcf -f reference.fasta ERR854867.bam

4c) Using two bam files gives a BCF file containing a line for every 1 bp in the reference genome... I don't know if the BCF is correct.

bcftools mpileup -Ob -o twosamples.bcf -f reference.fasta ERR854867.bam ERR854869.bam
ADD REPLY
1
Entering edit mode

And... I think I found the answer. I was using bcftools view -h foo.bcf instead of bcftools view foo.bcf. This led me to think my file was empty, but in reality bcftools was showing me only the headers. This behavior is different than samtools view -h foo.bam; in that software, the '-h' flag includes headers that would otherwise be hidden.

ADD REPLY
2
Entering edit mode
4.3 years ago

This is a warning and not an error. It informs you that knowing the ploidy (the number of sets of chromosomes in a cell) is recommended to improve quality of the variant calls. The warning states that the tool will assume a diploid organism when the parameter is not set. See:

bcftools call --ploidy ?
ADD COMMENT
0
Entering edit mode

thanks @Istvan, very useful!

ADD REPLY

Login before adding your answer.

Traffic: 1754 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6