Hi - I am obtaining from a RNA-seq bam file the BCF output from samtools mpileup in order to corroborate variants seen in the exome.
Thus, for corroborating one single SNP, my command looks like this:
samtools mpileup -ug -t DP -t DV -t DP4 --min-MQ 40 --min-BQ 35 -f GRCh37.fa --region "22:29907105-29907105" RNAsample.bam | bcftools view --min-alleles 3 #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT RNAsample 22 29907105 . G A,<X> 0 . DP=47;I16=14,12,6,7,974,36674,484,18098,520,10400,260,5200,488,11274,261,5927;QS=0.666667,0.333333,0;VDB=0.280567;SGB=-0.683931;RPB=0.814533;MQB=1;MQSB=1;BQB=0.956592;MQ0F=0 PL:DP:DV:DP4 82,0,143,160,182,243:39:13:14,12,6,7
Because, samtools is always outputting "an non-ref base 'X' represents an base has not been seen from the alignment data" I need to filter for minimum 3 alleles (one ref, one alt and the X). I find this all counter-intuitive and would like to omit the non-ref base 'X' in the output from samtools mpileup as in this case it will not be needed for anything.
Does anybody know how to do this properly (excluding sed, awk and the like ;) )
The pileup itself for this position looks as follows:
$ samtools mpileup -t DP -t DV -t DP4 --min-MQ 40 --min-BQ 35 -f /home/mpschr/bin/bcbionextgen/data/genomes/Hsapiens/GRCh37/seq/GRCh37.fa --region "22:29907105-29907105" RNAsample.bam 22 29907105 G 39 .,...AAA..A,,a,a..a,a,,A..aA,,a,.,..a,. DDDDDDDDDDDDJFJJEGHIIJJIJJEJDEDDJDJHDDF