Question: VarScan.v2.4.3 generating incomplete or partial VCF file
0
gravatar for Sandeep
8 weeks ago by
Sandeep250
Manipal, India
Sandeep250 wrote:

Hello all, I am trying to call somatic variants using VarScan.v2.4.3 from merged bam files. Here is the sript used to run the same.

java -Xmx64g -jar VarScan.v2.4.3.jar somatic <(samtools mpileup -l AmpliSeqExome.hg38.bed --no-BAQ -f "$bwa_ref".fa Normal_merged.sorted.bam Tumor_merged.sorted.bam) varscan_vcf/variant --mpileup 1 --output-vcf

The run is executed without any errors and the out put is pasted below.

Min coverage:   8x for Normal, 6x for Tumor
Min reads2: 2
Min strands2:   1
Min var freq:   0.2
Min freq for hom:   0.75
Normal purity:  1.0
Tumor purity:   1.0
Min avg qual:   15
P-value thresh: 0.99
Somatic p-value:    0.05
Reading input from /dev/fd/63
Reading mpileup input...
23064925 positions in mpileup file
22855806 had sufficient coverage for comparison
22796450 were called Reference
0 were mixed SNP-indel calls and filtered
0 were removed by the strand filter
54812 were called Germline
560 were called LOH
3979 were called Somatic
5 were called Unknown
0 were called Variant

The generated vcf however has only variants till chr7 and no more. The total line counts in vcf files for both the generated snp and indel vcf files is not more than 36000 (~18k + ~18k) and the file size is just around 3 MB.

I have run the same on multiple files and every single run has produced similar output.

I have also checked the presence of other chromosomes in our bam file. Can anyone point to a reason there are no calls after chr7?

ADD COMMENTlink modified 8 weeks ago by 2nelly170 • written 8 weeks ago by Sandeep250
1
gravatar for 2nelly
8 weeks ago by
2nelly170
Geneva,Switzerland
2nelly170 wrote:

Hi Sandeep,

Can you please check if AmpliSeqExome.hg38.bed contains only regions from chr1 to chr7?

ADD COMMENTlink modified 8 weeks ago • written 8 weeks ago by 2nelly170

Thanks for the reply. I just reconfirmed again by opening Ampliseq file. It contains all chromosomes. Infact, the log files has total number of variants called but the saved vcf files do not have them.

ADD REPLYlink written 8 weeks ago by Sandeep250

Would you mind to count the lines of the output vcf and see how many variants you got using

grep -v "#" file.vcf | wc -l

Sometimes if you try to open a vcf with with text editor or excel you missed lines since there is a delay in loading.

In your case I would also run separately or in pipeline (|) the mpileup and the varscan commands.

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by 2nelly170

wc -l on the output vcf files shows approximately 18k each. That makes it a total of 36k polymorphisms called. The log file has more than 58k polymorphisms called. I have separately generated mpileup file and run the analysis. Same output. Currently, I am running the same on Varscan v.2.4.2 to check.

ADD REPLYlink written 8 weeks ago by Sandeep250

That is so strange! Does your mpileup contain positions after chromosome 7?

Can you try to remove 0 coverage from mpileup

samtools mpileup -B -q 1 -f ref.fa normal.bam tumor.bam 

##Remove 0 coverage from mpileup file###

awk -F"\t" '$4 > 0 && $7 > 0' normal-tumor.mpileup > normal-tumor_no_0_coverage.mpileup

or one line

samtools mpileup -B -q 1 -f reference.fasta normal.bam tumor.bam | awk -F"\t" '$4 > 0 && $7 > 0' | java -jar VarScan.jar .....

I remember when I was running VarScan for CNV calling I had to deal with issue of O coverage in mpileup despite using -q 1

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by 2nelly170

Seems like I figured out where the problem lies. I just did a grep on the mpileup file.

 grep 'chr1' normal-tumor.mpileup
chr1    68929   A   88  ..............,..............................................,,,,,,,,,,,,,,,,,,,,,,,,,,^!.  ><588=<7>?;948/>===<<;<=@<;=@?>=/<?<@<;<<<9=@</<<;><==9;<;:8<883866684717488
86577853/15/    73  ..,.........,,,,.................................,,,,,,,,,,,,,,,,,,,,,,,^!. 8>.98<48//9;3714:08/<5<8A@</;=8@;8/=<=99<5<<<6=:@88668858874424335073868/
chr1    68930   A   87  .............,,............................................,,,,,,,,,,,,,,,,,,,,,,,,,,,, 95/0187489463=68999778679679999;667816786389771795997825678=A:=8<=5=@=87@;9==:5<;=77
57; 69  ..,........,,,,..............................,,,,,,,,,,,,,,,,,,,,,,,,   664448.7258:=9;714684898877800983158269613848==;;=<:<=;;:=95:=7=9?<>5
chr1    68931   G   90  ..............,,..............................................,,,,,,,,,,,,,,,,,,,,,,,,,,,.  =<575<<:=<:;4;868<<<<<<;;<;;;<;<>/;;<=89<;;8==</<8<=<==<<7:;<<;A;<:;:4=7=97<
69==:5;<=717:/  74  ..,.........,,,,.................................,,,,,,,,,,,,,,,,,,,,,,,.,  :<5;:=9<7/9=3=:9//5/;:=8=<</<<<<77/<;889;<<<58=9;=<69=<:==;<;>65:=7=7===/5

grep 'chr7' normal-tumor.mpileup
chr7    193426  c   0   *   *   0   *   *
chr7    193427  c   0   *   *   0   *   *
chr7    193428  g   0   *   *   0   *   *
chr7    193429  c   0   *   *   0   *   *
chr7    193430  c   0   *   *   1   *   .

grep 'chr8' normal-tumor.mpileup

No output for any chromosome after chr7. The mpileup file is generated from a merged bam file (4 samples in each group). Any idea why is my mpileup file incomplete?

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by Sandeep250

would you mind to have a look at your bam?

https://gatkforums.broadinstitute.org/gatk/discussion/7571/errors-in-sam-bam-files-can-be-diagnosed-with-validatesamfile

or

samtools quickcheck
ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by 2nelly170

Thanks for the suggestion. I am currently re-sorting the merged bams. Will update this thread once done.

ADD REPLYlink written 8 weeks ago by Sandeep250

Update: quickcheck executed without errors. I tried resorting the bam and ran mpileup. I also tried running mpileup on another machine. All of the steps have generated only an incomplete pileup file. It is confirmed something is going wrong with mpileup step. I have posted the issue on github, let me wait for a suggestion.

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by Sandeep250

How did you merge bam files?

Maybe something is missing or is wrong in the header, like a read group value

ADD REPLYlink written 7 weeks ago by 2nelly170

I used samtools merge

samtools merge -@ 32 Normal_merged.bam Normal1.bam Normal2.bam Normal3.bam

My reference file also had non standard chromosomes in them.

SN:chr11_KI270721v1_random
SN:chr14_GL000009v2_random
SN:chr14_GL000225v1_random
SN:chr14_KI270722v1_random
SN:chr14_GL000194v1_random
SN:chr14_KI270723v1_random

i have now cleaned up the bam as well as the bam header for all non standard chromosomes. i am running mpileup now. Keeping my fingers crossed :)

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by Sandeep250

Now this is frustrating. Even after cleaning the headers, I am having the same issue.

$ cut -f 1 sample_merged.sorted.mpileup | uniq
chr1
chr2
chr3
chr4
chr5
chr6
chr7

I am stumped. Not really sure how to proceed from here.

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by Sandeep250

Indeed!!!

Last think before realignment of fastq.

what is the line of last read aligned in chr 7?

And finally... Can you try to output a bam file containing reads after chromosome 7 and run mpileup?

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by 2nelly170
tail sample_merged.sorted.mpileup

on two different machines have given me identical output.

chr7    100894296   A   50  ,................,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,  @47362967899964:7<>7>>9<>;?B9<?9?=?9<<=<:=55><4:7<  33  .........,,,,,,,,,,,,,,,,,.,,..,,   086474599;<;<;>=B<5<5<:8959?=89=8
chr7    100894297   G   52  ,.................,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,^].  @5<5;5:;<<<4<=8;=;<=8<<9<<;?A9<@8>=<=><?<:<587<:879<    37  ............,,,,,,,,,,,,,.,,,,.,,..,,   7<;./8<:;=.=<><=;>?C?8</9.5887=>=;9=8
chr7    100894298   C   52  ,.................,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,.    ;5<5;59>=<<8==8<=<997;975789;0794@98>9798571738:454>    36  ...........,,,,,,,,,,,,.,,,,.,,..,,^].  :=;.:<;;=8=7969998;7071./440<:8<9989
chr7    100894299   C   52  ,.................,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,.    <08568698793994777<@<==<<>=?98==8==<9>><<9<:/9=48:89    37  ..........,,,,,,,,,,,,,.,,,,.,,..,,,.   7884765518<B=><><B:9<7835<:;9==61=9<5
chr7    100894300   A   53  ,...........+1G......,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,.^].  <6<<:<;<<<=3<>8<;7=@:A<<<<?;89=>8==<><8<<8;978=;8;8<5   37  ..........,,,,,,,,,,,,,.,,,,.,,..,,,.   :;<:8;;:9<:=8=<>:<;8<7875<:5<=>;8<7</
chr7    100894301   G   53  ,.................,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,..   94;<;>;=<>=3<=8<;79:6977698635994998?5389467;69<455>8   39  ............,,,,,,,,,,,,,.,,,,.,,..,,,. 9=<56:8?<:<<49098949749297487/?99<=748/
chr7    100894302   G   52  ,................,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,..    A17/5778899784781=@:=<<<>=<>>=><?<@9<9<=:;;6?>69;991    38  ............,,,,,,,,,,,,,.,,,,.,,..,,,  478.16217677:>9<>>:=/;=/=4:=:59>>61=:<
chr7    100894303   A   53  ,.................,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,..   <3;39<<=<<=3;<8<=7<<7<;=<<=;<8<><?<==;9<=9;;6;<<9;7>7   39  ............,,,,,,,,,,,,,.,,,,.,,..,,,. :;<56:55:::;9>8=<=5</;=/B77=:5=<>;8<:;/
chr7    100894304   G   50  ,.........A.......,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,..  73<5;<====96;<;<=<7674787534785888=7887374727=22><  35  ............,,,,,,,,,,.,,,.,,..,,,. <=<38;55;:;<59176891989850<77;7836/
chr7    100894305   G   53  ,.................,,,,,,,,,,,,,,,,,,.,,,,,,,.,,.,,,..   <3==;?<====6=@;<=<>;8<;=>==;:8<@>@=>=<>=<:?=69<@9:7>:   38  ............,,,,,,,,,,,,,.,,,,.,,..,,,  <@<3>;5;;::<<>;<<=7>:;?9C78=<9=<>B;>=;

Can you try to output a bam file containing reads after chromosome 7 and run mpileup?

Thats a good suggestion. I will try to run the same.

ADD REPLYlink written 7 weeks ago by Sandeep250

Actually I was talking about the last reads of chr7 in sorted bam file.

Are those the last lines of chr7 in mpileup?

The file stops too early in chr7 (chr7 100894305).

ADD REPLYlink modified 7 weeks ago • written 7 weeks ago by 2nelly170

The last cordinate on chr7 in my bam is

0IN8L:03162:11340   16  chr7    159335889   0   34M *   0   0
ADD REPLYlink written 7 weeks ago by Sandeep250

I think I figured out the problem. I went back to bed file to inspect it near the region samtools exits. There seems to be a blank line separating two entries.

chr7    100893945   100894175   ACHE_80133.3.2392   0   +   .   GENE_ID=ACHE;Pool=10;TRIM_LEFT=0;TRIM_RIGHT=0
chr7    100894084   100894305   ACHE_80133.3.7298   0   +   .   GENE_ID=ACHE;Pool=11;TRIM_LEFT=0;TRIM_RIGHT=48

chr7    100990480   100990689   MUC12_80135.1.27552 0   +   .   GENE_ID=MUC12;Pool=1;TRIM_LEFT=1001;TRIM_RIGHT=0
chr7    100990677   100990894   MUC12_80135.1.26417 0   +   .   GENE_ID=MUC12;Pool=2;TRIM_LEFT=1001;TRIM_RIGHT=0
chr7    100990882   100991081   MUC12_80135.1.5617  0   +   .   GENE_ID=MUC12;Pool=3;TRIM_LEFT=1001;TRIM_RIGHT=0

I also found three more blank lines in the bed file.

ADD REPLYlink written 7 weeks ago by Sandeep250

Yes, this makes it clear.

It is exactly the coordinate that mpileup stops.

Finally!!!

ADD REPLYlink written 7 weeks ago by 2nelly170
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1137 users visited in the last hour