Varscan2 copynumber 'Parsing Exception' error
4.7 years ago
Vanish007 ▴ 10

Hello Everyone,

I'm new to whole exome sequencing analysis and keep running into the following issue:

I'm running VarScan 2.4.2 and invoking the copynumber command on an mpileup file, and I keep receiving a "Parsing Exception" error:

[mpileup] 2 samples in 2 input files
<mpileup> Set max per-file depth to 4000
Min coverage:   10
Min avg qual:   15
P-value thresh: 0.01
Reading input from Normal.sort.rmdup_Tumor.sort.rmdup.mpileup
Parsing Exception on line:
chr1    12956   T   0   *   *   0
8


A little info on the pre-processing: I downloaded BAM files from the TCGA, used samtools to sort them, used Picard tools to mark duplicates, used samtools to build the mpileup, and am now running VarScan2 for the copy number output.

Does anyone have any idea what could be causing the Parsing Exception error and how I might circumvent it? I've done a little googling and saw that in other cases people have tried removing zero-coverage lines by piping the mpileup through awk, but I thought this issue was addressed in later versions of VarScan.
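The awk workaround mentioned above can be sketched roughly as follows. This is a minimal sketch, not a confirmed fix: it assumes a tab-separated two-sample pileup with nine columns (chromosome, position, reference base, then depth, bases and qualities for each sample), and the function name `filter_pileup` and the output file name in the usage comment are hypothetical.

```shell
# Keep only pileup lines that have all 9 columns and a non-zero depth
# in both samples (assumption: column 4 = normal depth, column 7 = tumor depth).
filter_pileup() {
    awk -F'\t' 'NF == 9 && $4 > 0 && $7 > 0'
}

# Hypothetical usage (output file name is illustrative):
#   filter_pileup < Normal.sort.mkdp_Tumor.sort.mkdp.mpileup > filtered.mpileup
```

Note that this drops every position where either sample has no coverage, which may or may not be acceptable for the downstream analysis.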

Thank you kindly for reading!

In case more info is needed, here are the commands and options I am running:

#Sort
samtools sort -l 0 -O bam -T Normal.sort -o Normal.sort.bam Normal.bam
samtools sort -l 0 -O bam -T Tumor.sort -o Tumor.sort.bam Tumor.bam

#MarkDuplicates
java -jar Picard.jar MarkDuplicates I=Normal.sort.bam O=Normal.sort.mkdp.bam METRICS_FILE=dup1 TMP_DIR=tmp1
java -jar Picard.jar MarkDuplicates I=Tumor.sort.bam O=Tumor.sort.mkdp.bam METRICS_FILE=dup2 TMP_DIR=tmp2

#Mpileup
samtools mpileup -q 1 -B -f GRCh38.d1.vd1.fa Normal.sort.mkdp.bam Tumor.sort.mkdp.bam > Normal.sort.mkdp_Tumor.sort.mkdp.mpileup

#Varscan
java -jar VarScan.v2.4.2.jar copynumber Normal.sort.mkdp_Tumor.sort.mkdp.mpileup Normal.sort.mkdp_Tumor.sort.out --mpileup 1 --data-ratio 0.8916044227

varscan2 samtools bam copynumber whole exome • 2.6k views

How does that line look different from the others? For example, run the following on your file: grep -w -A 5 -B 5 '12956' yourfile.txt. Here -w matches whole words only (129568 is not a match), -A prints n lines after the match, and -B prints n lines before the match.


Thank you for the fast reply! I ran the grep command you suggested. On chr1, the lines above and below position 12956 show a value of "1" where 12956 itself shows "0":

chr1 12951 A 0 * * 1 . ?
chr1 12952 T 0 * * 1 . >
chr1 12953 G 0 * * 1 . C
chr1 12954 G 0 * * 1 . A
chr1 12955 G 0 * * 1 . A
chr1 12956 T 0 * * 0
chr1 12957 C 0 * * 1 . B
chr1 12958 A 0 * * 1 . @
chr1 12959 T 0 * * 1 . =
chr1 12960 C 0 * * 1 . @
chr1 12961 C 0 * * 1 . @

The same positions on chr4 look like this. Does that mean chr4 would give me even more issues, or are the two unrelated?

chr4 12951 C 1 , F 0 * *
chr4 12952 T 1 , @ 0 * *
chr4 12953 A 1 , @ 0 * *
chr4 12954 C 1 , E 0 * *
chr4 12955 A 1 , @ 0 * *
chr4 12956 C 1 , F 0 * *
chr4 12957 T 1 , @ 0 * *
chr4 12958 A 1 , @ 0 * *
chr4 12959 C 1 , E 0 * *
chr4 12960 T 1 , B 0 * *
chr4 12961 C 1 , E 0 * *

Thank you very much! (Sorry I don't know how to properly format here just yet so I'll just post as an image)


That image is apparently a good way of sharing your result ;) Another troubleshooting step would be to remove the offending line using grep -w -v 12956 yourfile.txt > outfile.tsv, where -v inverts the match (keep everything that doesn't match). Of course, removing positions is not what you ultimately want, but it might help to nail down the problem. If it's a larger problem, you'll see plenty of other positions with errors; if it's just this one, then it's a weird position worth investigating.
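To judge whether this is one odd position or a widespread problem, the pileup can be scanned for short lines before re-running VarScan. The sketch below assumes that a well-formed two-sample pileup line has nine tab-separated columns; the function names are hypothetical.

```shell
# Print how many pileup lines have fewer than 9 tab-separated columns.
count_short_lines() {
    awk -F'\t' 'NF < 9 { n++ } END { print n + 0 }'
}

# Print chromosome, position and column count for each short line.
list_short_lines() {
    awk -F'\t' 'NF < 9 { print $1 "\t" $2 "\t" NF }'
}

# Hypothetical usage:
#   count_short_lines < yourfile.txt
#   list_short_lines  < yourfile.txt | head
```

A count near zero would point to a handful of unusual positions, while a large count would suggest the pileup itself is being written in a form the parser does not expect.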

I'm not familiar with VarScan, so I looked for a manual and found this: http://dkoboldt.github.io/varscan/copy-number-calling.html. I can't find where it says you should use -B in the mpileup command. Maybe you are following another resource?


Thank you very much for your input, I'll definitely try it out!

The "-B" option was something I tried after looking through posts on seqanswers (posted by "dkoboldt"):

"Hello, and thanks for posting this issue. Yes, VarScan does not expect to see a line with coverage=0 in a single-sample pileup file. VarScan v2.3.6 addresses this issue and should not crash. In either case, I recommend using two-sample mpileups for normal/tumor comparisons (somatic and copynumber), but doing so with the -B parameter (in samtools mpileup) to disable BAQ computation."

The -B option is a samtools option listed under "mpileup" in the samtools manual (http://www.htslib.org/doc/samtools.html). I figured, why not give it a try, since I've been at my wit's end trying to get VarScan to work with these samples =)


So I've tried just about everything with no success, and I'm about to give up on ever graduating.

I've even tried cleaning my BAM files with java -jar Picard.jar FixMateInformation I=Tumor.sort.mkdup.bam SO=coordinate, but that still resulted in a "Parsing Exception" error.

I've posted this problem on the VarScan help forums but got no support. I even emailed Dan Koboldt, creator of VarScan, without a reply. My apologies if I seem to have lost my patience here, but it's a problem I've been struggling with since April, and it's the one thing keeping me from graduating.

Would removing "chrMT" and "chrUnknown" from my BAM files help solve this issue? If so, is there a proper way to remove them from my BAM files? Would I also have to remove them from my reference FASTA?
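One way to test this idea without modifying the BAM files or the reference is to drop those contigs from the pileup text itself. This is only a sketch under assumptions: the contig names are taken from the question and may not match the names actually present in the reference, and the function name `drop_contigs` and the output file name are hypothetical.

```shell
# Drop pileup lines whose first column matches any contig name given
# as an argument; all other lines pass through unchanged.
drop_contigs() {
    awk -F'\t' -v drop="$(IFS=,; echo "$*")" '
        BEGIN {
            # Turn the comma-separated list into a lookup table.
            n = split(drop, names, ",")
            for (i = 1; i <= n; i++) skip[names[i]] = 1
        }
        !($1 in skip)
    '
}

# Hypothetical usage (contig names as written in the question):
#   drop_contigs chrMT chrUnknown < in.mpileup > main_contigs.mpileup
```

If the parsing error persists on the filtered pileup, the extra contigs are unlikely to be the cause.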

Thanks for your patience.

2.5 years ago
jgarces ▴ 20

Hi there,

I'm having the same issue with VarScan, and I'm wondering whether it is possible to leave BAQ enabled (that is, to omit -B in mpileup so that base alignment quality is still calculated) and pipe the output to VarScan somatic. If not, how much alignment information is lost, and would that loss be relevant?