Question: ASEReadCounter More then one variant context at position
gravatar for mb2subi
3 days ago by
mb2subi0 wrote:

Hi everyone,

I am doing an allelic imbalance of ATAC-seq data using a vcf generated with Strelka2. I used the ASEReadCounter of GATK:

java -jar $GenomeAnalysisTK \
    -R $reference \
    -T ASEReadCounter \
    -o $out \
    -I $bam \
    -sites $vcf

But this error arosed:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.8-0-ge9d806836): 
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: More then one variant context at position: chr1:1056601
##### ERROR ------------------------------------------------------------------------------------------

Then I applied the SelectVariants function of GATK:

java -jar ${GenomeAnalysisTK} \
    -T SelectVariants \
    -R ${reference} \
    -V ${vcf} \
    -o ${vcfNonExt}_Selected.vcf \
    -sn ${sample}
    -selectType SNP \
    -restrictAllelesTo BIALLELIC

And I used the vcf resulting from this SelectVariants function, how ever I obtained the same error.

Any advice?

ADD COMMENTlink modified 2 days ago by Charles Warden6.4k • written 3 days ago by mb2subi0

what is the output of

 grep  -P 'chr1\t1056601' the.vcf | cut -f1-5
ADD REPLYlink written 3 days ago by Pierre Lindenbaum117k
chr1    1056601 .   A   G
chr1    1056601 .   A   ACGTGTGGGTGCGTGTG
chr1    105660121   .   C   T
ADD REPLYlink modified 5 hours ago by Pierre Lindenbaum117k • written 6 hours ago by mb2subi0

GATK doesn't like when there are two variants with the same CHROM/POS/REF

More then one variant context at position: chr1:1056601

ADD REPLYlink written 5 hours ago by Pierre Lindenbaum117k

But theoretically with SelectVariants I should have corrected this "error", right?

If not, how I can avoid this positions with more than one variant?

ADD REPLYlink written 5 hours ago by mb2subi0
gravatar for Charles Warden
2 days ago by
Charles Warden6.4k
Duarte, CA
Charles Warden6.4k wrote:

I'm not sure what extra complications you might encounter with ATAC-Seq data versus RNA-Seq data. However, I have the following suggestions:

1) Last time I checked, I thought the GATK ASEReadCounter only worked for SNPs. If you are already throwing out all indels, then only focusing on unambiguous SNP sites may be less of a big deal.

2) You can try some other allele counting programs, such as allelecounter and PhASER. Again, last time I checked, allelecounter covers some indels without phasing, and PhASER does phasing but doesn't include indels (and I can't really vouch for the extra haplotype files, even though they should theoretically be a possible advantage to that program).

For #2, I also don't know about complications for ATAC-Seq. If you have a Bowtie2 alignment, I think you may need to use the --local parameter to avoid having a substantially lower alignment rate. However, if you want to keep more reads for mutation calling, I think you probably should do some trimming (GATK has options to do things like remove soft-clipped bases, but I might be more worried about the specific alignment positions with the ATAC-Seq alignment...however, that is just a guess, and I don't really have a comparison to back up that claim).

ADD COMMENTlink modified 2 days ago • written 2 days ago by Charles Warden6.4k

Theoretically the vcf is of SNPs only.

I align using bwa -mem but I can try with bowtie2 --local

I used allelecounter but the follow advice arose:

Traceback (most recent call last):
  File "/imppc/labs/lplab/share/marc/repos/allelecounter/", line 177, in <module>
  File "/imppc/labs/lplab/share/marc/repos/allelecounter/", line 110, in main
    block = int(block_str) - 1
ValueError: invalid literal for int() with base 10: ''

I will try with PhASER if not.

ADD REPLYlink written 5 hours ago by mb2subi0

I'm not entirely sure about the cause of that error message, but it is admittedly usually best to work with the data type from which an algorithm was developed (I believe I've tested those allele counters with TopHat2 and STAR, but not with BWA-MEM or Bowtie2). So, I think at least one of those would typically work without errors for a given RNA-Seq alignment, but I think that may have actually varied between projects (which one could work and/or provide the results that seemed most appropriate).

The Bowtie2 "local" alignment was to get an alignment rate more similar to BWA-MEM, but I think there are still some formatting differences. For example, I think you may find different insert pairing/distributions with Picard (assuming you have paired-end data) if you use BWA-MEM versus Bowtie2.

ADD REPLYlink written 3 hours ago by Charles Warden6.4k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1803 users visited in the last hour