Question

Varscan2 Strandness Filter

0

Entering edit mode

11.2 years ago

GPR ▴ 390

Hello, I have used VarScan.v2.3.2 to identify RNA-editing sites in my data. I first ran VarScan mpileup2snp using min. average quality set to 25, minimum coverage 8, minimum variant reads 2, min. var. freq. 0.1, strand filter 1 and pValue 0.01. After this I ran VarScan filter with min coverage 10, minimum variant reads 3, min. strands 1 and min. average quality 25. I pretty much ran VarScan filter in order to account for "strandness" issues, since I could have used stringent parameters in mpileup2snp.

I have compiled those A-to-G (second strand transcripts) and T-to-C (first strand variants) and am inspecting them individually in IGV. I am puzzled by the fact that not all of the A-to-G editing sites were called in second strand transcripts, nor are all of the T-to-C variants in first strand transcripts. Can somebody comment on what might be the source of this discrepancy? Will this be taken care if I set the min.strands filter to 2? I definitely don't have strand-specific reads (these are on their way) and have aligned my reads with BWA. I have these reads aligned with TopHat2, using a reference genes.gtf file, and am now preparing the inputs for VarScan. Will this solve the problem?

Thanks, G

strand • 5.4k views

ADD COMMENT • link updated 11.2 years ago by Sean Davis 26k • written 11.2 years ago by GPR ▴ 390

0

Entering edit mode

I'm a little confused about what you are doing. So, to start with a simple question, are you aligning to the genome or to a fasta file of transcripts?

ADD REPLY • link 11.2 years ago by Sean Davis 26k

0

Entering edit mode

I aligning to the hg19.fa genome. using BWA. I had previously performed an alignment to hg19.fa with TopHat2, using a transcripts reference file, to aid in proper downstream transcript reconstruction by Cufflinks. In the current absence of a strand-specific data set, I am trying to find out if these "strand" discrepancies might be alleviated by using the TopHat2 alignments to call variants with the Samtools-VarScan scheme.

ADD REPLY • link 11.2 years ago by GPR ▴ 390

score 0 · Answer 1 · 2013-02-03

First point: use an RNA-seq aware aligner such as Tophat, gsnap, STAR, mapsplice, etc.

Second point: All the variants reported by all variant-callers that I know of report the variants on the forward strand relative to the reference. The variant on the reverse strand is simply the reverse complement. If you want to define variants in RNA-seq in the orientation of the actual transcript, you will need to reverse-complement variants for all transcripts that are on the negative strand. So, a T-to-C site as reported by Varscan (regardless of the aligner) is going to be a T-to-C site on a forward-strand gene and the same site will be an A-to-G for a reverse-strand gene. In other words, you will need to post-process the variants as reported to get potential editing sites. As an aside, the strand filter in VarScan has nothing to do with whether the final called variant is on the forward or reverse strand; instead, it has to do with the presence of the variant in reads that align on only 1 strand or on both. Variants that are present in reads on only aligning to one strand are more likely to be false positives.

Third point: Unless you have DNA variants, these "editing" sites are just variants. There are plenty of A-to-G variants that are not editing sites, so you'll need to sequence the DNA in the same samples to determine whether or not editing might be happening. Though you don't mention it in your question, I assume you have an approach for doing that.