Varscan2 Strandness Filter
1
0
Entering edit mode
11.2 years ago
GPR ▴ 390

Hello, I have used VarScan.v2.3.2 to identify RNA-editing sites in my data. I first ran VarScan mpileup2snp using min. average quality set to 25, minimum coverage 8, minimum variant reads 2, min. var. freq. 0.1, strand filter 1 and pValue 0.01. After this I ran VarScan filter with min coverage 10, minimum variant reads 3, min. strands 1 and min. average quality 25. I pretty much ran VarScan filter in order to account for "strandness" issues, since I could have used stringent parameters in mpileup2snp.

I have compiled those A-to-G (second strand transcripts) and T-to-C (first strand variants) and am inspecting them individually in IGV. I am puzzled by the fact that not all of the A-to-G editing sites were called in second strand transcripts, nor are all of the T-to-C variants in first strand transcripts. Can somebody comment on what might be the source of this discrepancy? Will this be taken care if I set the min.strands filter to 2? I definitely don't have strand-specific reads (these are on their way) and have aligned my reads with BWA. I have these reads aligned with TopHat2, using a reference genes.gtf file, and am now preparing the inputs for VarScan. Will this solve the problem?

Thanks, G

strand • 5.4k views
ADD COMMENT
0
Entering edit mode

I'm a little confused about what you are doing. So, to start with a simple question, are you aligning to the genome or to a fasta file of transcripts?

ADD REPLY
0
Entering edit mode

I aligning to the hg19.fa genome. using BWA. I had previously performed an alignment to hg19.fa with TopHat2, using a transcripts reference file, to aid in proper downstream transcript reconstruction by Cufflinks. In the current absence of a strand-specific data set, I am trying to find out if these "strand" discrepancies might be alleviated by using the TopHat2 alignments to call variants with the Samtools-VarScan scheme.

ADD REPLY
0
Entering edit mode
11.2 years ago

First point: use an RNA-seq aware aligner such as Tophat, gsnap, STAR, mapsplice, etc.

Second point: All the variants reported by all variant-callers that I know of report the variants on the forward strand relative to the reference. The variant on the reverse strand is simply the reverse complement. If you want to define variants in RNA-seq in the orientation of the actual transcript, you will need to reverse-complement variants for all transcripts that are on the negative strand. So, a T-to-C site as reported by Varscan (regardless of the aligner) is going to be a T-to-C site on a forward-strand gene and the same site will be an A-to-G for a reverse-strand gene. In other words, you will need to post-process the variants as reported to get potential editing sites. As an aside, the strand filter in VarScan has nothing to do with whether the final called variant is on the forward or reverse strand; instead, it has to do with the presence of the variant in reads that align on only 1 strand or on both. Variants that are present in reads on only aligning to one strand are more likely to be false positives.

Third point: Unless you have DNA variants, these "editing" sites are just variants. There are plenty of A-to-G variants that are not editing sites, so you'll need to sequence the DNA in the same samples to determine whether or not editing might be happening. Though you don't mention it in your question, I assume you have an approach for doing that.

ADD COMMENT
0
Entering edit mode

Thanks again for your input Sean.

To narrow down true putative A-to-G variants, I have removed all variant calls that appear in a dbSNP.vcf. I of course plan to validate a handfull of sites by sequencing genomic DNA and RNA-derived cDNA.

For some reason I have had a hard time understanding the meaning strand filter in VarScan, Base on your comments in point 3, I wil now reanalyze my data with VarScan filter set --min-strands2 at 2. I guess my confusion stemed from the fact that ADARs modify a dsRNA moiety on Adenine in one strand.

ADD REPLY
0
Entering edit mode

Unfortunately, I'm not sure that filtering with dbSNP and validating a handful of sites is likely to be adequate to generate a set of high-quality RNA-edited sites, but others on this forum or with more knowledge than I might be able to weigh in on that point.

ADD REPLY

Login before adding your answer.

Traffic: 1526 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6