21 months ago
dilokef367
Hi all,
I've managed to call variants from RNAseq using existing tools (Strelka2, HaplotypeCaller). Now I want to compare my variants with subsets of known variants from specific datasets.
Here's the question: variants are reported only on the forward strand. So, if a variant occurs in a gene located on the reverse strand, do I need to change the variant itself to match the reported one?
For example: BRAF V600E is reported as T>A on ClinVar. However, in my data I see the variant as A>T, which makes sense given that BRAF is on the reverse strand. So, the questions are:
- Does bcftools account for that when doing the intersect?
- If not, before reinventing the wheel, is there something out there that handles this and allows intersecting variants coming from RNAseq (with strand info) with known databases (as VCF)?
Am I wrong? As long as you can match CHROM:POS, I don't think the strand will change anything.
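To make the strand point concrete, here is a minimal Python sketch. It assumes the T>A description is written relative to the transcript (coding strand), while a VCF produced from reads aligned to the reference genome always uses forward-strand alleles. The function names are hypothetical and are not part of bcftools or any other tool.

```python
from typing import Tuple

# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def transcript_to_forward(ref: str, alt: str, gene_strand: str) -> Tuple[str, str]:
    """Convert transcript-oriented alleles to forward-strand alleles.

    gene_strand is "+" or "-" (hypothetical convention for this sketch).
    Alleles of minus-strand genes are reverse-complemented; others pass through.
    """
    if gene_strand == "-":
        return reverse_complement(ref), reverse_complement(alt)
    return ref, alt


# BRAF is on the minus strand, so a transcript-level T>A
# corresponds to A>T on the forward strand.
print(transcript_to_forward("T", "A", "-"))
```

If the calls were made against the reference genome, they should already be in forward-strand orientation, so this conversion would only be needed for alleles described relative to a transcript.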
Yeah, but what if I'm calling the wrong allele? Don't I also need to match the ALT allele with the db?
Your concern is valid. bcftools has options to match on CHROM+POS, CHROM+POS+REF and CHROM+POS+REF+ALT. You either need to ensure your variants are mapped to the ref genome and not the transcriptome, or control the matching criterion based on how each database is built. If ClinVar has two entries for two different ALT alleles, you'll need to compare ALT as well. If not, just compare CHROM and POS.
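The difference between the three matching levels can be sketched in plain Python. This is only an illustration of the idea, not how bcftools implements it; the Variant type, the level names ("pos", "ref", "alt") and the toy coordinates are all made up for this example.

```python
from typing import List, NamedTuple, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def match_key(v: Variant, level: str) -> Tuple:
    """Build the comparison key for a variant at the requested level."""
    if level == "pos":   # CHROM+POS
        return (v.chrom, v.pos)
    if level == "ref":   # CHROM+POS+REF
        return (v.chrom, v.pos, v.ref)
    if level == "alt":   # CHROM+POS+REF+ALT
        return (v.chrom, v.pos, v.ref, v.alt)
    raise ValueError(f"unknown level: {level}")


def intersect(calls: List[Variant], known: List[Variant], level: str) -> List[Variant]:
    """Return the calls whose key is also present in the known set."""
    known_keys = {match_key(v, level) for v in known}
    return [v for v in calls if match_key(v, level) in known_keys]


# Toy data: the second call shares a position with a known variant
# but carries a different ALT allele.
calls = [Variant("chr7", 100, "A", "T"), Variant("chr7", 200, "G", "C")]
known = [Variant("chr7", 100, "A", "T"), Variant("chr7", 200, "G", "A")]

print(len(intersect(calls, known, "pos")))  # both positions match
print(len(intersect(calls, known, "alt")))  # only the exact allele matches
```

Matching on position alone would report the second call as known even though its ALT allele differs from the database entry, which is the case the question is worried about.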
Is the default CHROM+POS+REF+ALT? I'm looking at https://samtools.github.io/bcftools/bcftools.html#isec with no luck.
I'm not sure what the default for the tool is - I always specify -c none explicitly because I want CHROM+POS+REF+ALT level uniques.
Moving to vcfanno, which is more flexible. Thanks for your help.