Inherited pipeline uses RNA-seq variant calls for WASP filtering—shouldn’t it be external SNPs?
10 days ago

Hi!

I had a look at a script written by my predecessor, a PhD student, that runs ASEReadCounter with WASP for unbiased mapping, and I don't think it makes sense.

Below is the workflow followed in his script:

enter image description here

More specifically, in the step below, the script uses variants called from our own RNA-seq data.

enter image description here

using this flag:

enter image description here

I thought the SNPs should come from an external source such as 1000 Genomes, not from variants called on our own data?

Am I missing something here?

Help pls:)

/Jonas

WASP • 529 views
7 days ago

Looking through the WASP documentation and paper, it's a little unclear. What is clearly expected is a set of "known" SNPs. However, I'm not 100% sure whether these are meant to come from the genomic sequence of the same individual that produced the RNA-seq data, or from a standard resource such as 1000G. To me, the test described in the paper makes most sense if it uses the genotype of the individual themselves. I wonder if your predecessor decided to use the RNA-seq SNPs in the absence of DNA-seq from this individual?


Thank you so much for your answer, i.sudbery! Yes, I have read it too and I don't think it's totally clear either, so I guess I'm not missing something obvious after all :) No, he was not using DNA-seq, only RNA-seq. This project has now landed in my lap :)

But if I use the variants called from my own data, doesn't that mean the bias has already been introduced in the STAR alignReads step? So I would be running WASP with an already biased variant set? Does that sound reasonable? What do you think?


It's always unclear what to do for ASE/eQTL when you don't have matching DNA-seq. In fact, you ideally want matched, phased haplotypes!

I think using, say, 1000G SNPs is likely to be conservative, and therefore safe. I'm not sure if you should provide them as phased haplotypes or just as SNPs - if your samples match the known haplotypes, then my feeling is that this will be advantageous. However, it might cause issues where your samples don't match the common haplotypes.

My worry with this approach is that you will end up discarding reads unnecessarily: WASP takes reads that overlap a "known" SNP, generates all possible haplotypes other than the one seen in the read, and tests whether they map elsewhere. If they do, the read is discarded. With a large external panel, you would be doing this for lots of SNPs that your sample doesn't actually carry, which might lead to discarding too many reads.
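To make that filtering step concrete, here is a toy sketch of the logic as I described it above. It is not WASP's actual code: `alternative_reads`, `keep_read` and the `mapper` callable (which stands in for a real aligner and returns a mapping position) are hypothetical names, and indels and paired-end reads are ignored.

```python
from itertools import product


def alternative_reads(read_seq, read_start, snps):
    """Return every allele combination of the read other than the observed one.

    snps maps a position to a tuple of known alleles, e.g. {105: ("A", "G")}.
    Positions and read_start share one coordinate system; indels are ignored.
    """
    # Keep only the SNPs that fall inside the read.
    read_end = read_start + len(read_seq)
    overlapping = sorted(p for p in snps if read_start <= p < read_end)
    variants = []
    for alleles in product(*(snps[p] for p in overlapping)):
        bases = list(read_seq)
        for pos, allele in zip(overlapping, alleles):
            bases[pos - read_start] = allele
        candidate = "".join(bases)
        if candidate != read_seq:  # skip the haplotype already seen in the read
            variants.append(candidate)
    return variants


def keep_read(read_seq, read_start, snps, mapper):
    """Keep the read only if every allele-swapped version maps back to read_start."""
    swapped = alternative_reads(read_seq, read_start, snps)
    return all(mapper(alt) == read_start for alt in swapped)
```

The point of the sketch is that every extra SNP in the "known" set adds more swapped reads that all have to remap to the same place, so SNPs your sample does not carry can only remove reads, never rescue them.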

On the other hand, if you use RNA-seq variant calling, you will be limited to only those variants that are actually in your sample. You will also find variants that are in your sample but not in, say, 1000G. The general worry with reference bias is that you under-call variants, because reads carrying variants are less likely to map to the reference. You might under-quantify variants, which might lead to false ASE (this is what WASP corrects for), but you are unlikely to call false positive variants due to reference bias.
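As a back-of-the-envelope illustration of how under-quantification turns into apparent ASE, here is a small sketch. The function name and the mapping rates are made-up assumptions for illustration, not measured values.

```python
def observed_ref_fraction(true_ref_fraction, ref_map_rate, alt_map_rate):
    """Reference-allele fraction seen after mapping, given per-allele mapping rates."""
    ref_reads = true_ref_fraction * ref_map_rate
    alt_reads = (1.0 - true_ref_fraction) * alt_map_rate
    return ref_reads / (ref_reads + alt_reads)


# A truly balanced site (50/50) where alt-carrying reads map only 90% of the time
# (an assumed rate) looks slightly reference-biased after alignment.
biased = observed_ref_fraction(0.5, ref_map_rate=1.0, alt_map_rate=0.9)
```

With those assumed rates the observed reference fraction is 0.5 / 0.95, about 0.526, even though the site is truly balanced. The alt allele is still observed, just at a lower count, which is why the variant tends to get called but its allelic ratio is skewed.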

However, you may call false positives due to other problems with calling variants from RNA-seq (such as RNA editing or base modification).

One solution might be to take the intersection of something like 1000G and the RNA-seq variants, and therefore use SNPs you are pretty confident are real, but only the ones you have some evidence are present in your sample. However, this has the disadvantage of being the least powerful/most conservative of all the options.
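A minimal sketch of that intersection, assuming both call sets are available as plain VCF text lines and that chromosome names use the same convention in both files ("chr1" vs "1" would need harmonising first). The helper names are hypothetical; in practice a dedicated VCF tool would be the more robust choice.

```python
BASES = set("ACGT")


def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) for biallelic-style SNP records in VCF lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            # Keep single-base substitutions only; indels and symbolic alleles are dropped.
            if ref in BASES and allele in BASES:
                keys.add((chrom, int(pos), ref, allele))
    return keys


def shared_snps(rnaseq_vcf_lines, panel_vcf_lines):
    """SNPs present in both the RNA-seq calls and the external panel."""
    return sorted(variant_keys(rnaseq_vcf_lines) & variant_keys(panel_vcf_lines))
```

Matching on position plus both alleles, rather than position alone, avoids keeping a site where the RNA-seq call and the panel disagree about the alternate base.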
