We performed targeted RNA sequencing, using a probe pool originally designed for exome capture to capture a transcript of interest in RNA-seq libraries. We got 1,000-fold enrichment of the transcript, and ran the experiment on 3 cell lines and 8 brain samples. All 11 samples are heterozygous for a set of ~100 completely linked variants (a haplotype) at our locus of interest; 11 of these variants overlap exons and were therefore captured by our probes. To reduce capture bias, we excluded probes that overlapped these SNPs or other moderately linked SNPs.
We did standard QC, aligned reads with STAR, reduced mapping bias with WASP, and used HaplotypeCaller and ASEReadCounter to generate allele counts for the reference and alternate alleles of all 11 exonic SNPs. We ran the analysis twice, once removing all duplicates and once removing only optical duplicates, and both gave the same results:
1) Our positive control cell lines (n=2 of the same cell type) show ~20% more reads coming from the major (reference) haplotype, as determined by averaging the allelic ratio across 9 of the 11 SNPs (two SNPs were removed because they lie close to a 5 bp indel and showed extreme bias).
2) Our third cell line (n=1 of a different cell type) shows no notable allele-specific expression differences using the 9 SNPs.
3) Our 8 brains show varied effects with the 9 SNPs, with some showing ~5% bias in the reference allele direction, and others showing bias in the alternate allele direction.
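For readers who want to reproduce the per-sample summary described above, here is a minimal sketch of how the averaging could be done. It assumes a tab-delimited table using the `refCount` and `altCount` column names that GATK ASEReadCounter writes; the counts, the `chrN` contig, and the `snp1`..`snp3` IDs are made up purely for illustration, and `ref_fractions` and `summarize` are hypothetical helper names. If "~20% more reads" is read as a 1.2:1 ratio, that corresponds to a reference fraction of 1.2/2.2, or about 0.545.

```python
import csv
import io
from statistics import mean

# Made-up counts for illustration only (not our data). Column names are
# assumed to match the ASEReadCounter output header, trimmed to a few fields.
EXAMPLE_TABLE = (
    "contig\tposition\tvariantID\trefCount\taltCount\ttotalCount\n"
    "chrN\t1000\tsnp1\t1200\t1000\t2200\n"
    "chrN\t2000\tsnp2\t1100\t900\t2000\n"
    "chrN\t3000\tsnp3\t900\t1100\t2000\n"
)


def ref_fractions(table_text, excluded=()):
    """Return {variantID: ref / (ref + alt)} for SNPs not in `excluded`."""
    fractions = {}
    for row in csv.DictReader(io.StringIO(table_text), delimiter="\t"):
        if row["variantID"] in excluded:
            continue  # e.g. SNPs dropped for sitting next to an indel
        ref, alt = int(row["refCount"]), int(row["altCount"])
        if ref + alt == 0:
            continue  # no informative reads at this site
        fractions[row["variantID"]] = ref / (ref + alt)
    return fractions


def summarize(fractions):
    """Average the per-SNP reference fractions and report their spread."""
    values = list(fractions.values())
    return {
        "n_snps": len(values),
        "mean_ref_fraction": mean(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    print(summarize(ref_fractions(EXAMPLE_TABLE)))
```

Reporting the minimum and maximum next to the mean makes it obvious when an apparently modest average hides SNPs that point in opposite directions.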
What we don't understand is the huge variation in allelic ratios WITHIN samples but BETWEEN SNPs. These SNPs are in complete LD and thus almost certainly phased correctly (we confirmed the phasing between some of them by looking at individual alignments or read pairs that contain more than one SNP). They are also all contained within the predominant transcript variant, in constitutive exons.
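One way to put a number on this between-SNP disagreement is a Pearson chi-square test of homogeneity on the per-SNP (reference, alternate) counts within a single sample. The sketch below is only an illustration under the assumption that reads at different SNPs are independent, which is not strictly true when one read pair spans several SNPs; `heterogeneity` is a hypothetical helper name and the counts are made up. Under binomial sampling alone, the statistic should be close to its degrees of freedom, so a ratio well above 1 suggests extra, SNP-specific variation.

```python
def heterogeneity(counts):
    """Pearson chi-square for homogeneity of ref/alt proportions across SNPs.

    counts: list of (ref_count, alt_count) tuples, one per SNP, one sample.
    Returns (chi2, degrees_of_freedom, pooled_ref_fraction).
    """
    total_ref = sum(ref for ref, _ in counts)
    total_alt = sum(alt for _, alt in counts)
    if total_ref == 0 or total_alt == 0:
        raise ValueError("need reads from both alleles to test homogeneity")
    pooled = total_ref / (total_ref + total_alt)

    chi2 = 0.0
    for ref, alt in counts:
        n = ref + alt
        expected_ref = n * pooled          # expected if all SNPs share one ratio
        expected_alt = n * (1.0 - pooled)
        chi2 += (ref - expected_ref) ** 2 / expected_ref
        chi2 += (alt - expected_alt) ** 2 / expected_alt
    return chi2, len(counts) - 1, pooled


if __name__ == "__main__":
    # Made-up counts for three linked SNPs in one sample (illustration only).
    example = [(1200, 1000), (1100, 900), (900, 1100)]
    chi2, dof, pooled = heterogeneity(example)
    print(f"pooled ref fraction = {pooled:.3f}")
    print(f"chi2 = {chi2:.1f} on {dof} df, ratio = {chi2 / dof:.1f}")
```

A large ratio does not say where the extra variation comes from; it only shows that read sampling noise by itself cannot explain the spread between SNPs.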
Most troubling is that in the brain samples that show no clear pattern of allele-specific expression, many SNPs show ALTERNATE allele bias, which, to my knowledge, shouldn't result from any source of technical bias. We know from other studies that the alternate allele should be expressed at either equal or lower levels, so the only type of bias we should be seeing is reference allele bias. In addition, the SNPs within a sample don't consistently agree on the direction or magnitude of allelic bias.
Finally, it is important to point out that we typically have thousands of reads per SNP per sample, so read depth shouldn't be an issue. We sequenced all 11 libraries on one lane of a HiSeq 2500 with 125 bp paired-end reads (~300 million total read pairs).
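To make the depth argument concrete, here is a small sketch of the sampling error expected from read counts alone. It assumes every read is an independent binomial draw, which duplicates and read pairs spanning several SNPs would violate, and uses a normal approximation for the interval; `binomial_se` and `approx_95_interval` are hypothetical helper names. At 2,000 reads and a true reference fraction of 0.5, the standard error is about 0.011, so sampling noise alone would rarely move a SNP more than about 2 percentage points away from 0.5.

```python
import math


def binomial_se(n_reads, p=0.5):
    """Standard error of an allelic fraction estimated from n_reads reads."""
    return math.sqrt(p * (1.0 - p) / n_reads)


def approx_95_interval(n_reads, p=0.5):
    """Normal-approximation 95% interval for the observed allelic fraction."""
    half_width = 1.96 * binomial_se(n_reads, p)
    return p - half_width, p + half_width


if __name__ == "__main__":
    for depth in (500, 2000, 5000):
        low, high = approx_95_interval(depth)
        print(f"{depth} reads: SE = {binomial_se(depth):.4f}, "
              f"95% interval = {low:.3f} to {high:.3f}")
```

Depth only shrinks this random component; it does nothing to a systematic per-SNP offset from mapping or capture, which is why deep coverage and inconsistent SNPs can coexist.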