Question

Physical phasing of polyploid SNPs from RNA-seq data

0

Entering edit mode

3.4 years ago

jfaberha ▴ 50

Hi, I'm working with transcriptomic data from a diploid species with an incomplete reference genome (likely 80% of genome represented in reference). I aligned paired-end reads with STAR then generated counts with StringTie. After running DE analyses, most of the results make sense and look good, although we found a surprisingly high number of DE genes between genetic lineages.

After visualizing SNPs/indels from the alignment files in IGV, it looks like a number of genes have alignments that indicate more than two haplotypes for a single locus, indicating potential misalignment of transcripts from lineage-specific CNVs or paralogs/pseudogenes missing from the reference genome.

Now, I would like to quantify these misalignments in a more systematic way and my thought was to extract physical phasing information per sample to see if we can detect more than two overlapping haplotypes at each locus, which would then be flagged as a potential misalignment. So far, I've tried the GATK haplotype caller pipeline for RNA-seq data (barcwiki.wi.mit.edu/wiki/SOP/CallingVariantsRNAseq), but the problem I'm running into is that phasing information isn't generated in vcf output files when ploidy is set higher than two, and running haplotype caller on samples set as diploid doesn't tell us if we have extra unexpected alleles.

Can anyone suggest another tool that can output phasing information for more than two haplotypes, or just suggest an entirely different strategy to address my original problem quantifying possible misalignments of RNA-seq data? Thanks.

GATK phasing transcriptome haplotype polyploid • 843 views

ADD COMMENT • link updated 2.4 years ago by colindaven 6.9k • written 3.4 years ago by jfaberha ▴ 50

score 1 · Answer 1 · 2022-06-10

1

Entering edit mode

2.4 years ago

colindaven 6.9k

Late, but I think this is one of the few tools which can do polyploid phasing:

https://github.com/bluenote-1577/flopp

Another is perhaps nphase

https://github.com/OmarOakheart/nPhase

ADD COMMENT • link 2.4 years ago by colindaven 6.9k