Question

Problems identifying heterozygous transcripts with de novo transcriptome assembly via Trinity (Galaxy)

6.0 years ago

aih5 • 0

Hello, I have been annotating a genomic region that is heterozygous in our model animal. There are 10 genes I'm interested in, and each has 2 distinct alleles (the sequences are not identical: 30-300 nucleotide differences over ~1000-1500 bases).

We have assembled a de novo transcriptome and aligned both the assembled transcripts and the RNA-seq reads to this genomic region for both haplotypes. The RNA-seq reads are aligned strictly (no mismatches allowed), and all genes have complete read coverage on both alleles.

The de novo assembly does not produce separate transcripts for the two alleles. Usually only one transcript is generated per gene, and it maps to both alleles. In some cases the transcript is identical to one allele; in others it is a chimera of the two alleles. From what I have read, I would expect the read depth to be sufficient for Trinity to separate the alleles when assembling transcripts, but I can't find an option or function that makes Trinity more stringent in how it assembles the possible transcripts.

The reason I am trying to figure this out is that we want to assemble de novo transcriptomes for other samples that do not have a reference genome. At this point I am not confident that Trinity can assemble distinct alleles appropriately, even when there are many differences between them.

Does anyone have experience running Trinity so that it is more sensitive to allelic variants? Thank you for your help!
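For anyone wanting to reproduce the check, the "identical to one allele" versus "chimera" distinction described above can be scored roughly as in the minimal sketch below. It is an illustration only, not the workflow actually used here: it assumes the transcript and both allele sequences have already been aligned to the same length (gaps as `-`), and the function name `classify_transcript` and the toy sequences are hypothetical.

```python
def classify_transcript(transcript, allele_a, allele_b):
    """Classify an assembled transcript against two allele sequences.

    Assumption: all three sequences are pre-aligned and of equal length.
    Only positions where the two alleles differ are diagnostic.
    """
    if not (len(transcript) == len(allele_a) == len(allele_b)):
        raise ValueError("sequences must be aligned to the same length")

    calls = []
    for t, a, b in zip(transcript, allele_a, allele_b):
        if a == b:
            continue  # alleles agree here, so the position is not diagnostic
        if t == a:
            calls.append("A")
        elif t == b:
            calls.append("B")
        else:
            calls.append("?")  # transcript matches neither allele

    informative = {c for c in calls if c != "?"}
    if not informative:
        label = "uninformative"
    elif informative == {"A"}:
        label = "allele_A"
    elif informative == {"B"}:
        label = "allele_B"
    else:
        label = "chimera"  # mixes diagnostic bases from both alleles
    return label, calls


# Toy sequences (hypothetical): the alleles differ at two positions.
allele_a = "ACGTACGTAC"
allele_b = "ACCTACTTAC"
mixed = "ACGTACTTAC"  # first diagnostic base from A, second from B

print(classify_transcript(allele_a, allele_a, allele_b))
print(classify_transcript(mixed, allele_a, allele_b))
```

The per-position calls returned alongside the label show where a chimeric transcript switches from one allele to the other, which may help in judging whether the switch falls in a stretch with few differences between the alleles.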

transcriptome assembly RNA-Seq • 1.2k views

updated 5.9 years ago by Biostar 20 • written 6.0 years ago by aih5 • 0