I am new to building pan-transcriptomes and would like to build one using the assemblies produced by the Human Pangenome Reference Consortium from its year 1 data (47 samples). I planned to use all of the FASTA files as my reference genomes...
I then planned to add VCF files from TCGA for 422 cancer patients, to account for structural variants of the molecular subtypes I would like to explore. Lastly, I wanted to include RNA sequencing data from 30 samples (from another dataset), using VG.
I wanted to understand whether this is possible from a memory and functionality standpoint. I am new to using VG and have not found literature that explores this explicitly in a way I can understand. I was going to use Minigraph instead; however, I did not see any way to include RNA sequencing data (which is important for my project).
If you have any references/links, please feel free to include them so that I can read more.
I am not sure about the data input format, but 'hisat2' is able to 'align rna-seq to a population reference'. This may be a little different from aligning directly to the human pangenome graph, but it may be a point of reference. You can also come up with various links by searching the vg repository; one link I found is here: https://github.com/vgteam/vg/wiki/Transcriptomic-analyses I have not done a lot of work with graph genomes, but I'd say it is likely a challenging endeavor compared with a reference-based approach, though it could be interesting :)
Thank you! I found this as well and am trying to understand it now.
The vg mpmap subcommand has features for mapping RNA-seq data to a graph. We describe it in more detail in this publication, including a comparison to some other tools you might consider. cmdcolin is correct that HISAT2 is also capable of aligning to a graph that is constructed from a VCF.
If you plan to use VCF data, minigraph isn't really an appropriate tool. It's designed for building a graph from multiple genome assemblies. For VCF input, HISAT2 and vg both have internal graph construction algorithms. I can't speak much for HISAT2, but the easiest entry point for vg's graph construction (for most people) is the
Thank you so much! This is where I am heading back to as well! I don't have much insight into HISAT2. I am going to give it a go this week and will comment when I figure it out.