9 months ago
Yumin
Hi!
I'm using vg rna for transcriptomic analyses, following the wiki. Below I list each of my steps and the points where I got confused:
- First, I started from the final pangenome graph (a .gfa file) produced by the cactus-minigraph pipeline, which is in GFA 1.1 format, and converted it to GFA 1.0 using gfa-wp:
gfa-wp joint-full.gfa > joint-full.cov.gfa
The first two columns of the P lines in joint-full.cov.gfa look like this:
P COR_A#0#Chr10:0-69032057
P COR_B#0#Chr10:107698-185749
P COR_B#0#Chr10:198532-281912
P COR_B#0#Chr10:291916-528385
P COR_B#0#Chr10:759772-790564
- Then I used joint-full.cov.gfa for the GBWT construction:
vg convert -p -t 40 joint-full.cov.gfa > joint-full.cov.pg
vg gbwt -p -d ./ -o joint-full.cov.gbwt \
  --path-regex "(.*)#(.*)#(.*):(.*)" --path-fields _SHCF \
  -G joint-full.cov.gfa
- Finally, I used vg rna for pantranscriptome construction:
vg rna -p -t 40 -s ID -j -k 32 -u \
  -n gff/COR_A.gff -n gff/COR_B.gff \
  -n gff/COR_C.gff -n gff/COR_D.gff \
  -l joint-full.gbwt -b joint-full.rna.gbwt -f joint-full.rna.fa \
  -i joint-full.rna.txt joint-full.pg > joint-full.rna.pg
The following error occurred:
[vg rna] Parsing graph file ...
[vg rna] Parsing haplotype GBWT index file ...
[vg rna] Graph and GBWT index parsed in 25.7318 seconds, 18.4524 GB
[vg rna] Adding transcript splice-junctions and exon boundaries to graph ...
[transcriptome] ERROR: Chromomsome path "COR_A#0#Chr10" not found in graph or haplotypes index (line 1).
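A quick way to compare the error message with the graph is to list the P-line names directly. The sketch below is a hypothetical bash helper (list_gfa_paths is not part of vg); it assumes the GFA fields are tab-separated, as the GFA format specifies, and prints each path name next to the same name with any trailing :start-end subrange removed, so that a name like COR_A#0#Chr10:0-69032057 can be set against the COR_A#0#Chr10 reported by vg rna.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of vg): read a GFA on stdin and print, for every
# P line, the full path name and that name with a trailing ":start-end" removed,
# separated by a tab. Assumes tab-separated GFA fields.
list_gfa_paths() {
  awk -F'\t' '$1 == "P" {
    base = $2
    sub(/:[0-9]+-[0-9]+$/, "", base)   # drop the subrange suffix, if present
    print $2 "\t" base
  }'
}

# Example usage with the file name from the post:
#   list_gfa_paths < joint-full.cov.gfa | cut -f2 | sort -u
```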
So, my questions are:
- How should I change my steps to build a pantranscriptome index with vg rna from the cactus-minigraph GFA file?
- How can I resolve the chromosome path mismatch in the error message, and how can I tell what the chromosome paths in the GBWT index look like?
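For the mismatch question, one way to see which GFF sequence IDs have no matching path is to compare column 1 of each GFF file against a list of path names. This is a minimal sketch with a hypothetical helper name; the list of names could come, for example, from vg paths -L -x joint-full.pg or from the P lines of the GFA. It only reports exact, verbatim mismatches and makes no assumption about how vg rna resolves names internally.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of vg): print each distinct sequence ID in
# column 1 of a GFF file that does not appear verbatim in a file of path names
# (one name per line). Comment lines and lines with fewer than 9 columns are skipped.
missing_seqids() {
  local gff="$1" names="$2"
  awk -F'\t' '!/^#/ && NF >= 9 { print $1 }' "$gff" | sort -u \
    | grep -vxF -f "$names" || true   # grep exits 1 when nothing is missing
}

# Example usage with file names from the post:
#   vg paths -L -x joint-full.pg > path_names.txt
#   missing_seqids gff/COR_A.gff path_names.txt
```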
Thank you for sharing your experience. I used
vg gbwt -M -C -H -S -L -T joint-full.cov.gbwt
to view the information in the GBWT index. The thread names shown there are completely different from the path names on the P lines of the GFA, which leaves me unsure how to make the chromosome paths match. Even after renaming the chromosomes in the GFF files to match those thread names, it still doesn't work.
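If the GBWT thread names are built from the metadata fields that vg gbwt parses out of each path name (an assumption here), it may help to see how the --path-regex and --path-fields values from the vg gbwt command split the P-line names. The bash sketch below applies the same pattern; it assumes each character of _SHCF maps to one regex submatch in order (_ skipping the whole match, then sample, haplotype, contig and fragment), and the helper name is hypothetical. Bash uses POSIX extended regexes, which may differ from vg's engine in edge cases, although a name with exactly two # characters followed by one : can only match this pattern one way.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of vg): split a path name with the regex passed
# to vg gbwt --path-regex, labelling submatches 1-4 as sample, haplotype, contig
# and fragment (our reading of --path-fields _SHCF).
split_path_name() {
  local name="$1"
  local re='(.*)#(.*)#(.*):(.*)'
  if [[ $name =~ $re ]]; then
    printf 'sample=%s haplotype=%s contig=%s fragment=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" \
      "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
  else
    printf 'no match: %s\n' "$name" >&2
    return 1
  fi
}

# Example usage:
#   split_path_name 'COR_B#0#Chr10:107698-185749'
#   prints: sample=COR_B haplotype=0 contig=Chr10 fragment=107698-185749
```

Under this reading, the contig field holds only Chr10, while the sample and haplotype end up in separate fields rather than in the contig name.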