vg rna index
16 months ago
Yumin ▴ 20

Hi! I'm using vg rna for transcriptomic analyses, following the wiki. Below I list each of my steps and the points that confuse me:

  • First, I started with the final pangenome graph .gfa file obtained by the cactus-minigraph pipeline, which is in GFA 1.1 format. I converted it into GFA 1.0 format using gfa-wp: gfa-wp joint-full.gfa > joint-full.cov.gfa

The first two columns of the P lines in joint-full.cov.gfa look like:

P       COR_A#0#Chr10:0-69032057
P       COR_B#0#Chr10:107698-185749
P       COR_B#0#Chr10:198532-281912
P       COR_B#0#Chr10:291916-528385
P       COR_B#0#Chr10:759772-790564
  • Then I used joint-full.cov.gfa for the GBWT construction:
    vg convert -p -t 40 joint-full.cov.gfa > joint-full.cov.pg
    vg gbwt -p -d ./ -o joint-full.cov.gbwt \
     --path-regex "(.*)#(.*)#(.*):(.*)" --path-fields _SHCF \
     -G joint-full.cov.gfa
    
  • Finally, I used vg rna for pantranscriptome construction:
    vg rna -p -t 40 -s ID -j -k 32 -u \
     -n gff/COR_A.gff -n gff/COR_B.gff \
     -n gff/COR_C.gff -n gff/COR_D.gff \
     -l joint-full.gbwt -b joint-full.rna.gbwt -f joint-full.rna.fa \
     -i joint-full.rna.txt joint-full.pg > joint-full.rna.pg
    
    The following error occurred:
    [vg rna] Parsing graph file ...
    [vg rna] Parsing haplotype GBWT index file ...
    [vg rna] Graph and GBWT index parsed in 25.7318 seconds, 18.4524 GB
    [vg rna] Adding transcript splice-junctions and exon boundaries to graph ...
    [transcriptome] ERROR: Chromomsome path "COR_A#0#Chr10" not found in graph or haplotypes index (line 1).
    

So, my questions are:

  • How should I change my steps to successfully build a pantranscriptome index with vg rna from the cactus-minigraph GFA file?
  • How can I resolve the chromosome path mismatch in the error message, and how can I see what the chromosome paths in the GBWT index look like?
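
For the second question, a plain-text comparison can show which sequence names an annotation uses and which path names the graph file holds. The sketch below uses only standard tools, not vg. It assumes that the GFF sequence ID is column 1, that path names sit in column 2 of the GFA P lines, and that a trailing :start-end on a path name is a subrange suffix that should be ignored for the comparison. The function name missing_seqids is hypothetical.

```shell
# List GFF sequence IDs that have no same-named path in the GFA.
# Assumption: a trailing ":start-end" on a path name is a subrange suffix.
missing_seqids() {
    gfa=$1
    gff=$2
    paths=$(mktemp)
    seqids=$(mktemp)
    # Column 2 of each P line, with any ":start-end" suffix stripped.
    awk -F'\t' '$1 == "P" { sub(/:[0-9]+-[0-9]+$/, "", $2); print $2 }' "$gfa" \
        | LC_ALL=C sort -u > "$paths"
    # Column 1 of each non-comment, non-empty GFF line.
    awk -F'\t' '!/^#/ && NF > 0 { print $1 }' "$gff" \
        | LC_ALL=C sort -u > "$seqids"
    # Keep lines found only in the second file: seqids without a path.
    LC_ALL=C comm -13 "$paths" "$seqids"
    rm -f "$paths" "$seqids"
}
```

Calling missing_seqids joint-full.cov.gfa gff/COR_A.gff prints every sequence ID in that annotation with no matching path name in the GFA. An empty result only means the names agree at the text level; it says nothing about how the names are stored in the GBWT.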
vg vgteam rna graph • 1.1k views
16 months ago

For your second question, vg gbwt is perhaps the way to go: its metadata options list the contig, sample, and thread names stored in the index:

Step 7: Metadata (one input GBWT):
    -M, --metadata          print basic metadata
    -C, --contigs           print the number of contigs
    -H, --haplotypes        print the number of haplotypes
    -S, --samples           print the number of samples
    -L, --list-names        list contig/sample names (use with -C or -S)
    -T, --thread-names      list thread names

Thank you for sharing your experience. I used vg gbwt -M -C -H -S -L -T joint-full.cov.gbwt to view the contents of the GBWT index. The output is:

29379 paths with names, 12 samples with names, 12 haplotypes, 10 contigs with names
Chr1
Chr3
Chr2
Chr10
Chr7
Chr9
Chr6
Chr8
Chr4
Chr5
12
COR_A
COR_B
COR_C
COR_D
COR_E
COR_F
COR_G
COR_H
COR_I
COR_J
COR_K
COR_L
_thread_COR_A_Chr10_0_0
_thread_COR_B_Chr10_0_107698
_thread_COR_B_Chr10_0_198532
_thread_COR_B_Chr10_0_291916
_thread_COR_B_Chr10_0_759772
_thread_COR_B_Chr10_0_943427
_thread_COR_B_Chr10_0_1088850
_thread_COR_B_Chr10_0_1166426
_thread_COR_B_Chr10_0_1229383
_thread_COR_B_Chr10_0_1992992
_thread_COR_B_Chr10_0_2266779
_thread_COR_B_Chr10_0_2317201
_thread_COR_B_Chr10_0_2361082
_thread_COR_B_Chr10_0_2377269
_thread_COR_B_Chr10_0_2521740
_thread_COR_B_Chr10_0_2552854
...

The thread names shown here are completely different from the path names in the GFA P lines, so I don't see how to make the chromosome paths match. Even when I rename the chromosomes in the GFF files to match these thread names, it still doesn't work.
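
The listing does suggest how the thread names relate to the P-line names. With --path-regex "(.*)#(.*)#(.*):(.*)" and --path-fields _SHCF, the four captured groups are read as sample, haplotype, contig, and fragment, and the names printed by -T look like _thread_<sample>_<contig>_<haplotype>_<fragment start>. The sed sketch below reproduces that mapping from the path names alone. The output pattern is inferred from the listing above, not taken from the vg documentation, and the function name path_to_thread_name is hypothetical.

```shell
# Rewrite "SAMPLE#HAP#CONTIG:START-END" as "_thread_SAMPLE_CONTIG_HAP_START".
# Assumption: this naming pattern is inferred from the -T listing above.
path_to_thread_name() {
    sed -E 's/^(.*)#(.*)#(.*):([0-9]+)-[0-9]+$/_thread_\1_\3_\2_\4/'
}

# Two path names taken from the P lines in the question.
printf '%s\n' 'COR_A#0#Chr10:0-69032057' 'COR_B#0#Chr10:107698-185749' \
    | path_to_thread_name
```

For these two inputs the sketch prints _thread_COR_A_Chr10_0_0 and _thread_COR_B_Chr10_0_107698, the first two thread names in the listing. In that listing the sample (COR_A) and contig (Chr10) appear as separate names, and a combined name such as COR_A#0#Chr10 does not appear at all.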
