Constructing a vg graph from MSA with a large number of sequences
5 months ago
jdru • 0

Hi all,

I have a multiple sequence alignment of ~26,000 sequences, each around 16 kb long. I would like to construct a vg graph from this alignment with all sequences embedded as haplotype paths. However, I am finding that

vg construct -M *msa_path*

is too slow with this input size. I am able to construct the graph without the embedded paths, however. I can also split the input into smaller subsets of sequences and construct a graph for each subset of the MSA, but I have not been able to merge these smaller graphs by their overlapping nodes. Is there a simple way to achieve this, or is it simply infeasible to construct a graph in this way?

Thank you!

graph msa variation vg • 340 views
4 months ago
Jouni Sirén ▴ 270

You could try using PGGB instead of vg construct. I'm not sure how well it will scale to a large number of short sequences, as it's primarily intended for a smaller number of long sequences.

4 months ago
jdru • 0

Thank you so much, I have a graph! For some reason there are three connected components when there should only be one, but at least it's a start.
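
For anyone wanting to check the component count independently, here is a minimal sketch, assuming the graph is available as a GFA 1 file with tab-separated `S` (segment) and `L` (link) lines. The function name `count_components` and the toy records are hypothetical and are not part of vg or PGGB. It ignores orientation and treats every link as an undirected edge, which should be enough for counting connected components.

```python
def count_components(gfa_lines):
    """Count connected components from GFA 1 'S' and 'L' records.

    Assumption: lines are tab-separated GFA 1 records; link orientation
    is ignored, so each link is treated as an undirected edge.
    """
    parent = {}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            # S <name> <sequence>: register the segment as a node.
            parent.setdefault(fields[1], fields[1])
        elif fields[0] == "L" and len(fields) >= 4:
            # L <from> <from_orient> <to> <to_orient> <overlap>
            a, b = fields[1], fields[3]
            parent.setdefault(a, a)
            parent.setdefault(b, b)
            union(a, b)

    return len({find(node) for node in parent})


# Toy graph (hypothetical): segments 1-2 and 3-4 are linked, 5 is isolated.
EXAMPLE = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tT",
    "S\t3\tG",
    "S\t4\tC",
    "S\t5\tAA",
    "L\t1\t+\t2\t+\t0M",
    "L\t3\t+\t4\t-\t0M",
]

print(count_components(EXAMPLE))
```

For a real graph, the same function can be given an open file handle, e.g. `count_components(open("graph.gfa"))`, where `graph.gfa` is a placeholder path.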
