Question: Parser for PAF to find structural variants
crimsontabaq (50), Kazan, Russia, wrote 15 months ago:

I need to evaluate two large (1 Gb) chromosome-level assemblies of the same genome by finding large structural variations between them (duplications, inversions, deletions, etc.). I am trying to use minimap2 to get this sort of statistics (similar to the somewhat classical nucmer + show-diff approach), but I couldn't find any parser for .paf files (only paftools.js from minimap2's creator, but it does not produce the desired statistics). Converting .paf to .delta and using dnadiff is somewhat imperfect.

Do you know of any parser for .paf files that finds structural variations? Or a workaround for the problem of comparing two large assemblies? Many thanks!

Tags: parser, evaluation, assembly • 818 views
modified 10 months ago by colindaven (2.5k) • written 15 months ago by crimsontabaq (50)
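
For readers looking for a starting point, here is a minimal sketch of a PAF parser that flags candidate large events. It relies only on the 12 mandatory PAF columns. The event names, the `min_len` and `min_mapq` thresholds, and the heuristics themselves (minority-strand blocks as inversion candidates, uncovered target stretches as gaps) are illustrative assumptions, not an established SV caller, and every hit would need manual inspection.

```python
from collections import defaultdict, namedtuple

# The 12 mandatory PAF columns; coordinates are 0-based, end-exclusive.
PafRecord = namedtuple(
    "PafRecord",
    "qname qlen qstart qend strand tname tlen tstart tend nmatch alen mapq",
)


def parse_paf(lines):
    """Yield a PafRecord per non-empty line, ignoring optional tags."""
    for line in lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        yield PafRecord(
            f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
            f[5], int(f[6]), int(f[7]), int(f[8]),
            int(f[9]), int(f[10]), int(f[11]),
        )


def candidate_events(records, min_len=10000, min_mapq=20):
    """Flag rough SV candidates per (query, target) pair.

    Heuristics (assumptions for illustration only):
    - a block on the minority strand is an inversion candidate;
    - a target stretch of >= min_len left uncovered between
      consecutive blocks is reported as a gap (possible deletion
      in the query, or sequence too divergent to align).
    """
    by_pair = defaultdict(list)
    for r in records:
        if r.mapq >= min_mapq and r.alen >= min_len:
            by_pair[(r.qname, r.tname)].append(r)

    events = []
    for (qname, tname), recs in by_pair.items():
        recs.sort(key=lambda r: r.tstart)
        plus = sum(r.alen for r in recs if r.strand == "+")
        minus = sum(r.alen for r in recs if r.strand == "-")
        major = "+" if plus >= minus else "-"
        for r in recs:
            if r.strand != major:
                events.append(("inversion_candidate", tname, r.tstart, r.tend))
        covered_end = recs[0].tend  # furthest target position covered so far
        for r in recs[1:]:
            if r.tstart - covered_end >= min_len:
                events.append(("target_gap", tname, covered_end, r.tstart))
            covered_end = max(covered_end, r.tend)
    return events
```

Usage would be along the lines of `candidate_events(parse_paf(open("aln.paf")))`, where `aln.paf` is a hypothetical file name for the minimap2 output.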

Why not use sam/bam files?

written 15 months ago by WouterDeCoster (44k)

Good idea, SAM is much more widely used. Can you suggest a particular way of doing the task with SAM? I am still not sure whether I should do a local alignment or a global whole-genome one. Much obliged!

written 15 months ago by crimsontabaq (50)

I would suggest taking a look at this approach: https://github.com/lh3/CHM-eval/tree/master/dip-call

modified 15 months ago • written 15 months ago by WouterDeCoster (44k)

Thanks a lot, great utility. But I was looking for large SV discovery, and what you shared produces only small variants (SNPs and indels).

written 15 months ago by crimsontabaq (50)
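
On the size point: when minimap2 is run with `-c`, each PAF line carries the CIGAR in a `cg:Z:` tag, so longer insertions and deletions that sit inside a single alignment can be pulled out with a size filter. The sketch below assumes such a tag is present; the function name, the tuple layout and the `min_len` default are hypothetical. Events too large to fit inside one alignment usually show up as split alignments instead, so this covers only part of the large-SV range.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def large_indels_from_paf_line(line, min_len=50):
    """Return (type, target, start, end, length) for long I/D operations.

    Coordinates are on the target, 0-based and end-exclusive.
    Returns an empty list when the line has no cg:Z: tag.
    """
    f = line.rstrip("\n").split("\t")
    tname, tpos = f[5], int(f[7])
    cigar = next((x[5:] for x in f[12:] if x.startswith("cg:Z:")), None)
    if cigar is None:
        return []

    calls = []
    for n, op in CIGAR_OP.findall(cigar):
        n = int(n)
        if op == "D":
            # Deletion: bases present in the target, absent from the query.
            if n >= min_len:
                calls.append(("DEL", tname, tpos, tpos + n, n))
            tpos += n
        elif op == "I":
            # Insertion: extra query bases; does not advance the target.
            if n >= min_len:
                calls.append(("INS", tname, tpos, tpos, n))
        elif op in "M=XN":
            tpos += n
    return calls
```

Raising `min_len` (for example to 1000) restricts the output to larger events only.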
Shangzhe Zhang (10), China, wrote 11 months ago:

Hi,

I suggest NucDiff or SyRI, which can detect long structural variants (as long as they can) based on the assembly alignment. However, I didn't get good results from them. I compared two ~100 Mb chromosomes and didn't even get an alignment result following the suggested pipeline! Well, SyRI was fast enough on the A. thaliana example data. You can try them and maybe tell me about your runtime etc.

Best,

Shangzhe

modified 11 months ago • written 11 months ago by Shangzhe Zhang (10)

Hi. I am curious to know what the issues with these methods were. It would be great if you could share why the results were not good.

written 6 months ago by Manish (10)

Sorry for the delay. They didn't perform well on large genomes, such as human. They take a lot of resources.

written 3 months ago by Shangzhe Zhang (10)

colindaven (2.5k), Hannover Medical School, wrote 10 months ago:

Dotplots might be useful if you have long, collinear contigs (Nanopore, PacBio). If Illumina, well, forget it. One pretty good dotplot implementation is in the GUI package UGENE.

It's not an easy approach though. I believe assembly to assembly genome multimappings are a somewhat poor and unloved area of bioinformatics.

written 10 months ago by colindaven (2.5k)
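
For anyone who wants a dotplot straight from the minimap2 output rather than a GUI, a rough sketch follows. It turns each PAF line into a line segment (target on x, query on y) and flips the query coordinates for reverse-strand hits so that inversions show up as anti-diagonals. The function names and the `min_len` default are hypothetical, the plotting part assumes matplotlib is installed, and the whole thing assumes the file holds a single query/target pair (multi-sequence files would need coordinate offsets).

```python
def dotplot_segments(paf_lines, min_len=1000):
    """Convert PAF lines into ((x0, y0), (x1, y1), strand) segments.

    x is the target coordinate, y the query coordinate. For '-' hits the
    target start pairs with the query end, so the segment slopes downward.
    """
    segments = []
    for line in paf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # skip blank or malformed lines
        qstart, qend, strand = int(f[2]), int(f[3]), f[4]
        tstart, tend = int(f[7]), int(f[8])
        if tend - tstart < min_len:
            continue  # drop short hits that only add noise
        if strand == "+":
            segments.append(((tstart, qstart), (tend, qend), strand))
        else:
            segments.append(((tstart, qend), (tend, qstart), strand))
    return segments


def plot_segments(segments, out_path="dotplot.png"):
    """Draw the segments; forward hits in blue, reverse hits in red."""
    import matplotlib
    matplotlib.use("Agg")  # render to file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 6))
    for (x0, y0), (x1, y1), strand in segments:
        ax.plot([x0, x1], [y0, y1],
                color="tab:blue" if strand == "+" else "tab:red")
    ax.set_xlabel("target position (bp)")
    ax.set_ylabel("query position (bp)")
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
```

Breaks in the main diagonal, red anti-diagonal stretches and off-diagonal blocks are then the places worth a closer look.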