Hi! I have called several SNPs by mapping reads from different strains of S. cerevisiae to an assembly I made.
Given the positions of the SNPs on the contigs of my assembly, I'm now trying to plot the distribution of the variants along each chromosome.
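Once every SNP has a chromosome coordinate, the plot is essentially a histogram of SNP counts per window. Here is a minimal sketch, assuming 1-based positions and an arbitrary 10 kb window. `snp_density` is a hypothetical helper, not part of any package:

```python
from collections import Counter


def snp_density(positions, bin_size=10_000):
    """Count SNPs per fixed-size window along one chromosome.

    positions: 1-based chromosome coordinates of the SNPs (hypothetical input).
    Returns {1-based window start: SNP count}, windows in ascending order.
    """
    counts = Counter((pos - 1) // bin_size for pos in positions)
    return {b * bin_size + 1: n for b, n in sorted(counts.items())}


# Hypothetical positions on one chromosome
print(snp_density([5, 9_999, 10_000, 10_001, 25_000], bin_size=10_000))
# prints {1: 3, 10001: 1, 20001: 1}
```

The resulting dict can be fed straight into any bar-plot function, one panel per chromosome. Empty windows are simply absent, so fill them with zeros if the plot needs a continuous axis.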
To do this, I've used the Mummer3 package (http://mummer.sourceforge.net/) and tried two strategies, neither of which worked:
1) I ran nucmer to align my assembly to the latest version of the S. cerevisiae reference genome. I then used show-tiling to select the best alignments and show how the contigs map onto the chromosomes.
The position of a SNP on the chromosome should then be:
chromosome position = position in the contig (from the .vcf file) + start of the alignment in the reference (from the show-tiling output) - start of the alignment in the contig
The last term is the one I couldn't find, so this approach stalled :(
I found the start of the alignment in each contig with show-coords, but I couldn't match the show-coords output to the show-tiling output.
show-coords output:
[S1] [E1] | [S2] [E2] | [LEN 1] [LEN 2] | [% IDY] | [TAGS]
=====================================================================================
1 1612 | 1601 1 | 1612 1601 | 94.79 | ref|Chromosome_10| NODE_167_length_2074_cov_613.985
2536 4055 | 1 1495 | 1520 1495 | 97.43 | ref|Chromosome_10| NODE_159_length_2510_cov_954.567
3880 6463 | 73172 70640 | 2584 2533 | 92.82 | ref|Chromosome_10| NODE_53_length_74700_cov_58.4878
4528 5367 | 875 129 | 840 747 | 88.57 | ref|Chromosome_10| NODE_152_length_3038_cov_63.6977
5421 6274 | 1671 2510 | 854 840 | 97.54 | ref|Chromosome_10| NODE_159_length_2510_cov_954.567
7116 7347 | 7214 6987 | 232 228 | 97.84 | ref|Chromosome_10| NODE_47_length_85444_cov_662.196
7345 9938 | 312 2856 | 2594 2545 | 91.31 | ref|Chromosome_10| NODE_124_length_7594_cov_56.6321
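The formula above is the forward-strand case, and show-coords already gives both starts on the same row (S1 in the reference, S2 in the contig). When S2 > E2 (e.g. NODE_167 above) the contig is aligned reverse-complemented, so the position has to be counted backwards from S2. A minimal sketch, assuming the row layout shown above; the helper names are hypothetical, and the linear mapping ignores indels inside an alignment, so results can be off by a few bases:

```python
def parse_coords_line(line):
    """Parse one show-coords data row: S1 E1 | S2 E2 | LEN1 LEN2 | %IDY | ref contig."""
    tokens = [t for t in line.split() if t != "|"]  # drop the column separators
    s1, e1, s2, e2 = map(int, tokens[:4])
    return {"ref": tokens[-2], "contig": tokens[-1],
            "s1": s1, "e1": e1, "s2": s2, "e2": e2}


def contig_to_ref(pos, aln):
    """Map a 1-based contig position to the reference through one alignment."""
    lo, hi = sorted((aln["s2"], aln["e2"]))
    if not lo <= pos <= hi:
        return None  # this alignment does not cover the position
    if aln["s2"] <= aln["e2"]:  # forward strand: pos + S1 - S2
        return aln["s1"] + (pos - aln["s2"])
    return aln["s1"] + (aln["s2"] - pos)  # reverse strand: count back from S2


def lift(contig, pos, alignments):
    """Return every (reference, position) hit; a contig may align in several places."""
    hits = []
    for aln in alignments:
        if aln["contig"] == contig:
            ref_pos = contig_to_ref(pos, aln)
            if ref_pos is not None:
                hits.append((aln["ref"], ref_pos))
    return hits


rows = [
    "1 1612 | 1601 1 | 1612 1601 | 94.79 | ref|Chromosome_10| NODE_167_length_2074_cov_613.985",
    "2536 4055 | 1 1495 | 1520 1495 | 97.43 | ref|Chromosome_10| NODE_159_length_2510_cov_954.567",
    "5421 6274 | 1671 2510 | 854 840 | 97.54 | ref|Chromosome_10| NODE_159_length_2510_cov_954.567",
]
alns = [parse_coords_line(r) for r in rows]
print(lift("NODE_159_length_2510_cov_954.567", 100, alns))   # [('ref|Chromosome_10|', 2635)]
print(lift("NODE_167_length_2074_cov_613.985", 1601, alns))  # [('ref|Chromosome_10|', 1)]
```

Returning all hits rather than one is deliberate: NODE_159, for example, is split over two alignments, and repeats can give overlapping hits that you may want to filter instead of silently picking one.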
show-tiling output:
>ref|Chromosome_10| 758181 bases
8153 153910 2170 145758 88.61 97.43 - NODE_25_length_145758_cov_50.3642
156081 157189 -74 1109 100.00 98.32 + NODE_190_length_1109_cov_97.0845
157116 201749 5933 44634 97.56 97.92 + NODE_82_length_44634_cov_52.1978
207683 518677 31798 310995 85.94 97.51 + NODE_5_length_310995_cov_51.7096
550476 711480 2901 161005 99.76 97.93 - NODE_21_length_161005_cov_48.549
714382 735009 732 20628 98.00 97.13 + NODE_109_length_20628_cov_71.3879
735742 740637 15913 4896 85.46 98.21 + NODE_138_length_4896_cov_111.069
756551 757564 617 1014 99.90 97.18 + NODE_195_length_1014_cov_337.505
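In this show-tiling output, E1 - S1 + 1 equals the contig length on every row (e.g. 153910 - 8153 + 1 = 145758 for NODE_25), and the third column matches the gap to the next contig (156081 - 153910 - 1 = 2170). That suggests S1/E1 give the placement of the whole contig, overhangs included, so the "start of the alignment in the contig" is effectively position 1 and the orientation column decides the direction. A minimal sketch under that reading of the columns (S1 E1 GAP LEN COV IDY ORIENT ID); the helper names are hypothetical and indels are ignored:

```python
def parse_tiling(text):
    """Parse show-tiling output into {contig: (chromosome, S1, E1, orientation)}."""
    placements = {}
    chrom = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            chrom = line[1:].split()[0]  # e.g. 'ref|Chromosome_10|'
            continue
        f = line.split()
        placements[f[7]] = (chrom, int(f[0]), int(f[1]), f[6])
    return placements


def tiled_position(contig, pos, placements):
    """Approximate chromosome coordinate of a 1-based contig position."""
    if contig not in placements:
        return None  # contig was not placed in the tiling
    chrom, s1, e1, orient = placements[contig]
    ref_pos = s1 + pos - 1 if orient == "+" else e1 - pos + 1
    return chrom, ref_pos


tiling = """>ref|Chromosome_10| 758181 bases
8153 153910 2170 145758 88.61 97.43 - NODE_25_length_145758_cov_50.3642
156081 157189 -74 1109 100.00 98.32 + NODE_190_length_1109_cov_97.0845
"""
p = parse_tiling(tiling)
print(tiled_position("NODE_190_length_1109_cov_97.0845", 10, p))  # ('ref|Chromosome_10|', 156090)
print(tiled_position("NODE_25_length_145758_cov_50.3642", 1, p))  # ('ref|Chromosome_10|', 153910)
```

Because the placement is extrapolated over unaligned contig ends (NODE_25 has only 88.61% coverage), SNPs in those ends get a position that is a guess; the show-coords mapping is the stricter option for them.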
2) I built a pseudomolecule with the show-tiling -p option and called the SNPs on it, which gave me chromosome-level positions. However, gene prediction on the pseudomolecule found only ~1300 genes, out of the ~5600 in this yeast that were predicted successfully from my original assembly.
So, am I missing something? Is there a better way to do all this?
Thank you!
What is this file? Does it contain CHROM and POS?
If so, why do you need the other steps (Mummer3, etc.)?
Because the contigs of my assembly don't correspond to the chromosomes themselves: I have ~200 contigs and 17 chromosomes.
The VCF file locates the variants on the contigs, and I was trying to use Mummer3 to map them onto the real chromosomes.
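With a contig-to-chromosome placement table (for example, one built from show-tiling output), the VCF's CHROM and POS columns can be rewritten directly. A minimal sketch, assuming a tab-separated VCF and a hypothetical `placements` dict of contig -> (chromosome, S1, E1, orientation); `lift_vcf_line` is a hypothetical helper name:

```python
def lift_vcf_line(line, placements):
    """Rewrite CHROM/POS of one VCF data line; header lines pass through unchanged.

    Returns None for variants on contigs that are missing from `placements`.
    Only coordinates are changed: REF/ALT are NOT reverse-complemented for
    '-' contigs, so the output is fine for plotting but not a true liftover.
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    contig, pos = fields[0], int(fields[1])
    if contig not in placements:
        return None
    chrom, s1, e1, orient = placements[contig]
    fields[0] = chrom
    fields[1] = str(s1 + pos - 1 if orient == "+" else e1 - pos + 1)
    return "\t".join(fields)


placements = {"NODE_190_length_1109_cov_97.0845": ("ref|Chromosome_10|", 156081, 157189, "+")}
print(lift_vcf_line("NODE_190_length_1109_cov_97.0845\t10\t.\tA\tG\t50\tPASS\t.", placements))
```

Variants on unplaced contigs come back as None, which makes it easy to count how many SNPs fall outside the tiling before plotting.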