Question

Tool:DiscoSnp++ 2.1.2 release: now genotypes and creates VCFs

9.1 years ago

pierre.peterlongo ▴ 900

We are pleased to announce a new release of the discoSnp++ tool (Home page: http://colibread.inria.fr/software/discosnp/)

DiscoSNP++ is a reference-free SNP/indel discovery tool.

From version 2.1.2:

discoSnp++ generates a VCF as output:
- Without mapping positions if no reference genome is available
- With mapping positions otherwise. In this case, discoSnp++ uses bwa for mapping.

discoSnp++ computes genotypes from the predicted coverages of variants. These predictions are reported both in the fasta output and in the VCF file.

As usual, any comments or feedback (negative or positive :)) are welcome.

Pierre

SNP discosnp genotyping indel • 2.8k views

ADD COMMENT • link updated 23 months ago by Ram 43k • written 9.1 years ago by pierre.peterlongo ▴ 900

Hello, compilation of the latest version fails for some reason (this is the first time I have tried to compile it). Any ideas? http://pastebin.ca/2962771 I am running 64-bit CentOS.

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by Adrian Pelin ★ 2.6k

Hi,

Are you using clang?

If so, could you try gcc instead?
This requires bypassing the automatic compilation script (./compile_discoSnp++.sh). For instance, if you have gcc 4.9 installed:

rm -rf build
mkdir build
cd build
cmake -DCMAKE_C_COMPILER=gcc-4.9 -DCMAKE_CXX_COMPILER=g++-4.9 ..
make
cd ..

Best, Pierre

ADD REPLY • link 9.1 years ago by pierre.peterlongo ▴ 900

Hello, that worked! After running cmake as follows:

cmake -DCMAKE_C_COMPILER=/opt/centos/devtoolset-1.0/root/usr/bin/gcc -DCMAKE_CXX_COMPILER=/opt/centos/devtoolset-1.0/root/usr/bin/g++ ..

I ran make. Where can I find the binaries? Thanks!

ADD REPLY • link 9.1 years ago by Adrian Pelin ★ 2.6k

Nice to know, thanks. Binaries are in the ROOT/build/tools/ directory. However, note that the main script is ROOT/run_discoSnp++.sh.

Pierre

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by pierre.peterlongo ▴ 900

Quick question (is it ok to ask questions here or should I make a new post?). I am a bit confused about the difference between variation found in the coherent and uncoherent files. I thought it was really based on coverage. I have the following example of 2 sets of coherent sequences:

>SNP_higher_path_20|P_1:30_T/G|high|nb_pol_1|C1_114|C2_97|C3_96|C4_92|C5_88|G1_0/0:9,347,2284|G2_0/0:9,296,1944|G3_0/0:8,293,1924|G4_0/0:12,251,1808|G5_0/1:144,133,1581|Q1_71|Q2_70|Q3_70|Q4_68|Q5_66|rank_0.32756
>SNP_lower_path_20|P_1:30_T/G|high|nb_pol_1|C1_0|C2_0|C3_0|C4_2|C5_16|G1_0/0:9,347,2284|G2_0/0:9,296,1944|G3_0/0:8,293,1924|G4_0/0:12,251,1808|G5_0/1:144,133,1581|Q1_0|Q2_0|Q3_0|Q4_47|Q5_61|rank_0.32756

>SNP_higher_path_178|P_1:30_A/C|high|nb_pol_1|C1_7|C2_10|C3_21|C4_3|C5_0|G1_1/1:1644,187,48|G2_1/1:1653,170,76|G3_0/1:1907,152,191|G4_1/1:2071,279,16|G5_1/1:2044,311,9|Q1_58|Q2_57|Q3_51|Q4_49|Q5_0|rank_0.22405
>SNP_lower_path_178|P_1:30_A/C|high|nb_pol_1|C1_87|C2_89|C3_107|C4_106|C5_102|G1_1/1:1644,187,48|G2_1/1:1653,170,76|G3_0/1:1907,152,191|G4_1/1:2071,279,16|G5_1/1:2044,311,9|Q1_65|Q2_61|Q3_63|Q4_67|Q5_68|rank_0.22405

and then these 2 sets are considered uncoherent:

>SNP_higher_path_69|P_1:30_T/G|high|nb_pol_1|C1_13|C2_14|C3_9|C4_3|C5_0|G1_1/1:1393,123,115|G2_1/1:1520,134,123|G3_1/1:1761,190,64|G4_1/1:1614,213,18|G5_1/1:2164,329,9|Q1_58|Q2_62|Q3_57|Q4_59|Q5_0|rank_0.21337
>SNP_lower_path_69|P_1:30_T/G|high|nb_pol_1|C1_77|C2_84|C3_94|C4_83|C5_108|G1_1/1:1393,123,115|G2_1/1:1520,134,123|G3_1/1:1761,190,64|G4_1/1:1614,213,18|G5_1/1:2164,329,9|Q1_65|Q2_64|Q3_66|Q4_68|Q5_68|rank_0.21337

>SNP_higher_path_39|P_1:30_T/G|high|nb_pol_1|C1_12|C2_13|C3_13|C4_2|C5_0|G1_1/1:1615,155,98|G2_1/1:1761,169,105|G3_1/1:1645,154,108|G4_1/1:1748,242,12|G5_1/1:1724,263,8|Q1_51|Q2_61|Q3_60|Q4_43|Q5_0|rank_0.19546
>SNP_lower_path_39|P_1:30_T/G|high|nb_pol_1|C1_88|C2_96|C3_90|C4_89|C5_86|G1_1/1:1615,155,98|G2_1/1:1761,169,105|G3_1/1:1645,154,108|G4_1/1:1748,242,12|G5_1/1:1724,263,8|Q1_66|Q2_66|Q3_65|Q4_67|Q5_68|rank_0.19546

Do these provide any clue as to why some are coherent and why some are not? Because when I look at the coverage of the minimal variant it is not very obvious to me.

Thank you!

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by Adrian Pelin ★ 2.6k

(A new post would be best for this question.. but it's alright!)

I'll let Pierre confirm the following guess: coherent/incoherent is indeed based on coverage, but one should distinguish k-mer coverage from k-read coverage (as defined in the discosnp paper). A SNP can be k-read-incoherent, meaning it might be an assembly artefact that the reads cannot explain, yet the whole path might have a sufficient mean coverage of individual kmers.

ADD REPLY • link 9.1 years ago by Rayan Chikhi ★ 1.5k

Ram · Answer 1 · 2015-03-27

Hi Adrian

The read coherency is computed as follows:

The reads of each read set are mapped back onto each prediction (allowing, by default, one mismatch anywhere except at the variant position(s)).

For each read set, each predicted sequence is said to be "k-read-coherent" if all kmers of this sequence are covered by at least c reads (c being a main parameter, =4 by default).

Variants for which the two sequences are not k-read-coherent for all read sets are declared "uncoherent".
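
To make the per-sequence rule concrete, here is a minimal Python sketch. It is not discoSnp++ code: the function names are hypothetical, reads are represented only by their mapped (start, end) intervals on the predicted sequence, and I assume that a read "covers" a kmer when its mapped interval contains the whole kmer. The sequence length of 61 and k=31 are also just illustrative values.

```python
def kmer_read_counts(seq_len, k, read_spans):
    """Count, for every kmer of a sequence, the reads that fully span it.

    read_spans: list of half-open (start, end) intervals of mapped reads.
    Assumption: a read covers kmer i when start <= i and i + k <= end.
    """
    n_kmers = seq_len - k + 1
    counts = [0] * n_kmers
    for start, end in read_spans:
        first = max(start, 0)
        last = min(end - k, n_kmers - 1)
        for i in range(first, last + 1):
            counts[i] += 1
    return counts


def is_k_read_coherent(seq_len, k, read_spans, c=4):
    """True if every kmer is covered by at least c reads (c=4 as in the post)."""
    return all(n >= c for n in kmer_read_counts(seq_len, k, read_spans, ))


# Five reads spanning the whole sequence: every kmer is covered 5 times.
full = [(0, 61)] * 5
print(is_k_read_coherent(61, 31, full))

# One hundred reads in total, but none spans the kmers in the middle.
patchy = [(0, 40)] * 50 + [(25, 61)] * 50
print(len(patchy), is_k_read_coherent(61, 31, patchy))
```

In the second case the sequence collects many mapped reads, yet the kmers starting at positions 10 to 24 are spanned by no read, so the check fails.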

The read coverage reported in the outputs (.fa and .vcf) is the total number of mapped reads. It is therefore possible to have a high read coverage for uncoherent variants: this means that many reads mapped to some parts of the sequence while other regions of the sequence have no mapped reads.
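
If you want to compare these reported coverages across many variants, a small parser for the fasta headers can help. This is only a sketch based on the headers quoted in the question: I assume that the C<i>_<n> fields hold the read coverage for read set i and that the G<i>_ fields start with the genotype call; the function name parse_header is hypothetical and the other fields are ignored.

```python
import re


def parse_header(header):
    """Extract per-read-set coverage (C fields) and genotype calls (G fields)."""
    fields = header.lstrip(">").split("|")
    info = {"name": fields[0], "coverage": {}, "genotype": {}}
    for field in fields[1:]:
        m = re.fullmatch(r"C(\d+)_(\d+)", field)
        if m:
            info["coverage"][int(m.group(1))] = int(m.group(2))
            continue
        m = re.fullmatch(r"G(\d+)_([^:]+):.*", field)
        if m:
            # Keep only the genotype call; the numbers after ':' are left alone.
            info["genotype"][int(m.group(1))] = m.group(2)
    return info


header = (">SNP_higher_path_69|P_1:30_T/G|high|nb_pol_1|C1_13|C2_14|C3_9|C4_3|C5_0|"
          "G1_1/1:1393,123,115|G2_1/1:1520,134,123|G3_1/1:1761,190,64|"
          "G4_1/1:1614,213,18|G5_1/1:2164,329,9|Q1_58|Q2_62|Q3_57|Q4_59|Q5_0|rank_0.21337")
info = parse_header(header)
print(info["name"], info["coverage"], info["genotype"])
```

Keep in mind that these counts are totals per read set, so they cannot show by themselves which kmers of the sequence are left uncovered.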

I hope this answers your questions.

Best, Pierre.