Tool: DiscoSnp++ 2.1.2 release: now genotypes and creates VCFs
pierre.peterlongo (France), 4.6 years ago:

We are pleased to announce a new release of the discoSnp++ tool (home page: http://colibread.inria.fr/software/discosnp/).

DiscoSNP++ is a reference-free SNP/indel discovery tool.

From version 2.1.2:

1/ discoSnp++ generates a VCF as output:

  • Without mapping positions if no reference genome is available
  • With mapping positions otherwise. In this case, discoSnp++ uses bwa for mapping.

2/ discoSnp++ computes genotypes from the predicted coverages of variants. These predictions are reported both in the FASTA output and in the VCF file.
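To make "genotypes from predicted coverages" concrete, here is a minimal Python sketch of one common way to turn the coverages of the two alleles into a diploid genotype call. It assumes a simple binomial model with a fixed per-read error rate (the 0.01 default is an arbitrary illustrative value). It is not necessarily the model discoSnp++ itself implements, and `genotype_scores` and `call_genotype` are hypothetical names.

```python
import math
from typing import List, Tuple

# Genotype labels in the order the likelihoods are computed.
GENOTYPES = ("0/0", "0/1", "1/1")


def genotype_scores(count0: int, count1: int, error: float = 0.01) -> List[float]:
    """Phred-scaled likelihoods (-10 * log10 L) of 0/0, 0/1 and 1/1.

    Assumed model: each read independently shows allele 1 with probability
    `error` (0/0), 0.5 (0/1) or 1 - `error` (1/1). The binomial coefficient
    is omitted because it is identical for the three genotypes.
    """
    if not 0.0 < error < 0.5:
        raise ValueError("error must be in (0, 0.5)")
    scores = []
    for p1 in (error, 0.5, 1.0 - error):
        log10_l = count0 * math.log10(1.0 - p1) + count1 * math.log10(p1)
        scores.append(-10.0 * log10_l)
    return scores


def call_genotype(count0: int, count1: int, error: float = 0.01) -> Tuple[str, List[float]]:
    """Return the genotype with the lowest score (highest likelihood), plus all scores."""
    scores = genotype_scores(count0, count1, error)
    best = min(range(len(scores)), key=lambda i: scores[i])
    return GENOTYPES[best], scores


if __name__ == "__main__":
    # Hypothetical allele coverages (allele 0, allele 1) for one read set.
    for c0, c1 in [(114, 0), (88, 16), (7, 87)]:
        gt, scores = call_genotype(c0, c1)
        print(c0, c1, gt, [round(s, 1) for s in scores])
```

With these counts the sketch calls 0/0, 0/1 and 1/1 respectively. Scoring on the -10 log10 scale means the lowest score wins, which keeps the numbers readable even for high coverages.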


As usual, any comments or feedback (negative or positive :)) are welcome.

Pierre

Hello, compilation of the latest version fails for some reason (this is the first time I try to compile it). Any ideas? http://pastebin.ca/2962771 I am running CentOS 64-bit.

Reply by Adrian Pelin, 4.6 years ago

Hi,

Are you using clang?

If so, could you try gcc instead?
This means bypassing the automatic compilation script (./compile_discoSnp++.sh). For instance, if you have gcc 4.9 installed:

rm -rf build
mkdir build
cd build
cmake -DCMAKE_C_COMPILER=gcc-4.9 -DCMAKE_CXX_COMPILER=g++-4.9 ..
make
cd ..

Best, Pierre

Reply by pierre.peterlongo, 4.6 years ago

Hello, that worked! After running cmake as follows:

cmake -DCMAKE_C_COMPILER=/opt/centos/devtoolset-1.0/root/usr/bin/gcc -DCMAKE_CXX_COMPILER=/opt/centos/devtoolset-1.0/root/usr/bin/g++ ..

I ran make. Where can I find the binaries? Thanks!

Reply by Adrian Pelin, 4.6 years ago

Nice to know, thanks.

Binaries are in the ROOT/build/tools/ directory.

Note, however, that the main script is ROOT/run_discoSnp++.sh.

Pierre

Reply by pierre.peterlongo, 4.6 years ago

Quick question (is it OK to ask questions here or should I make a new post?). I am a bit confused about the difference between the variants found in the coherent and the uncoherent files. I thought the distinction was based on coverage. Here are two pairs of coherent sequences:

>SNP_higher_path_20|P_1:30_T/G|high|nb_pol_1|C1_114|C2_97|C3_96|C4_92|C5_88|G1_0/0:9,347,2284|G2_0/0:9,296,1944|G3_0/0:8,293,1924|G4_0/0:12,251,1808|G5_0/1:144,133,1581|Q1_71|Q2_70|Q3_70|Q4_68|Q5_66|rank_0.32756
>SNP_lower_path_20|P_1:30_T/G|high|nb_pol_1|C1_0|C2_0|C3_0|C4_2|C5_16|G1_0/0:9,347,2284|G2_0/0:9,296,1944|G3_0/0:8,293,1924|G4_0/0:12,251,1808|G5_0/1:144,133,1581|Q1_0|Q2_0|Q3_0|Q4_47|Q5_61|rank_0.32756

>SNP_higher_path_178|P_1:30_A/C|high|nb_pol_1|C1_7|C2_10|C3_21|C4_3|C5_0|G1_1/1:1644,187,48|G2_1/1:1653,170,76|G3_0/1:1907,152,191|G4_1/1:2071,279,16|G5_1/1:2044,311,9|Q1_58|Q2_57|Q3_51|Q4_49|Q5_0|rank_0.22405
>SNP_lower_path_178|P_1:30_A/C|high|nb_pol_1|C1_87|C2_89|C3_107|C4_106|C5_102|G1_1/1:1644,187,48|G2_1/1:1653,170,76|G3_0/1:1907,152,191|G4_1/1:2071,279,16|G5_1/1:2044,311,9|Q1_65|Q2_61|Q3_63|Q4_67|Q5_68|rank_0.22405

And these two pairs are considered uncoherent:

>SNP_higher_path_69|P_1:30_T/G|high|nb_pol_1|C1_13|C2_14|C3_9|C4_3|C5_0|G1_1/1:1393,123,115|G2_1/1:1520,134,123|G3_1/1:1761,190,64|G4_1/1:1614,213,18|G5_1/1:2164,329,9|Q1_58|Q2_62|Q3_57|Q4_59|Q5_0|rank_0.21337
>SNP_lower_path_69|P_1:30_T/G|high|nb_pol_1|C1_77|C2_84|C3_94|C4_83|C5_108|G1_1/1:1393,123,115|G2_1/1:1520,134,123|G3_1/1:1761,190,64|G4_1/1:1614,213,18|G5_1/1:2164,329,9|Q1_65|Q2_64|Q3_66|Q4_68|Q5_68|rank_0.21337

>SNP_higher_path_39|P_1:30_T/G|high|nb_pol_1|C1_12|C2_13|C3_13|C4_2|C5_0|G1_1/1:1615,155,98|G2_1/1:1761,169,105|G3_1/1:1645,154,108|G4_1/1:1748,242,12|G5_1/1:1724,263,8|Q1_51|Q2_61|Q3_60|Q4_43|Q5_0|rank_0.19546
>SNP_lower_path_39|P_1:30_T/G|high|nb_pol_1|C1_88|C2_96|C3_90|C4_89|C5_86|G1_1/1:1615,155,98|G2_1/1:1761,169,105|G3_1/1:1645,154,108|G4_1/1:1748,242,12|G5_1/1:1724,263,8|Q1_66|Q2_66|Q3_65|Q4_67|Q5_68|rank_0.19546
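The headers above pack several per-read-set fields into a single line, which makes side-by-side comparison hard. A small Python sketch like the following can pull them apart. The field meanings are assumptions read off these examples (C = read coverage, G = genotype followed by three scores, Q = quality, for read sets 1 to 5), and `parse_header` and `lowest_score_genotype` are hypothetical helpers, not part of discoSnp++. In every example shown here, the called genotype is the one whose score is lowest, assuming the scores are listed in the order 0/0, 0/1, 1/1.

```python
import re

# Per-read-set fields such as C1_114, Q5_66 or G2_0/0:9,296,1944.
PER_SET = re.compile(r"^([CGQ])(\d+)_(.+)$")

EXAMPLE = (
    ">SNP_lower_path_20|P_1:30_T/G|high|nb_pol_1|C1_0|C2_0|C3_0|C4_2|C5_16|"
    "G1_0/0:9,347,2284|G2_0/0:9,296,1944|G3_0/0:8,293,1924|G4_0/0:12,251,1808|"
    "G5_0/1:144,133,1581|Q1_0|Q2_0|Q3_0|Q4_47|Q5_61|rank_0.32756"
)


def parse_header(header: str) -> dict:
    """Split a discoSnp++-style FASTA header into per-read-set dictionaries."""
    fields = header.lstrip(">").split("|")
    parsed = {"id": fields[0], "variant": fields[1], "C": {}, "G": {}, "Q": {}, "rank": None}
    for field in fields[2:]:
        if field.startswith("rank_"):
            parsed["rank"] = float(field[len("rank_"):])
            continue
        m = PER_SET.match(field)
        if m is None:
            continue  # e.g. "high" or "nb_pol_1", not used here
        kind, read_set, value = m.group(1), int(m.group(2)), m.group(3)
        if kind == "G":
            genotype, scores = value.split(":")
            parsed["G"][read_set] = (genotype, [int(s) for s in scores.split(",")])
        else:
            parsed[kind][read_set] = int(value)
    return parsed


def lowest_score_genotype(scores):
    # Assumption from the examples: scores are in the order 0/0, 0/1, 1/1.
    return ("0/0", "0/1", "1/1")[scores.index(min(scores))]


if __name__ == "__main__":
    h = parse_header(EXAMPLE)
    print(h["id"], h["variant"], "coverage per read set:", h["C"])
    for read_set, (gt, scores) in sorted(h["G"].items()):
        print(read_set, gt, lowest_score_genotype(scores))
```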


Do these give any clue as to why some are coherent and others are not? When I look at the coverage of the less-covered allele, the reason is not obvious to me.

Thank you!

Reply by Adrian Pelin, 4.6 years ago

(A new post would be best for this question, but it's alright!)

I'll let Pierre confirm the following guess: coherent/incoherent is indeed based on coverage, but one should distinguish k-mer coverage from k-read coverage (as defined in the DiscoSnp paper). A SNP can be k-read-incoherent, meaning it might be an assembly artefact that the reads cannot explain, even though the whole path may have a sufficient mean coverage of individual k-mers.

Reply by Rayan Chikhi, 4.6 years ago

pierre.peterlongo (France), 4.6 years ago:

Hi Adrian,

The read coherency is computed as follows:

The reads of each read set are mapped back onto each prediction (by default, one mismatch is allowed anywhere except at the variant position(s)).
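The mapping rule in that step can be illustrated with a naive ungapped scan in Python. This is only a sketch under stated assumptions: no indels, reads must lie entirely inside the predicted sequence, variant positions are given as 0-based offsets, and `map_read` is a hypothetical name. The actual discoSnp++ mapper may differ, for example in speed and in how read overhangs are handled.

```python
from typing import List, Set


def map_read(read: str, seq: str, variant_positions: Set[int], max_mismatches: int = 1) -> List[int]:
    """Return the 0-based offsets where `read` aligns to `seq` without gaps.

    An offset is kept if the read has at most `max_mismatches` mismatches
    there and none of them falls on a variant position. Reads are assumed
    to lie entirely inside the predicted sequence (no overhangs).
    """
    hits = []
    for offset in range(len(seq) - len(read) + 1):
        mismatches = 0
        for j, base in enumerate(read):
            pos = offset + j
            if base == seq[pos]:
                continue
            if pos in variant_positions:
                mismatches = max_mismatches + 1  # mismatch on the variant: reject
                break
            mismatches += 1
            if mismatches > max_mismatches:
                break
        if mismatches <= max_mismatches:
            hits.append(offset)
    return hits


if __name__ == "__main__":
    seq = "ACGTACGTTA"  # toy predicted sequence with a variant at offset 4
    print(map_read("TCGT", seq, {4}))    # [0]: offset 4 would put the mismatch on the variant
    print(map_read("TCGT", seq, set()))  # [0, 4]: without the variant constraint
```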

For each read set, a predicted sequence is said to be "k-read-covered" if every k-mer of the sequence is covered by at least c reads (c is a main parameter, 4 by default).

Variants for which the two sequences are not k-read-coherent for all read sets are declared "uncoherent".

The read coverage reported in the outputs (.fa and .vcf) is the total number of reads mapped. It is therefore possible for an uncoherent variant to have a high read coverage: many reads map to some parts of the sequence, while some other regions of the sequence are not covered by enough reads.
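The k-read-covered test, and why a high total read count does not guarantee it, can be illustrated with a short Python sketch. It assumes each mapped read is given as a (start, length) interval on the predicted sequence and that a read covers a k-mer only if it contains the whole k-mer; the function names and the toy values (sequence length 20, k = 5, reads of length 8) are hypothetical, and c defaults to 4 as stated above.

```python
from typing import Iterable, List, Tuple


def kmer_read_counts(seq_len: int, k: int, reads: Iterable[Tuple[int, int]]) -> List[int]:
    """Count, for each k-mer start position, the mapped reads fully containing it."""
    n_kmers = seq_len - k + 1
    counts = [0] * n_kmers
    for start, length in reads:
        # A read [start, start + length) contains the k-mer at i
        # when start <= i and i + k <= start + length.
        first = max(start, 0)
        last = min(start + length - k, n_kmers - 1)
        for i in range(first, last + 1):
            counts[i] += 1
    return counts


def is_k_read_covered(seq_len: int, k: int, reads: Iterable[Tuple[int, int]], c: int = 4) -> bool:
    """True if every k-mer of the sequence is contained in at least c reads."""
    if seq_len < k:
        raise ValueError("sequence shorter than k")
    return min(kmer_read_counts(seq_len, k, reads)) >= c


if __name__ == "__main__":
    seq_len, k = 20, 5
    # 10 reads piled up at both ends: k-mers 4..11 get no read at all.
    gap_reads = [(0, 8)] * 5 + [(12, 8)] * 5
    # 16 reads tiling the sequence: every k-mer is contained in exactly 4 reads.
    tiled_reads = [(s, 8) for s in (0, 4, 8, 12) for _ in range(4)]
    print(len(gap_reads), is_k_read_covered(seq_len, k, gap_reads))      # 10 False
    print(len(tiled_reads), is_k_read_covered(seq_len, k, tiled_reads))  # 16 True
```

Summing the mapped reads would give 10 for the first read set, yet its middle k-mers are uncovered, which mirrors how an uncoherent variant can still show a high read coverage in the output.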

I hope this answers your questions.

Best, Pierre.


Thank you for your answer, it clears some things up. I have started a new thread with follow-up questions: "DiscoSnp++, question about how SNPs called".

Reply by Adrian Pelin, 4.6 years ago