Question: VCF file with refrence genome in Discosnp is epmty
0
gravatar for Hanan
3.7 years ago by
Hanan70
Tel Aviv University
Hanan70 wrote:

Hello

Trying to crate a vcf file  with a reference genome in Discosnp results in the below stdout with an empty vcf file, only headers. The sam file is OK. VCF creation using run_VCF_creator.sh without reference genome is OK as well.  

[M::main_mem] read 135392 sequences (10000014 bp)...
[M::main_mem] read 136230 sequences (10000131 bp)...
[M::main_mem] read 137222 sequences (10000111 bp)...
[M::main_mem] read 141176 sequences (10000018 bp)...
[M::main_mem] read 87706 sequences (5565243 bp)...
[main] Version: 0.7.5a-r405
[main] CMD: /usr/bin/bwa mem /storage/nr/db/nucleotide/v2_pseudomolecules.fasta dd26_k_31_c_auto_D_100_P_1_b_1_coherentbis.fasta
[main] Real time: 1276.659 sec; CPU: 1226.005 sec
Traceback (most recent call last):
  File "/storage/hanan/DiscoSNP++-2.2.4-Source/tools/VCF_creator.py", line 132, in <module>
    variant_object, vcf_field_object=InitVariant(line1,line2) #Fills the object with the line of the samfile
  File "/storage/hanan/DiscoSNP++-2.2.4-Source/tools/functionObjectVCF_creator.py", line 25, in InitVariant
    variant_object=SNP(line1,line2)
  File "/storage/hanan/DiscoSNP++-2.2.4-Source/tools/ClassVCF_creator.py", line 649, in __init__
    VARIANT.__init__(self,line1,line2)
  File "/storage/hanan/DiscoSNP++-2.2.4-Source/tools/ClassVCF_creator.py", line 68, in __init__
    self.upper_path=PATH(line1)#line in the file corresponding to the upper path
  File "/storage/hanan/DiscoSNP++-2.2.4-Source/tools/ClassVCF_creator.py", line 376, in __init__
    self.GetDicoMappingPosition()
  File "/storage/hanan/DiscoSNP++-2.2.4-Source/tools/ClassVCF_creator.py", line 417, in GetDicoMappingPosition
    posMut,nbMismatch=self.GetTag()
  File "/storage/hanan/DiscoSNP++-2.2.4-Source/tools/ClassVCF_creator.py", line 583, in GetTag
    return (posMut,nbMismatch)
UnboundLocalError: local variable 'posMut' referenced before assignment
... Creation of the vcf file : done ...==> dd26_k_31_c_auto_D_100_P_1_b_1_coherent.vcf
... Creation of the vcf file for IGV: done ...==> dd26_k_31_c_auto_D_100_P_1_b_1_coherent_for_IGV.vcf
Vcf creation time in seconds: 1286
        ###############################################################
        #################### DISCOSNP++ FINISHED ######################
        ###############################################################
DiscoSnp++ total time in seconds: 32706
        ################################################################################################################
         fasta of predicted variant is "dd26_k_31_c_auto_D_100_P_1_b_1_coherent.fa"
         VCF file (1-based) is "dd26_k_31_c_auto_D_100_P_1_b_1_coherent.vcf"
         An IGV ready VCF file (sorted by position, only mapped variants, 0-based) is "dd26_k_31_c_auto_D_100_P_1_b_1_coherent_for_IGV.vcf"
         Thanks for using discoSnp++ - http://colibread.inria.fr/discoSnp/ - Forum: http://www.biostars.org/t/discoSnp/
        ################################################################################################################

Thank you

Hanan

discosnp • 1.1k views
ADD COMMENTlink modified 3.5 years ago by Chloe Riou40 • written 3.7 years ago by Hanan70

Hi Hanan, 

 

Sorry for this issue. Could you indicate the discoSnp version generating this issue ?

ADD REPLYlink written 3.7 years ago by pierre.peterlongo840

I have tried 2.2.0, 2.2.1 and 2.2.4 Hanan

ADD REPLYlink written 3.7 years ago by Hanan70

Right, thanks. 

We maybe identified this problem that may be due to an exception in the sam output. Would you mind sending me the .sam file that had been generated during your process?

 

Best regards, 

Pierre

ADD REPLYlink written 3.7 years ago by pierre.peterlongo840

No problem, how do I send it?

ADD REPLYlink written 3.7 years ago by Hanan70
1
gravatar for Chloe Riou
3.6 years ago by
Chloe Riou40
Chloe Riou40 wrote:

Hi Hanan,

I am currently working with Pierre on discoSnp++, many thanks for your file. We treat your data with VCF creator and it seems that this is a problem of a missing tag in your sam file, more precisely the MD tag (we use it to get the correct location of the discoSnp++ prediction). I can't check it without your reference genome, but you could try the tools "samtools calmd" to generate the missing tag, and after that you should be able to run VCF creator.

Thank you for being patient,

Best regards,

Chloé

 

In the samtools manual : 

http://www.htslib.org/doc/samtools.html

samtools calmd [-Eeubr] [-C capQcoef] <aln.bam> <ref.fasta>

Generate the MD tag. If the MD tag is already present, this command will give a warning if the MD tag generated is different from the existing tag. Output SAM by default.

Calmd can also read and write CRAM files although in most cases it is pointless as CRAM recalculates MD and NM tags on the fly. The one exception to this case is where both input and output CRAM files have been / are being created with the no_ref option.

OPTIONS:

-A

When used jointly with -r this option overwrites the original base quality.

-e

Convert a the read base to = if it is identical to the aligned reference base. Indel caller does not support the = bases at the moment.

-u

Output uncompressed BAM

-b

Output compressed BAM

-C INT

Coefficient to cap mapping quality of poorly mapped reads. See the pileup command for details. [0]

-r

Compute the BQ tag (without -A) or cap base quality by BAQ (with -A).

-E

Extended BAQ calculation. This option trades specificity for sensitivity, though the effect is minor.

 

 

 

ADD COMMENTlink written 3.6 years ago by Chloe Riou40

hello after a long time back to disco, thank Chloe you for your suggestion To use samtools calmd I need aln.bam file. trying to convert disco sam file to bam results in this output:

hanan@icci:~/SXG$ samtools view -bu SXG_k_31_c_auto_D_100_P_1_b_1_coherentBWA_MEM.sam >SXG.bam [bam_header_read] EOF marker is absent. The input is probably truncated. [bam_header_read] invalid BAM binary header (this is not a BAM file). [main_samview] fail to read the header from "SXG_k_31_c_auto_D_100_P_1_b_1_coherentBWA_MEM.sam".

Thank you Hanan

ADD REPLYlink written 3.5 years ago by Hanan70
0
gravatar for Chloe Riou
3.5 years ago by
Chloe Riou40
Chloe Riou40 wrote:

Hi Hanan,

Could you try to use this version of bwa : https://sourceforge.net/projects/bio-bwa/ please ? It seems that it produces the right tag MD.

Best regards,

Chloé

ADD COMMENTlink written 3.5 years ago by Chloe Riou40

Hi Chloe, Using this version of bwa solved the problem. I have succeeded to run VCF creator with a reference genome. Thank you.

ADD REPLYlink written 3.5 years ago by Hanan70
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1981 users visited in the last hour