VCF file with reference genome in Discosnp is empty
2
0
Entering edit mode
8.3 years ago
Hans ▴ 140

Hello

Trying to create a vcf file with a reference genome in Discosnp results in the below stdout with an empty vcf file, only headers. The sam file is OK. VCF creation using run_VCF_creator.sh without reference genome is OK as well.

[M::main_mem] read 135392 sequences (10000014 bp)...
[M::main_mem] read 136230 sequences (10000131 bp)...
[M::main_mem] read 137222 sequences (10000111 bp)...
[M::main_mem] read 141176 sequences (10000018 bp)...
[M::main_mem] read 87706 sequences (5565243 bp)...
[main] Version: 0.7.5a-r405
[main] CMD: /usr/bin/bwa mem /storage/nr/db/nucleotide/v2_pseudomolecules.fasta dd26_k_31_c_auto_D_100_P_1_b_1_coherentbis.fasta
[main] Real time: 1276.659 sec; CPU: 1226.005 sec
Traceback (most recent call last):
  File "/storage/hanan/DiscoSNP++-2.2.4-Source/tools/VCF_creator.py", line 132, in <module>
    variant_object, vcf_field_object=InitVariant(line1,line2) #Fills the object with the line of the samfile
  File "/storage/hanan/DiscoSNP++-2.2.4-Source/tools/functionObjectVCF_creator.py", line 25, in InitVariant
    variant_object=SNP(line1,line2)
  File "/storage/hanan/DiscoSNP++-2.2.4-Source/tools/ClassVCF_creator.py", line 649, in __init__
    VARIANT.__init__(self,line1,line2)
  File "/storage/hanan/DiscoSNP++-2.2.4-Source/tools/ClassVCF_creator.py", line 68, in __init__
    self.upper_path=PATH(line1)#line in the file corresponding to the upper path
  File "/storage/hanan/DiscoSNP++-2.2.4-Source/tools/ClassVCF_creator.py", line 376, in __init__
    self.GetDicoMappingPosition()
  File "/storage/hanan/DiscoSNP++-2.2.4-Source/tools/ClassVCF_creator.py", line 417, in GetDicoMappingPosition
    posMut,nbMismatch=self.GetTag()
  File "/storage/hanan/DiscoSNP++-2.2.4-Source/tools/ClassVCF_creator.py", line 583, in GetTag
    return (posMut,nbMismatch)
UnboundLocalError: local variable 'posMut' referenced before assignment
... Creation of the vcf file : done ...==> dd26_k_31_c_auto_D_100_P_1_b_1_coherent.vcf
... Creation of the vcf file for IGV: done ...==> dd26_k_31_c_auto_D_100_P_1_b_1_coherent_for_IGV.vcf
Vcf creation time in seconds: 1286
        ###############################################################
        #################### DISCOSNP++ FINISHED ######################
        ###############################################################
DiscoSnp++ total time in seconds: 32706
        ################################################################################################################
         fasta of predicted variant is "dd26_k_31_c_auto_D_100_P_1_b_1_coherent.fa"
         VCF file (1-based) is "dd26_k_31_c_auto_D_100_P_1_b_1_coherent.vcf"
         An IGV ready VCF file (sorted by position, only mapped variants, 0-based) is "dd26_k_31_c_auto_D_100_P_1_b_1_coherent_for_IGV.vcf"
         Thanks for using discoSnp++ - http://colibread.inria.fr/discoSnp/ - Forum: http://www.biostars.org/t/discoSnp/
        ################################################################################################################

Thank you
Hanan

discosnp • 2.7k views
ADD COMMENT
0
Entering edit mode

Hi Hanan,

Sorry for this issue. Could you indicate the discoSnp version generating this issue?

ADD REPLY
0
Entering edit mode

I have tried 2.2.0, 2.2.1 and 2.2.4 Hanan

ADD REPLY
0
Entering edit mode

Right, thanks.

We maybe identified this problem that may be due to an exception in the sam output. Would you mind sending me the .sam file that had been generated during your process?

Best regards,
Pierre

ADD REPLY
0
Entering edit mode

No problem, how do I send it?

ADD REPLY
1
Entering edit mode
8.2 years ago
Chloe Riou ▴ 40

Hi Hanan,

I am currently working with Pierre on discoSnp++, many thanks for your file. We treat your data with VCF creator and it seems that this is a problem of a missing tag in your sam file, more precisely the MD tag (we use it to get the correct location of the discoSnp++ prediction). I can't check it without your reference genome, but you could try the tools "samtools calmd" to generate the missing tag, and after that you should be able to run VCF creator.

Thank you for being patient,

Best regards,
ChloƩ

In the samtools manual:

samtools calmd [-Eeubr] [-C capQcoef] <aln.bam> <ref.fasta>

G enerate the MD tag. If the MD tag is already present, this command will give a warning if the MD tag generated is different from the existing tag. Output SAM by default.

Calmd can also read and write CRAM files although in most cases it is pointless as CRAM recalculates MD and NM tags on the fly. The one exception to this case is where both input and output CRAM files have been / are being created with the no_ref option.

OPTIONS:

-A

When used jointly with -r this option overwrites the original base quality.

-e

Convert a the read base to = if it is identical to the aligned reference base. Indel caller does not support the = bases at the moment.

-u

Output uncompressed BAM

-b

Output compressed BAM

-C INT

Coefficient to cap mapping quality of poorly mapped reads. See the pileup command for details. [0]

-r

Compute the BQ tag (without -A) or cap base quality by BAQ (with -A).

-E

Extended BAQ calculation. This option trades specificity for sensitivity, though the effect is minor.

ADD COMMENT
0
Entering edit mode

hello

after a long time back to disco, thank Chloe you for your suggestion

To use samtools calmd I need aln.bam file. trying to convert disco sam file to bam results in this output:

$ hanan@icci:~/SXG$ samtools view -bu SXG_k_31_c_auto_D_100_P_1_b_1_coherentBWA_MEM.sam   >SXG.bam
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "SXG_k_31_c_auto_D_100_P_1_b_1_coherentBWA_MEM.sam".

Thank you
Hanan

ADD REPLY
0
Entering edit mode
8.1 years ago
Chloe Riou ▴ 40

Hi Hanan,

Could you try to use this version of bwa : https://sourceforge.net/projects/bio-bwa/ please ? It seems that it produces the right tag MD.

Best regards,

ChloƩ

ADD COMMENT
0
Entering edit mode

Hi Chloe, Using this version of bwa solved the problem. I have succeeded to run VCF creator with a reference genome. Thank you.

ADD REPLY

Login before adding your answer.

Traffic: 2211 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6