Question: Joining Snpeff And Vcf
8.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum129k wrote:

I've extracted the distinct mutations from a set of VCF files:

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    SAMPLE
chr1    66480    .    AT    A    205    .    (...)
chr1    626686    .    CCT    C    124    .    (...)

and generated the predictions using SnpEff:

(...)
1    66481    *    -T    DEL    Hom    205    0        OR4F5.1    OR4F5    mRNA    NM_001005484            UPSTREAM: 2610 bases            
1    626687    *    -CT    DEL    Hom    124    0        OR4F29.1    OR4F29    mRNA    NM_001005221.2            UPSTREAM: 4652 bases        
1    626687    *    -CT    DEL    Hom    124    0        OR4F16.1    OR4F16    mRNA    NM_001005277.2            UPSTREAM: 4652 bases

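Now I'd like to join those results with my VCFs. But, as you can see, SnpEff changes the way the alternate bases are defined: an indel is reported one base after the VCF position, with "*" as the reference and a "+" or "-" prefix on the changed bases. Do you know of any way to join those files?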

Thanks.

EDIT:

Here is my temporary C++ solution. It converts each VCF record to the SnpEff convention and prepends the converted columns to the original line:

    (...)
    while(getline(in,line,'\n'))
     {
     if(line.empty()) continue;
     if(line[0]=='#')
         {
         cout << "#Chromo\tPosition\tReference\tChange\t" << line << endl;
         continue;
         }
     tokenizer.split(line,tokens);
     string chrom=tokens[0];
     if(chrom.compare(0,3,"chr")==0) chrom=chrom.substr(3);
     int pos;
     numeric_cast<int>(tokens[1].c_str(),&pos);
     string ref=tokens[3];
     string alt=tokens[4];
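     if(ref.size()<alt.size()) // REF shorter than ALT, e.g. A/AC = INSERTION
         {
         assert(ref[0]==alt[0]); // both alleles must share the leading anchor base
         ref.assign("*");
         alt[0]='+'; // AC becomes +C
         ++pos; // SnpEff reports the position after the anchor base
         }
     else if(ref.size()>alt.size()) // REF longer than ALT, e.g. AT/A = DELETION
         {
         assert(ref[0]==alt[0]);
         alt.assign(ref);
         alt[0]='-'; // AT becomes -T
         ref.assign("*");
         ++pos;
         }
     else
         {
         // single SNP: REF, ALT and position are left unchanged
         }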
     cout     << chrom
        << "\t"<< pos
        << "\t" << ref
        << "\t" << alt
        << "\t" << line
        << endl;
     } (...)
vcf format • 3.2k views
modified 6.3 years ago by tranthach900 • written 8.5 years ago by Pierre Lindenbaum129k
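
For reference, the conversion in the question can be packaged as a small function that returns a join key. This is only a sketch that mirrors the snippet above, not an official SnpEff routine. The names `SnpEffKey` and `toSnpEffKey` are made up for illustration, and it assumes a single ALT allele where the shorter allele of an indel is just the one-base anchor.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical join key: (chromosome, position, reference, change),
// matching the first four columns written by the snippet in the question.
using SnpEffKey = std::tuple<std::string, int, std::string, std::string>;

// Convert VCF-style CHROM/POS/REF/ALT to the SnpEff text convention.
// Assumptions: one ALT allele per record; for indels the shorter allele
// is a single anchor base shared by REF and ALT.
SnpEffKey toSnpEffKey(std::string chrom, int pos,
                      std::string ref, std::string alt) {
    // SnpEff output in the question uses "1" where the VCF uses "chr1".
    if (chrom.compare(0, 3, "chr") == 0) chrom = chrom.substr(3);

    if (ref.size() < alt.size()) {
        // Insertion: A -> ACT becomes "*" / "+CT", one base further on.
        assert(ref[0] == alt[0]);
        alt[0] = '+';
        ref = "*";
        ++pos;
    } else if (ref.size() > alt.size()) {
        // Deletion: AT -> A becomes "*" / "-T", one base further on.
        assert(ref[0] == alt[0]);
        alt = ref;
        alt[0] = '-';
        ref = "*";
        ++pos;
    }
    // SNPs keep their position and alleles.
    return SnpEffKey(chrom, pos, ref, alt);
}
```

Because `std::tuple` is ordered, the key can be used directly in a `std::map` or `std::multimap`: index the VCF lines by `toSnpEffKey(...)`, then look up each SnpEff row by its first four columns. With the two records from the question, `chr1 66480 AT A` maps to `1 66481 * -T` and `chr1 626686 CCT C` maps to `1 626687 * -CT`.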

Hi everyone, please help me!

# I ran VarScan to produce rice-snp.vcf
# and I want to annotate it with snpEff, but I get an error:

./snpEff$ java -jar snpEff.jar rice7 rice-snp.vcf > s.eff.vcf

ERRORS: Some errors were detected
Error type      Number of errors
ERROR_CHROMOSOME_NOT_FOUND      330650

NEW VERSION!

        There is a new SnpEff version available:
                Version      : 3.6
                Release date : 2014-04-21
                Download URL : http://sourceforge.net/projects/snpeff/files/snpEff_latest_core.zip

# Thanks.

# Format of the input VCF file:

CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    Sample1
LOC_Os01g01070    1254    .    A    G    .    PASS    ADP=13;WT=0;HET=0;HOM=1;NC=0    
LOC_Os01g01070    3850    .    A    G    .    PASS    ADP=11;WT=0;HET=0;HOM=1;NC=0    
LOC_Os01g01070    4240    .    C    T    .    PASS    ADP=12;WT=0;HET=0;HOM=1;NC=0    
LOC_Os01g01080    2809    .    T    C    .    PASS    ADP=11;WT=0;HET=0;HOM=1;NC=0    
LOC_Os01g01090    435    .    G    A    .    PASS    ADP=15;WT=0;HET=1;HOM=0;NC=0    
.......
modified 10 months ago by RamRS28k • written 6.3 years ago by tranthach900

Don't post your question as an answer; it will be deleted.

Ask it as a new question.

modified 10 months ago by RamRS28k • written 6.3 years ago by Istvan Albert ♦♦ 84k
8.5 years ago by
Pablo1.9k
Canada
Pablo1.9k wrote:

One solution is to use VCF output (as suggested in the other answer) and then split the result into one effect per line using the vcfEffOnePerLine.pl script, which you can find in the 'scripts' directory of SnpEff's distribution.

written 8.5 years ago by Pablo1.9k

Thanks, vcfEffOnePerLine was missing from my distribution.

written 8.5 years ago by Pierre Lindenbaum129k
8.5 years ago by
Aaronquinlan11k
United States
Aaronquinlan11k wrote:

Why not use the SnpEff option to report its predictions in VCF format? This way, there is no need to join.

 java -Xmx4G -jar snpEff.jar eff -i vcf -o vcf GRCh37.63 sample.vcf > sample.annotated.vcf
written 8.5 years ago by Aaronquinlan11k

Because I want to keep one type of mutation per line. SnpEff puts all the possible effects in the INFO column.

written 8.5 years ago by Pierre Lindenbaum129k

Ah, I see. It might be easier to just use awk to "expand" each VCF line for each annotation in the INFO field. You're right, the change in allele definition is irksome.

written 8.5 years ago by Aaronquinlan11k
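
The "expand each VCF line" idea from the comments could look roughly like the sketch below, written in C++ to match the question rather than awk. It assumes the annotated VCF stores its effects as a comma-separated list under an INFO key named `EFF`; other SnpEff versions may use a different key, so check your own output first. The helper names `split`, `join` and `expandEffects` are made up for illustration, and this is not a reimplementation of vcfEffOnePerLine.pl.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split s on a single-character delimiter, keeping empty fields.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> out;
    std::string::size_type start = 0, end;
    while ((end = s.find(delim, start)) != std::string::npos) {
        out.push_back(s.substr(start, end - start));
        start = end + 1;
    }
    out.push_back(s.substr(start));
    return out;
}

// Join parts with a single-character delimiter.
std::string join(const std::vector<std::string>& parts, char delim) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) out += delim;
        out += parts[i];
    }
    return out;
}

// Return one copy of the VCF line per effect listed in the INFO column.
// Assumption: effects live under the INFO key "EFF", separated by commas.
// Header lines and lines without that key are returned unchanged.
std::vector<std::string> expandEffects(const std::string& line) {
    std::vector<std::string> cols = split(line, '\t');
    if (cols.size() < 8) return std::vector<std::string>(1, line);

    // Locate the EFF entry among the semicolon-separated INFO entries.
    std::vector<std::string> info = split(cols[7], ';');
    std::size_t effIdx = info.size();
    for (std::size_t i = 0; i < info.size(); ++i) {
        if (info[i].compare(0, 4, "EFF=") == 0) { effIdx = i; break; }
    }
    if (effIdx == info.size()) return std::vector<std::string>(1, line);

    // Emit the full line once per effect, with only that effect in EFF.
    std::vector<std::string> effects = split(info[effIdx].substr(4), ',');
    std::vector<std::string> out;
    for (std::size_t i = 0; i < effects.size(); ++i) {
        info[effIdx] = "EFF=" + effects[i];
        cols[7] = join(info, ';');
        out.push_back(join(cols, '\t'));
    }
    return out;
}
```

Each returned line keeps the original CHROM, POS, REF and ALT columns, so the allele definitions stay in VCF form and no join against the SnpEff text output is needed.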