How To Generate Variant Callings Of One Sample That Are Unique From Others In One Variant Calling File?
5
2
Entering edit mode
7.4 years ago
zengtony743 ▴ 70

I have one variant calling file (VCF) which has three samples (affected, control1 and control2). I want to generate variants that are 100% unique in affected sample but not in 2 control samples.

VCF variant information lines are

CHROM  POS ID REF ALT QUAL FILTER INFO      FORMAT          affected        control1        control2
chr4    x  x   A   C   78   pass   xx   GT:AD:DP:GQ:PL    0/1:x:x:x:x     0/1:x:x:x:x     1/1:x:x:x:x 
chr8    x  x   A   T   1444 pass   xx   GT:AD:DP:GQ:PL    0/1:x:x:x:x     0/0:x:x:x:x     0/0:x:x:x:x 
chr10   x  x   T   C   230  pass   xx   GT:AD:DP:GQ:PL    1/1:x:x:x:x     0/0:x:x:x:x     0/0:x:x:x:x

If I want get the new variant calling file:

CHROM POS ID REF ALT QUAL FILTER INFO       FORMAT             affected 
chr8   x  x   A   T  1444  pass  xx    GT:AD:DP:GQ:PL        0/1:x:x:x:x 
chr10  x  x   T   C  230   pass  xx    GT:AD:DP:GQ:PL        1/1:x:x:x:x

I want generate a new VCF only has affected variant callings that are unique (by comparing with variants of 2 controls). How can I do it?

vcf • 5.4k views
ADD COMMENT
0
Entering edit mode

Visit GeneTalk and perform an inheritance filter step (look for dominant variants) after you have uploaded the multiple VCF file and set up the pedigree information.

ADD REPLY
0
Entering edit mode

Thank you! Alexej

ADD REPLY
0
Entering edit mode

Dear all,

When I ran this command which worked before, but now it did not work.. I could not figure it out..

[jvarkit]$ java -Xmx15g -jar dist/vcffilterjs.jar -f script.js allsnp.vcf > Crf04.vcf &
[1] 2049
[rzeng@qlogin4 jvarkit]$ Exception in thread "main" java.lang.NoClassDefFoundError: net/sf/samtools/util/BlockCompressedOutputStream
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
    at java.lang.Class.getMethod0(Class.java:2774)
    at java.lang.Class.getMethod(Class.java:1663)
    at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: net.sf.samtools.util.BlockCompressedOutputStream
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 6 more

[1]+  Exit 1                  java -Xmx15g -jar dist/vcffilterjs.jar -f script.js allsnp.vcf > Crf04.vcf
ADD REPLY
0
Entering edit mode

I doubt my super cluster/ computer has problem ...

[xxx@qlogin4 jvarkit]$ cd dist
[xxx@qlogin4 dist]$ ls
vcffilterjs  vcffilterjs.jar
[xxx@qlogin4 dist]$ java -jar vcffilterjs.jar
Exception in thread "main" java.lang.NoClassDefFoundError: net/sf/samtools/util/BlockCompressedOutputStream
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
    at java.lang.Class.getMethod0(Class.java:2774)
    at java.lang.Class.getMethod(Class.java:1663)
    at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: net.sf.samtools.util.BlockCompressedOutputStream
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 6 more
ADD REPLY
0
Entering edit mode

Old problem solved but new one comes...

  1. Here is commands that I use:

    java -Xmx45g -jar /hpf/tools/centos6/jvarkit/2014.10.15/dist/vcffilterjs.jar -f script.js allsnp1.vcf > 136F1.vcf &
    
  2. Here is the script.js (from Pierre)

    function accept(ctx)
    {
    var y,g2;
    var sampleList=header.getSampleNamesInOrder();
    
    var g1=ctx.getGenotype("136");
    /** ignore non-called */
    if(g1== null || ! g1.isCalled() ) return false;
    /** loop over the other samples */
    for(y=0;y< sampleList.size();++y)
    {
    g2=ctx.getGenotype( sampleList.get(y) );
    
    if(g2.getSampleName().equals(g1.getSampleName())) continue;
    /** ignore non-called */
    if(! g2.isCalled() ) continue;
    
    /** is g1==g2 ? */
    if( g1.sameGenotype( g2 ) ) return false;
    }
    /* found no other same genotype */
    return true;
    }
    accept(variant);
    
  3. output file is small as 26K. here is the output look like

    ........
    ........
    .......
    ....
    ##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">
    ##VCFFilterJSCmdLine=-f script.js allsnp1.vcf
    ##VCFFilterJSVersion=c83f20cde867920870918ee6eb5e5406f554e2bb
    ##contig=<ID=chr10,length=130694993>
    ##contig=<ID=chr11,length=122082543>
    ##contig=<ID=chr12,length=120129022>
    ##contig=<ID=chr13,length=120421639>
    ##contig=<ID=chr14,length=124902244>
    ##contig=<ID=chr15,length=104043685>
    ##contig=<ID=chr16,length=98207768>
    ##contig=<ID=chr17,length=94987271>
    ##contig=<ID=chr18,length=90702639>
    ##contig=<ID=chr19,length=61431566>
    ##contig=<ID=chr1,length=195471971>
    ##contig=<ID=chr2,length=182113224>
    ##contig=<ID=chr3,length=160039680>
    ##contig=<ID=chr4,length=156508116>
    ##contig=<ID=chr5,length=151834684>
    ##contig=<ID=chr6,length=149736546>
    ##contig=<ID=chr7,length=145441459>
    ##contig=<ID=chr8,length=129401213>
    ##contig=<ID=chr9,length=124595110>
    ##contig=<ID=chrM,length=16299>
    ##contig=<ID=chrX,length=171031299>
    ##contig=<ID=chrY,length=91744698>
    ##reference=file:///hpf/projects/mjustice/reference/genome.fa
    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 111  
    129_Control 136  137  186  195  196  198  620  752  
    97  C57B6_control Chol1_592 Crf03_82 Crf04_151 L11J27 
    L11J74 M1527_280 M1527_305 M1527_84 M895_206 M895_417
    M895_419 M895_64 Nur19_1457
    

    But no MORE information after the line of (#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT)

ADD REPLY
0
Entering edit mode

Sorry that I posted question as an answer here!

ADD REPLY
4
Entering edit mode
7.4 years ago

My tool VCFFilterJS can do this.

Here is a javascript file that select the variant where one sample has a genotype different from the others:

function accept(ctx)
    {
    var x,y,g1,g2,count_same=0;
    var sampleList=header.getSampleNamesInOrder();
    /** loop over one sample */
    for(x=0;x < sampleList.size();++x)
        {
        g1=ctx.getGenotype( sampleList.get(x) );
        /** ignore non-called */
        if(! g1.isCalled() ) continue;
        count_same=0;
        /** loop over the other samples */
        for(y=0;y< sampleList.size() && count_same==0 ;++y)
            {
            if(x==y) continue;/* same sample ?*/
            g2=ctx.getGenotype( sampleList.get(y) );
            /** ignore non-called */
            if(! g2.isCalled() ) continue;
            /** is g1==g2 ? */
            if( g1.sameGenotype( g2 ) )
                {
                count_same++;
                }
            }
        /* found no other same genotype */
        if(count_same==0) return true;
        }
    return false;
    }
accept(variant);

and a test with 1000 genomes:

curl "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr1.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz" |\
gunzip -c |\
java -jar dist/vcffilterjs.jar  -f select.js |\
grep -v "#" | cut -f 1-5 

1    13957    rs201747181    TC    T
1    51914    rs190452223    T    G
1    54753    rs143174675    T    G
1    55313    rs182462964    A    T
1    55326    rs3107975    T    C
1    55330    rs185215913    G    A
1    55388    rs182711216    C    T
1    55416    rs193242050    G    A
1    55427    rs183189405    T    C
1    62156    rs181864839    C    T
1    63276    rs185977555    G    A
1    66457    rs13328655    T    A
1    69534    rs190717287    T    C
1    72148    rs182862337    C    T
1    77470    rs192898053    T    C
1    79137    rs143777184    A    T
1    81949    rs181567186    T    C
1    83088    rs186081601    G    C
1    83977    rs180759811    A    G
1    84346    rs187855973    T    C

verify with rs201747181

curl  "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr1.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz" |\
gunzip -c  |\
grep rs201747181 |\
cut -f 10- |\
tr "   " "\n" |\
cut -d ':' -f 1 |\
sort |\
uniq -c

   1013 0|0
     26 0|1
      7 1|0
      1 1|1 <============ HERE

Here is another example with only one sample:

function accept(ctx)
    {
    var y,g2;
    var sampleList=header.getSampleNamesInOrder();

    var g1=ctx.getGenotype("M10475");
    /** ignore non-called */
    if(g1== null || ! g1.isCalled() ) return false;
    /** loop over the other samples */
    for(y=0;y< sampleList.size();++y)
        {
        g2=ctx.getGenotype( sampleList.get(y) );

        if(g2.getSampleName().equals(g1.getSampleName())) continue;
        /** ignore non-called */
        if(! g2.isCalled() ) continue;

        /** is g1==g2 ? */
        if( g1.sameGenotype( g2 ) ) return false;
        }
    /* found no other same genotype */
    return true;
    }
accept(variant);

exec:

$  curl -k "https://raw.github.com/arq5x/gemini/master/test/test5.vep.snpeff.vcf" | java -jar dist/vcffilterjs.jar  -f script.js | grep -A 1 CHROM 

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    M10475    M10478    M10500    M128215
chr1    145273345    .    T    C    289.85    .    AC=3;AF=0.38;AN=8;BaseQRankSum=1.062;CSQ=missense_variant|Tct/Cct|S/P|ENSG00000213240|NOTCH2NL|ENST00000369340|4/6|benign(0.238)|tolerated(0.45),missense_variant|Tct/Cct|S/P|ENSG00000213240|NOTCH2NL|ENST00000362074|3/5|benign(0.238)|tolerated(0.45),missense_variant&NMD_transcript_variant|Tct/Cct|S/P|ENSG00000255168||ENST00000468030|3/23|benign(0.416)|tolerated(0.55),missense_variant|Tct/Cct|S/P|ENSG00000213240|NOTCH2NL|ENST00000344859|3/6|possibly_damaging(0.545)|tolerated(0.44);DP=1000;DS;Dels=0.00;EFF=EXON(MODIFIER|||||RP11-458D21.5|nonsense_mediated_decay|NON_CODING|ENST00000468030|3),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|Tct/Cct|S67P|230|NOTCH2NL|protein_coding|CODING|ENST00000344859|),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|Tct/Cct|S67P|236|NOTCH2NL|protein_coding|CODING|ENST00000362074|),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|Tct/Cct|S67P|236|NOTCH2NL|protein_coding|CODING|ENST00000369340|);FS=3.974;HRun=1;HaplotypeScore=17.4275;MQ=29.25;MQ0=0;MQRankSum=-1.370;QD=0.39;ReadPosRankSum=-1.117    GT:AD:DP:GQ:PL    0/0:226,22:250:99:0,158,4259    0/1:224,24:250:6:6,0,5314    0/1:219,28:249:57:57,0,5027    0/1:215,34:250:99:269,0,3796
ADD COMMENT
0
Entering edit mode

Hi, Pierre,

Thank you for the answer. I have tried the following steps,

  1. Downloaded vcffiterjs tool by using ant vcffilterjs
  2. Changed the sample name (as showed as "Affected" in the line 6) according to the script you sent to me. "Affected " is the sample that need to be having unique genotype with other samples. Then I saved this script and named it as script.js

    function accept(ctx)
         {
         var y,g2;
         var sampleList=header.getSampleNamesInOrder();
    
         var g1=ctx.getGenotype("Affected");
         /** ignore non-called */
         if(g1== null || ! g1.isCalled() ) return false;
         /** loop over the other samples */
         for(y=0;y< sampleList.size();++y)
             {
             g2=ctx.getGenotype( sampleList.get(y) );
    
             if(g2.getSampleName().equals(g1.getSampleName())) continue;
             /** ignore non-called */
             if(! g2.isCalled() ) continue;
    
             /** is g1==g2 ? */
             if( g1.sameGenotype( g2 ) ) return false;
             }
         /* found no other same genotype */
         return true;
         }
     accept(variant);
    
  3. My input VCF dataset is controlplus64.vcf that has variants of three samples as affected, control1 and control 2

  4. I used the commands as follows

    [rzeng@prism jvarkit]$ java -jar dist/vcffilterjs.jar -f script.js controlplus64.vcf > out.vcf &
    

    But it showed ERROR as follows,

    [rzeng@prism jvarkit]$ ERROR: Invalid argument '-f'
    

    Not sure why -f is not right here??

ADD REPLY
0
Entering edit mode

What the output of java -jar dist/vcffilterjs.jar -h?

ADD REPLY
0
Entering edit mode

Furthermore, I cannot find the sentence "Invalid argument" in my sources: $ find ./ -name "*.java" -exec grep "Invali" '{}' ';' . I suspect, you're using an old version of my program.

ADD REPLY
0
Entering edit mode
[rzeng@prism jvarkit]$ java -jar dist/vcffilterjs.jar -h

USAGE: VCFFilterJS [options]

 Filtering VCF with javascript (java rhino). The script puts 'variant' a org.broadinstitute.variant.variantcontext.VariantContext  ( http://sourceforge.net/p/picard/code/HEAD/tree/trunk/src/java/org/broadinstitute/variant/variantcontext/VariantContext.java )  and 'header' ( org.broadinstitute.variant.vcf.VCFHeader http://sourceforge.net/p/picard/code/HEAD/tree/trunk/src/java/org/broadinstitute/variant/vcf/VCFHeader.java) in the script context .Version: 1.0

Version: null


Options:
ADD REPLY
0
Entering edit mode

That's an old version of my software. Load the latest version using: git clone https://github.com/lindenb/jvarkit.git

ADD REPLY
1
Entering edit mode

Cool! It works for my VCF by producing the file I want.

Thank you so much, Pierre!

ADD REPLY
1
Entering edit mode
7.4 years ago

Snpsift will help in this regard. Then, you can filter as you see fit.

ADD COMMENT
0
Entering edit mode

Thank you Sean!

ADD REPLY

Login before adding your answer.

Traffic: 1825 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6