Extract SNPs present in mutants
1
0
Entering edit mode
9 weeks ago
pablo ▴ 300

Hello,

I have 12 Illumina samples : 11 mutants and 1 wild. I did the SNP calling :

bcftools mpileup -Ou -f ref *.bam | bcftools call -mv -Ob -o calls.bcf

bcftools view -i '%QUAL>=20' calls.bcf 

bcftools convert -O v -o calls.vcf calls.bcf

I need to extract the SNP which are present in my mutants (at least present in 5 samples / 11), but not in my wild sample. For example (Sample1 = wild, I show 8 others = 8 mutants / 11) :

 CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8 Sample9
    0  chr1  100  .   G   A    .      .    .     GT     0/1     1/1     0/1     0/0     0/0     0/0     0/0     0/0     0/0
    1  chr1  200 .    T   C    .      .    .     GT     0/0     0/0     0/0     0/0     0/1     0/0     0/0     0/0     0/0
    2  chr1  300  .   T   A    .      .    .     GT     0/0     0/1     0/0     0/0     0/1     0/0     1/1     0/1     0/1
    3  chr1  400  .   C   A    .      .    .     GT     0/0     1/1     1/1     1/1     1/1    1/1    1/1     1/1    1/1
    4  chr1  500  .   C   A    .      .    .     GT     0/0     1/1     1/1     1/1     0/0    1/1    1/1     1/1    1/1

I should get the last three lines : because the SNP is 0/0 with the reference, and at least present in 5 mutants (no matter if it is hom 1/1 or het 0/1).

Any tool to do that? Before trying to deal with a perl script..

Best

snp vcftools vcf bcftools • 483 views
ADD COMMENT
2
Entering edit mode
9 weeks ago

using vcffilterjdk: https://jvarkit.readthedocs.io/en/latest/VcfFilterJdk/

 java -jar jvarkit.jar vcffilterjdk -e 'final Set<String> samples=new HashSet<>(Arrays.asList("sample1","sample2","sample3","sample4","sample5","sample6","sample7","sample8","sample9","sample10")); return variant.getGenotypes().stream().filter(G->samples.contains(G.getSampleName())).filter(G->G.isHet() || G.isHomVar()).count()>4L && variant.getGenotypes().stream().filter(G->!samples.contains(G.getSampleName())).noneMatch(G->G.isHet() || G.isHomVar());' < in.vcf
ADD COMMENT
0
Entering edit mode

Thanks Pierre. I guess I need to change G.isHomVar()).count()>4L by the number I want to select the number of filtered samples?

ADD REPLY
0
Entering edit mode

Thanks Pierre. I guess I need to change G.isHomVar()).count()>4L by the number I want to select the number of filtered samples?

yes !

ADD REPLY
0
Entering edit mode

And does the sample1 in my case is considered as the wild?

ADD REPLY
0
Entering edit mode

anything that is not in ("sample1","sample2","sample3","sample4","sample5","sample6","sample7","sample8","sample9","sample10") is wild.

ADD REPLY
0
Entering edit mode

Thanks Pierre.

ADD REPLY
0
Entering edit mode

Please accept the answer so the question is marked solved on the website. To do that, click on the green check mark on the left side of the answer.

ADD REPLY

Login before adding your answer.

Traffic: 2122 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6