VCF: remove variants absent in affected samples
2.1 years ago
quentin54520 ▴ 120

Hello,

I have a multi-sample VCF with affected and unaffected samples. Usually I remove the variants that are absent in the affected samples after annotation and conversion of the VCF to an rdv file, using a Python script, but it is wasteful to annotate all these variants only to remove them immediately afterwards.

So I'm looking for a tool that filters the VCF directly, before annotation. I looked at bcftools and vcftools but couldn't find an option that does what I want.

Vcf • 1.2k views
2.1 years ago

Using vcffilterjdk: http://lindenb.github.io/jvarkit/VcfFilterJdk.html

java -jar dist/vcffilterjdk.jar  -e ' final List<String> controls =  Arrays.asList("S1","S2","S3","S4"); return controls.stream().map(S->variant.getGenotype(S)).noneMatch(G->G.isHet() || G.isHomVar());' in.vcf
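For readers less familiar with Java streams, the predicate in the command above can be read as "keep the variant only if none of the listed samples is het or hom-var". Below is a rough Python sketch of that logic, not part of jvarkit. The names `carries_alt` and `keep_variant` are hypothetical, genotypes are assumed to be plain GT strings, and partially missing calls such as `./1` are treated as carriers here, which may differ from how htsjdk classifies them.

```python
def carries_alt(gt: str) -> bool:
    """Return True if a GT string such as '0/1' or '1|1' has a non-reference allele."""
    # Treat phased ('|') and unphased ('/') separators the same way.
    alleles = gt.replace("|", "/").split("/")
    # '0' is the reference allele and '.' is a missing call.
    return any(allele not in ("0", ".") for allele in alleles)


def keep_variant(genotypes: dict, samples: list) -> bool:
    """Keep the variant only if none of the listed samples carries an ALT allele."""
    return not any(carries_alt(genotypes[name]) for name in samples)


# Hypothetical genotypes for one variant, keyed by sample name.
example = {"S1": "0/0", "S2": "./.", "S3": "0/0", "S4": "0/1"}
print(keep_variant(example, ["S1", "S2", "S3"]))        # S4 is not checked
print(keep_variant(example, ["S1", "S2", "S3", "S4"]))  # S4 carries the ALT allele
```

The list passed as `samples` plays the role of `controls` in the Java expression, so the variant is dropped as soon as one listed sample carries it.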

Thank you for your reply. I'm not sure I understand the command, I have to replace the ("S1","S2","S3","S4") by the list of my unaffected samples?


by the list of my unaffected samples?

yes


Unfortunately, when I try to install it with:

git clone "https://github.com/lindenb/jvarkit.git"

cd jvarkit

./gradlew vcffilterjdk

I get the error:

Exception in thread "main" java.lang.RuntimeException: Could not create parent directory for lock file /shared/home/quentin67100/.gradle/wrapper/dists/gradle-6.7-bin/efvqh8uyq79v2n7rcncuhu9sv/gradle-6.7-bin.zip.lck
    at org.gradle.wrapper.ExclusiveFileAccessManager.access(ExclusiveFileAccessManager.java:43)
    at org.gradle.wrapper.Install.createDist(Install.java:48)
    at org.gradle.wrapper.WrapperExecutor.execute(WrapperExecutor.java:107)
    at org.gradle.wrapper.GradleWrapperMain.main(GradleWrapperMain.java:63)


Do you have permission to create directories under /shared/home/quentin67100/.gradle?


You're right, it's an issue with the disk space quota on my home directory.

2.1 years ago
sbstevenlee ▴ 480

It sounds like you are already familiar with Python, so here's one solution using the pyvcf submodule from the fuc package I wrote.

Let's imagine you have 3 controls (C1-C3) and 3 affected samples (A1-A3).

>>> from fuc import pyvcf
>>> data = {
...     'CHROM': ['chr1', 'chr1', 'chr1'],
...     'POS': [100, 101, 102],
...     'ID': ['.', '.', '.'],
...     'REF': ['G', 'T', 'T'],
...     'ALT': ['A', 'C', 'A'],
...     'QUAL': ['.', '.', '.'],
...     'FILTER': ['.', '.', '.'],
...     'INFO': ['.', '.', '.'],
...     'FORMAT': ['GT', 'GT', 'GT'],
...     'C1': ['0/1', '0/1', '0/0'],
...     'C2': ['0/0', '1/1', '0/0'],
...     'C3': ['0/1', '0/1', '0/0'],
...     'A1': ['0/0', '0/1', '1/1'],
...     'A2': ['0/0', '1/1', '0/1'],
...     'A3': ['0/0', '0/0', '0/1'],
... }
>>> vf = pyvcf.VcfFrame.from_dict([], data)
>>> # vf = pyvcf.VcfFrame.from_file('in.vcf')
>>> vf.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT   C1   C2   C3   A1   A2   A3
0  chr1  100  .   G   A    .      .    .     GT  0/1  0/0  0/1  0/0  0/0  0/0
1  chr1  101  .   T   C    .      .    .     GT  0/1  1/1  0/1  0/1  1/1  0/0
2  chr1  102  .   T   A    .      .    .     GT  0/0  0/0  0/0  1/1  0/1  0/1

You can remove variants that are absent in the affected samples.

>>> filtered_vf = vf.filter_sampany(['A1', 'A2', 'A3'])
>>> filtered_vf.df
  CHROM  POS ID REF ALT QUAL FILTER INFO FORMAT   C1   C2   C3   A1   A2   A3
0  chr1  101  .   T   C    .      .    .     GT  0/1  1/1  0/1  0/1  1/1  0/0
1  chr1  102  .   T   A    .      .    .     GT  0/0  0/0  0/0  1/1  0/1  0/1

Optionally write the VCF data to an output file.

# filtered_vf.to_file('out.vcf')
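If installing an extra package is not an option, the same kind of filter can be sketched with the standard library alone. This is a minimal illustration rather than a replacement for fuc: the function names `has_alt_allele` and `filter_vcf_lines` are hypothetical, the input is assumed to be an uncompressed, tab-delimited VCF with a `#CHROM` header line and a GT entry in FORMAT, and records without GT are dropped.

```python
def has_alt_allele(sample_field: str, gt_index: int) -> bool:
    """Return True if the GT sub-field of a sample column has a non-reference allele."""
    parts = sample_field.split(":")
    if gt_index >= len(parts):
        return False  # trailing sub-fields may be omitted in a VCF record
    alleles = parts[gt_index].replace("|", "/").split("/")
    return any(allele not in ("0", ".") for allele in alleles)


def filter_vcf_lines(lines, affected):
    """Yield header lines, plus records where at least one affected sample carries ALT."""
    sample_columns = None
    for line in lines:
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
        elif line.startswith("#CHROM"):
            header = line.rstrip("\n").split("\t")
            sample_columns = [header.index(name) for name in affected]
            yield line
        else:
            fields = line.rstrip("\n").split("\t")
            format_keys = fields[8].split(":")
            if "GT" not in format_keys:
                continue  # assumption: records without genotypes are skipped
            gt_index = format_keys.index("GT")
            if any(has_alt_allele(fields[col], gt_index) for col in sample_columns):
                yield line


# Example usage with hypothetical file names:
# with open("in.vcf") as src, open("out.vcf", "w") as dst:
#     dst.writelines(filter_vcf_lines(src, ["A1", "A2", "A3"]))
```

Because it streams line by line, this approach never loads the whole file into memory, at the cost of none of the validation a dedicated VCF library provides.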

Let me know if you have any questions.
