Check all pathogenic variants
2.9 years ago
vpsev3 ▴ 20

Hello,

I have had my genome sequenced (WGS, 30X) and have the resulting files (BAM, FASTQ, VCF). What is the most suitable way to check for all pathogenic variants at once (indels, duplications, known variants, etc.)?

I tried to compare my VCF file with ClinVar, without success, because none of the software I found on GitHub works.


without success because none software that I found on Github is working

wow !


On Windows, at least, there are always errors.


Which errors? Please elaborate on what you tried and what you found. Comparing your input VCF to a ClinVar VCF should be no issue at all.


You need a suitable hardware/software setup for bioinformatics. Usually that means not Windows, at least 32 GB of RAM, a large HDD to hold all the databases, and a multi-core processor. With that in place, these tools from GitHub will work.

16 months ago
Alban Nabla ▴ 30

Here is a simple method, using only Python:

from cyvcf2 import VCF
import pandas as pd

# Both files must be bgzip-compressed, have their .tbi index alongside,
# and use the same genome build (e.g. both GRCh37 or both GRCh38).
cv = VCF("clinvar.vcf.gz")      # download from ClinVar
gen = VCF("yourGenome.vcf.gz")  # get this from your WGS provider

def compare_vcf(cv, usr):
    """Walk two position-sorted variant iterators for one chromosome and
    collect ClinVar records whose POS, REF and ALT match a user variant."""
    variants = {}
    try:
        cvv = next(cv)
        usrv = next(usr)
        while True:
            if cvv.POS > usrv.POS:
                usrv = next(usr)   # user file is behind: advance it
            elif cvv.POS < usrv.POS:
                cvv = next(cv)     # ClinVar is behind: advance it
            else:
                # Same position: require identical REF and ALT (ALT is a list,
                # so multi-allelic user records will not match as written).
                if cvv.REF == usrv.REF and cvv.ALT == usrv.ALT:
                    variants[cvv.ID] = [cvv.POS, cvv.REF, ",".join(cvv.ALT), cvv.INFO.get('CLNSIG')]
                cvv = next(cv)
    except StopIteration:
        pass  # one of the two iterators is exhausted
    return variants

chroms = list(range(1, 23)) + ['X', 'Y']
for i in chroms:
    cv_chrom = cv(str(i))            # ClinVar names contigs '1', '2', ..., 'X', 'Y'
    gen_chrom = gen('chr' + str(i))  # drop the 'chr' prefix if your genome VCF does not use it
    variants = compare_vcf(cv_chrom, gen_chrom)
    output = pd.DataFrame.from_dict(variants, orient='index', columns=['POS', 'REF', 'ALT', 'CLNSIG'])
    output.to_csv('chr' + str(i) + '.csv')
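
The CSV files written above contain every ClinVar match, whatever its clinical significance. Since the question is about pathogenic variants, here is a rough sketch of one way to narrow the output down, using only the standard library. The names `PATHOGENIC_TERMS`, `is_pathogenic` and `filter_pathogenic` are my own, and both the set of CLNSIG terms counted as pathogenic and the separators (`,` and `|`) are assumptions to check against your ClinVar release.

```python
import csv
import io

# Assumption: these CLNSIG terms are the ones worth reporting.
# Widen or narrow the set to suit your needs.
PATHOGENIC_TERMS = {
    "Pathogenic",
    "Likely_pathogenic",
    "Pathogenic/Likely_pathogenic",
}

def is_pathogenic(clnsig):
    """Return True if any term in a CLNSIG string is in PATHOGENIC_TERMS."""
    if not clnsig:
        return False
    # Assumption: combined values are separated by ',' or '|'.
    terms = clnsig.replace("|", ",").split(",")
    return any(term.strip() in PATHOGENIC_TERMS for term in terms)

def filter_pathogenic(csv_file):
    """Read an open per-chromosome CSV and keep only the pathogenic rows."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if is_pathogenic(row.get("CLNSIG"))]

# Small in-memory demo with made-up rows in the same column layout.
demo = io.StringIO(
    ",POS,REF,ALT,CLNSIG\n"
    "111,1000,A,G,Pathogenic\n"
    "222,2000,C,T,Benign\n"
    "333,3000,G,A,Pathogenic|drug_response\n"
)
for row in filter_pathogenic(demo):
    print(row["POS"], row["REF"], row["ALT"], row["CLNSIG"])
```

For the real files you would pass an open file instead of the demo, for example `with open('chr1.csv', newline='') as f: hits = filter_pathogenic(f)`. Terms are matched exactly rather than by substring so that values such as conflicting or benign classifications are not picked up by accident.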