Question: check genes from a list. print if match
gravatar for windsur
2.2 years ago by
windsur20 wrote:

Hello! I haven't seen any similar question, so:

I have a list of genes and several vcf files. What I would like to do is to check from the list of the genes in all vcf files from a dir, and if I get a match, return me in one table (e.g excel) with all the info line, the first columm should havethe name of the match file.

At the momment what I get is a filter script for each file, but I don't know how to check in a dir tree and return it all in a single table.

import sys
from glob import glob
from subprocess import call
from pandas import DataFrame

> gene_list = open("./genes_rp.txt",'r')
> gene_list = gene_list.readlines()[1:]
> final_list = list() for gene in gene_list:    
>     gene = gene.strip('\n').split('\t')   
>     final_list.append(gene[0].strip())
> sample_folder = glob(sys.argv[1] + '*prefiltered.txt')
> for sample_path in sample_folder[1:]:     
>     sample = open(sample_path, 'r')
>      sample = sample.readlines()
>   header = sample[0].strip('\n').split('\t')  
>  output = list()
>   output.append(header)
>   for variant in sample:      
>       variant = variant.strip('\n').split('\t')
>        variant_gene = variant[0]      
>       if variant_gene in final_list:
>         output.append(variant)
>   df = DataFrame(output)
>   df.to_excel(sample_path + '_rp.xlsx', sheet_name='sheet1', header = False,index=False)

The script above it will be usefull if you have a a vcf with a lot of genes and you wanna see only a few of them

genes list python vcf • 487 views
ADD COMMENTlink modified 2.2 years ago by Pierre Lindenbaum128k • written 2.2 years ago by windsur20
gravatar for Pierre Lindenbaum
2.2 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum128k wrote:

use the standard linux tools. Something like:

find /path/to/dir/ -type -name "*.vcf" | while read F ; do grep  -H -w -o -f  genes.txt $F | uniq ; done

and please, don't use Excel. Excel is bad

ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by Pierre Lindenbaum128k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1170 users visited in the last hour