Question: Check genes from a list against VCF files and print the matches
8 days ago by
windsur0 wrote:

Hello! I haven't seen any similar question, so:

I have a list of genes and several VCF files. I would like to check every VCF file in a directory against the gene list and, for each match, return the full info line in a single table (e.g. Excel), with the name of the matching file in the first column.

At the moment I have a script that filters each file separately, but I don't know how to walk a directory tree and return everything in a single table.

import sys
from glob import glob
from pandas import DataFrame

# Read the gene list, skipping its header line; the gene name is the first tab-separated column.
with open("./genes_rp.txt", 'r') as handle:
    gene_list = handle.readlines()[1:]
final_list = list()
for gene in gene_list:
    gene = gene.strip('\n').split('\t')
    final_list.append(gene[0].strip())

# Filter every prefiltered sample file in the directory given as the first argument.
sample_folder = glob(sys.argv[1] + '*prefiltered.txt')
for sample_path in sample_folder:
    with open(sample_path, 'r') as handle:
        sample = handle.readlines()
    header = sample[0].strip('\n').split('\t')
    output = list()
    output.append(header)
    # Keep only the variants whose gene (first column) is in the gene list.
    for variant in sample[1:]:
        variant = variant.strip('\n').split('\t')
        variant_gene = variant[0]
        if variant_gene in final_list:
            output.append(variant)
    # Write one Excel file per sample.
    df = DataFrame(output)
    df.to_excel(sample_path + '_rp.xlsx', sheet_name='sheet1', header=False, index=False)

The script above is useful if you have a VCF with a lot of genes and only want to see a few of them.
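For the single combined table described above, a minimal sketch could look like the following. It assumes the same input layout as the script above (tab-separated files, a header line, gene name in the first column). The function names `load_genes`, `collect_matches` and `write_table` are hypothetical, and writing a tab-separated file instead of an Excel sheet is my own choice here, not part of the original script.

```python
import csv
import os


def load_genes(gene_list_path):
    """Return the gene names from the first tab-separated column, skipping the header line."""
    with open(gene_list_path) as handle:
        lines = handle.readlines()[1:]
    return {line.split('\t')[0].strip() for line in lines if line.strip()}


def collect_matches(root_dir, genes, suffix='prefiltered.txt'):
    """Walk root_dir recursively and collect rows whose first column is in genes.

    Each collected row is prefixed with the name of the file it came from.
    """
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in sorted(filenames):
            if not filename.endswith(suffix):
                continue
            with open(os.path.join(dirpath, filename)) as handle:
                next(handle, None)  # skip the per-file header line
                for line in handle:
                    fields = line.rstrip('\n').split('\t')
                    if fields[0] in genes:
                        rows.append([filename] + fields)
    return rows


def write_table(rows, out_path):
    """Write all collected rows to one tab-separated file."""
    with open(out_path, 'w', newline='') as handle:
        csv.writer(handle, delimiter='\t').writerows(rows)
```

Calling `write_table(collect_matches(sys.argv[1], load_genes('./genes_rp.txt')), 'matches.tsv')` would then produce one table for the whole directory tree. Using a set for the gene names keeps each membership test cheap even for long gene lists.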

8 days ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum105k wrote:

Use the standard Linux tools. Something like:

find /path/to/dir/ -type f -name "*.vcf" | while read -r F ; do grep -H -w -o -f genes.txt "$F" | uniq ; done

And please, don't use Excel. Excel is bad.
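For readers who would rather stay in Python, here is a rough sketch of what the find/grep pipeline above does: it searches every `*.vcf` file under a directory for whole-word occurrences of the listed gene names and reports file/gene pairs, collapsing consecutive repeats the way `uniq` does. The function names `iter_hits` and `find_gene_hits` are hypothetical. One deliberate difference: gene names are treated here as literal strings, whereas `grep -f` reads each line of the pattern file as a regular expression.

```python
import re
from itertools import groupby
from pathlib import Path


def iter_hits(root_dir, genes, glob_pattern='*.vcf'):
    """Yield (file path, gene) for every whole-word match, in file and line order."""
    genes = list(genes)
    if not genes:
        return
    # \b approximates grep -w; re.escape makes each gene name a literal string.
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(g) for g in genes) + r')\b')
    for path in sorted(Path(root_dir).rglob(glob_pattern)):
        with open(path) as handle:
            for line in handle:
                for match in pattern.finditer(line):
                    yield str(path), match.group(0)


def find_gene_hits(root_dir, genes, glob_pattern='*.vcf'):
    """Return the hits with consecutive duplicates collapsed, like piping through uniq."""
    return [key for key, _group in groupby(iter_hits(root_dir, genes, glob_pattern))]
```

As with `uniq`, only adjacent duplicates are removed, so a gene that reappears later in the same file is reported again.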
