Question: check genes from a list. print if match
windsur wrote (3 months ago):

Hello! I haven't seen any similar question, so:

I have a list of genes and several VCF files. I would like to check every VCF file in a directory against the gene list and, whenever there is a match, collect the full matching line into a single table (e.g. Excel). The first column should hold the name of the file in which the match was found.

At the moment I have a filter script that handles each file separately, but I don't know how to walk a directory tree and return everything in a single table.

import sys
from glob import glob
from pandas import DataFrame

# Read the gene names (first column), skipping the header line.
with open("./genes_rp.txt", 'r') as gene_file:
    gene_list = gene_file.readlines()[1:]

final_list = list()
for gene in gene_list:
    gene = gene.strip('\n').split('\t')
    final_list.append(gene[0].strip())

# sys.argv[1] is the folder (with a trailing slash) holding the prefiltered files.
sample_folder = glob(sys.argv[1] + '*prefiltered.txt')

for sample_path in sample_folder:
    with open(sample_path, 'r') as sample_file:
        sample = sample_file.readlines()

    # Keep the header row, then every variant whose gene (first column) is in the list.
    header = sample[0].strip('\n').split('\t')
    output = list()
    output.append(header)

    for variant in sample[1:]:
        variant = variant.strip('\n').split('\t')
        variant_gene = variant[0]
        if variant_gene in final_list:
            output.append(variant)

    # Write one Excel file per input file.
    df = DataFrame(output)
    df.to_excel(sample_path + '_rp.xlsx', sheet_name='sheet1', header=False, index=False)

The script above is useful if you have a VCF with a lot of genes and only want to see a few of them.

Tags: genes list python vcf

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (3 months ago):

Use the standard Linux tools. Something like:

find /path/to/dir/ -type f -name "*.vcf" | while read -r F ; do grep -H -w -o -f genes.txt "$F" | uniq ; done

And please, don't use Excel. Excel is bad.
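
If you would rather stay in Python, the same idea (walk the directory tree, keep the matching lines, put the file name in the first column) can be sketched roughly as below. This is only a minimal sketch, not a tested pipeline: the function names collect_matches and write_table are made up for illustration, and it assumes the gene name sits in the first tab-separated column, as in the prefiltered files from the question. In a raw VCF the gene name usually lives somewhere else, so gene_column (or the matching logic) would need adjusting. It writes a plain tab-separated file instead of Excel.

```python
import csv
import os


def collect_matches(root_dir, genes, suffix=".vcf", gene_column=0):
    """Walk root_dir and return rows of the form [file_name, field1, field2, ...]
    for every tab-separated line whose gene_column value is in genes.

    Assumption: the gene name is a whole field in column gene_column.
    """
    rows = []
    for folder, subdirs, files in os.walk(root_dir):
        subdirs.sort()  # visit sub-folders in a stable order
        for name in sorted(files):
            if not name.endswith(suffix):
                continue
            with open(os.path.join(folder, name)) as handle:
                for line in handle:
                    if line.startswith("#"):  # skip VCF header/meta lines
                        continue
                    fields = line.rstrip("\n").split("\t")
                    if len(fields) > gene_column and fields[gene_column] in genes:
                        # The first column of the table is the matching file name.
                        rows.append([name] + fields)
    return rows


def write_table(rows, out_path):
    """Write all collected rows to a single tab-separated file."""
    with open(out_path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)
```

With the gene names loaded into a set called genes, something like rows = collect_matches("/path/to/dir/", genes) followed by write_table(rows, "matches.tsv") would give one table for the whole tree. A set keeps the membership test fast when the gene list is long.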
