Question

Consolidation of CNV Data

11.0 years ago

ruchiksy ▴ 50

I have two Excel files containing data on schizophrenia.

File 1: Contains information about GWAS studies for schizophrenia, including their CNV regions and associated genes. I built this by importing info from several databases.

File 2: Contains information about increased gene expression, along with CNV regions, genes, and related fields.

My task is to find out whether the CNV regions and genes found in File 1 are also present in File 2. Since there are about 3K entries, I want to automate the process. Is there a script I can write that reads the two files and displays the entries that appear in both? I just need a reference or pseudo-code so that I can get started, or at least have an idea of how to proceed. If members can post code in Python or C++, that would be helpful.

Thanks,

cnv genes • 2.2k views

updated 8.2 years ago by Biostar 20 • written 11.0 years ago by ruchiksy ▴ 50

score 4 · Answer 1 · 2013-04-15

You do not have to write new code to intersect genomic intervals in two files. bedtools or BEDOPS work well for this purpose. If you have no access to a Unix-like machine, you can use the join-intervals tool from Galaxy. Also see "How To Intersect Two Tracks In UCSC Table Browser And Get Fields From Both?"
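Since the question asked for Python, here is a minimal sketch of the kind of overlap test these tools perform, not a replacement for them. It assumes each region is a `(chrom, start, end, name)` tuple in 0-based, half-open coordinates (the BED convention); the function name `find_overlaps` and the sample regions are hypothetical. It compares every pair on the same chromosome, which is fine for a few thousand entries but slower than the dedicated tools on large files.

```python
from collections import defaultdict


def find_overlaps(regions_a, regions_b):
    """Return (a, b) pairs of regions on the same chromosome that overlap.

    Each region is a (chrom, start, end, name) tuple with 0-based,
    half-open coordinates, so intervals that merely touch do not overlap.
    """
    # Group the second list by chromosome so each region from the first
    # list is only compared against regions on its own chromosome.
    by_chrom = defaultdict(list)
    for region in regions_b:
        by_chrom[region[0]].append(region)

    hits = []
    for a in regions_a:
        for b in by_chrom.get(a[0], []):
            # Two half-open intervals overlap when each one starts
            # before the other one ends.
            if a[1] < b[2] and b[1] < a[2]:
                hits.append((a, b))
    return hits


# Made-up example regions, for illustration only.
file1 = [("chr1", 100, 200, "GENE_A"), ("chr2", 50, 80, "GENE_B")]
file2 = [("chr1", 150, 300, "GENE_C"), ("chr2", 80, 120, "GENE_D")]

for a, b in find_overlaps(file1, file2):
    print(a, b)
```

In this example only the two chr1 regions are reported; the chr2 regions touch at position 80 but do not overlap under half-open coordinates.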

You might have to reformat the columns in your Excel file to match the input format expected by bedtools, BEDOPS, or join-intervals, and then export or save the Excel file as a tab-delimited text file.
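As a rough sketch of that reformatting step in Python, the snippet below reorders a tab-delimited export into BED-style lines (chromosome, start, end, name). The column headers `chrom`, `start`, `end`, and `gene` are assumptions about how the spreadsheet is laid out, and `to_bed_lines` is a hypothetical helper name. No coordinate conversion is done here: BED uses 0-based, half-open coordinates, so 1-based starts in the source data would need to be adjusted first.

```python
import csv
import io


def to_bed_lines(handle, chrom_col="chrom", start_col="start",
                 end_col="end", name_col="gene"):
    """Reorder a tab-delimited table with a header row into BED-style lines.

    The column names are assumptions; pass the headers your file really uses.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    lines = []
    for row in reader:
        chrom = row[chrom_col].strip()
        start = int(row[start_col])  # fails loudly on non-numeric cells
        end = int(row[end_col])
        name = row[name_col].strip()
        lines.append(f"{chrom}\t{start}\t{end}\t{name}")
    return lines


# Made-up one-row export, standing in for a file opened with open(path, newline="").
example = io.StringIO("gene\tchrom\tstart\tend\nGENE_A\tchr1\t100\t200\n")
print("\n".join(to_bed_lines(example)))
```

Writing the returned lines to a `.bed` file for each spreadsheet gives two files that can be passed to an interval-intersection tool.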