Question: Consolidation Of CNV Data
6.6 years ago by
ruchiksy50 wrote:

I have two Excel files containing data on schizophrenia.

File 1: Contains information about GWAS studies for schizophrenia and includes their CNV regions and associated genes. I built this by importing info from several databases.

File 2: Contains information about increased gene expression, and also includes CNV regions, genes, and related annotations.

My task is to find out whether the CNV regions and genes in File 1 are also present in File 2. Since there are about 3K entries, I want to automate the process. Is there a script I can write that reads in the two files and displays the entries they share? I just need a reference or pseudo-code so that I can get started, or at least get an idea of how to proceed. Code in Python or C++ would be especially helpful.


genes cnv • 1.6k views
modified 3.8 years ago by Biostar ♦♦ 20 • written 6.6 years ago by ruchiksy50
6.6 years ago by
Irsan7.0k wrote:

You do not need to write new code to intersect the genomic intervals in two files; bedtools or BEDOPS handle this well. If you do not have access to a Unix-like machine, you can use the join-intervals tool from Galaxy instead. Also see: How To Intersect Two Tracks In Ucsc Table Browser And Get Fields From Both?
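For readers who still want to see what such an intersection computes, here is a minimal Python sketch. It is not how bedtools or BEDOPS are implemented; it is a plain nested comparison that is adequate for a few thousand rows. The tuple layout `(chrom, start, end, name)`, the half-open coordinate convention, and the example regions and gene names are all assumptions made for illustration.

```python
from collections import defaultdict


def overlaps(start_a, end_a, start_b, end_b):
    """Return True if half-open intervals [start_a, end_a) and
    [start_b, end_b) share at least one position."""
    return start_a < end_b and start_b < end_a


def intersect(regions_a, regions_b):
    """Pair every region in regions_a with each overlapping region
    in regions_b. Regions are (chrom, start, end, name) tuples."""
    # Group the second file by chromosome so that only regions on the
    # same chromosome are ever compared.
    by_chrom = defaultdict(list)
    for chrom, start, end, name in regions_b:
        by_chrom[chrom].append((start, end, name))

    hits = []
    for chrom, start, end, name in regions_a:
        for b_start, b_end, b_name in by_chrom.get(chrom, ()):
            if overlaps(start, end, b_start, b_end):
                hits.append(((chrom, start, end, name),
                             (chrom, b_start, b_end, b_name)))
    return hits


# Hypothetical example data, not taken from the files in the question.
file1 = [("chr1", 100, 200, "GENE_A"), ("chr2", 50, 80, "GENE_B")]
file2 = [("chr1", 150, 300, "GENE_C"), ("chr2", 80, 120, "GENE_D")]

hits = intersect(file1, file2)
for a, b in hits:
    print(a, "overlaps", b)
```

With half-open coordinates, the two chr2 regions only touch at position 80 and are therefore not reported as overlapping; only the chr1 pair is.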

You may have to rearrange the columns in your Excel files to match the input format that bedtools, BEDOPS, or join-intervals expects, and then export each Excel file as a tab-delimited text file.
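The reformatting step could be sketched in Python roughly as follows, starting from a tab-delimited export of the spreadsheet. The column headers (`chrom`, `start`, `end`, `gene`) are hypothetical and need to be replaced with whatever your files actually use. The sketch also assumes the spreadsheet stores 1-based, inclusive coordinates, which is why the start is shifted by one to give BED-style 0-based, half-open intervals; drop that shift if your coordinates are already 0-based.

```python
import csv
import io


def rows_to_bed(handle, chrom_col="chrom", start_col="start",
                end_col="end", name_col="gene"):
    """Read a tab-delimited table with a header row and return
    BED-style lines: chrom, start, end, name.

    The column names are assumptions; pass the real header names
    of your export as arguments.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    lines = []
    for row in reader:
        chrom = row[chrom_col].strip()
        # Assumed conversion: 1-based inclusive -> 0-based half-open.
        start = int(row[start_col]) - 1
        end = int(row[end_col])
        name = row[name_col].strip()
        lines.append(f"{chrom}\t{start}\t{end}\t{name}")
    return lines


# Hypothetical one-row export, standing in for a real file opened
# with open("file1.txt", newline="").
example = io.StringIO("chrom\tstart\tend\tgene\nchr1\t101\t200\tGENE_A\n")
bed_lines = rows_to_bed(example)
print("\n".join(bed_lines))
```

Writing the returned lines to a `.bed` file for each spreadsheet gives two inputs in the four-column layout that interval tools commonly accept.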

modified 6.6 years ago • written 6.6 years ago by Irsan7.0k