Consolidation of CNV Data
11.0 years ago
ruchiksy ▴ 50

I have two Excel files containing data on schizophrenia.

File 1: Contains information about GWAS studies for schizophrenia and includes their CNV regions and associated genes. I built this by importing info from several databases.

File 2: Contains information about genes with increased expression, along with their CNV regions and associated genes.

My task is to find out whether the CNV regions and genes in File 1 are also present in File 2. Since there are about 3K entries, I want to automate the process. Is there a script I can write that reads in the two files and displays the entries they share? I just need a reference or pseudo-code so that I can get started, or at least have an idea of how to proceed. If members can post code in Python or C++, that would be helpful.

Thanks,

cnv genes • 2.2k views
11.0 years ago
Irsan ★ 7.8k

You do not have to write new code to intersect genomic intervals in two files. bedtools or bedops work well for this purpose. If you have no access to a Unix-like machine, you can use the join-intervals tool from Galaxy. Also see "How To Intersect Two Tracks In Ucsc Table Browser And Get Fields From Both?"
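Since the question asked for Python, here is a minimal sketch of what "intersecting genomic intervals" means, in case it helps to see the logic the tools above apply. It is only an illustration, not how bedtools or bedops are implemented. The tuple layout (chromosome, start, end, gene) and the sample records are made up, and coordinates are assumed to be half-open (start included, end excluded).

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def intersect(regions_a, regions_b):
    """Pair every region in regions_a with each overlapping region in regions_b.

    Each region is a (chrom, start, end, name) tuple; this layout is an
    assumption made for the example, not a fixed format.
    """
    # Group the second file by chromosome so only same-chromosome regions are compared.
    by_chrom = defaultdict(list)
    for chrom, start, end, name in regions_b:
        by_chrom[chrom].append((start, end, name))

    hits = []
    for chrom, start, end, name in regions_a:
        for b_start, b_end, b_name in by_chrom.get(chrom, []):
            if overlaps(start, end, b_start, b_end):
                hits.append((chrom, start, end, name, b_start, b_end, b_name))
    return hits


# Made-up records standing in for rows from the two files.
file1 = [("chr1", 100, 500, "GENE_A"), ("chr2", 1000, 2000, "GENE_B")]
file2 = [("chr1", 400, 900, "GENE_A"), ("chr2", 5000, 6000, "GENE_C")]

for hit in intersect(file1, file2):
    print(hit)
```

With about 3K entries per file this simple nested comparison should be fast enough. The dedicated tools become more attractive for larger inputs and for options such as minimum overlap fractions.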

You might have to reformat the columns in your Excel files to match the input format expected by bedtools, bedops, or join-intervals, and then export each Excel file as a tab-delimited text file.
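As a rough sketch of that reformatting step, the Python below writes rows as tab-delimited chromosome, start, end, name lines in BED style. The row layout and the sample values are hypothetical. BED uses 0-based, half-open coordinates, so the sketch subtracts 1 from the start on the assumption that the spreadsheet holds 1-based inclusive positions. Set `one_based=False` if that assumption does not hold for your data.

```python
import csv
import io


def write_bed(rows, handle, one_based=True):
    """Write (chrom, start, end, name) rows as tab-delimited BED-style lines.

    one_based=True assumes the input starts are 1-based inclusive and shifts
    them to the 0-based starts that BED expects; the end value is unchanged.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for chrom, start, end, name in rows:
        bed_start = int(start) - 1 if one_based else int(start)
        writer.writerow([chrom, bed_start, int(end), name])


# Made-up rows, as they might look after reading a spreadsheet export as text.
rows = [("chr1", "101", "500", "GENE_A"), ("chr2", "1001", "2000", "GENE_B")]

buffer = io.StringIO()  # stands in for an open output file
write_bed(rows, buffer)
print(buffer.getvalue(), end="")
```

Two files written this way (say `file1.bed` and `file2.bed`, names chosen only for this example) could then be compared with `bedtools intersect -a file1.bed -b file2.bed -wa -wb`, which reports each overlapping pair with the fields from both files.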
