Question

How to convert my files (CNV seg, RefSeq) to .bed format?

8.4 years ago

taegyunlee • 0

Hi

I downloaded TCGA CNV level 3 data (nocnv, hg19). I want to map this CNV data to genes, so I searched for information about this and found some.

The recommendation I found was to use bedtools.

I downloaded a RefSeq file from the UCSC Table Browser. Its content is as follows.

bin  name       chrom  strand  txStart    txEnd      cdsStart   cdsEnd     exonCount  exonStarts
3    NR_130130  chr1   +       150980866  151008189  151008189  151008189  4          150980866,150997990,150999708,151006281,
3    NR_130132  chr1   +       150980866  151008189  151008189  151008189  4          150980866,150990287,150999708,151006281,

The CNV seg file's content is as follows.

Sample                                                 Chromosome  Start      End        Num_Probes  Segment_Mean
BREAD_p_TCGAb_430_431_NSP_GenomeWideSNP_6_D11_1538030  1           3218610    247813706  128998      0.0014
BREAD_p_TCGAb_430_431_NSP_GenomeWideSNP_6_D11_1538030  2           484222     207696262  110158      0.0067
BREAD_p_TCGAb_430_431_NSP_GenomeWideSNP_6_D11_1538030  2           207696273  207701151  2           -1.5215

As far as I know, I have to convert my files (CNV seg, RefSeq) to .bed format, but I don't know how to do it. What should I do?

Can you give me a hand?

seg cnv bed • 4.0k views

ADD COMMENT • link updated 24 months ago by Ram 45k • written 8.4 years ago by taegyunlee • 0

I am trying the same thing, but I am having difficulty converting the seg.txt to a proper .bed format, so the files cannot be read at subsequent steps. Can you help me out on how to proceed with this? How did you manage to get the conversion done?

ADD REPLY • link 6.8 years ago by r.bhowmick • 0

score 0 · Answer 1 · 2017-03-07

Are you downloading gene annotations from UCSC? In the table browser, look for the page/option to export a table in another format, i.e. BED. You don't necessarily need to use the files as they are on the FTP site.

The general answer here is that these are all tabular formats, so you can extract the columns you need using standard Unix tools or a short script in R or Python. A minimal BED file has three columns: chromosome, start, and end. The UCSC RefSeq table and the SEG format both have these columns along with others. So you select the chromosome, start, and end columns from the input format using cut or awk. For SEG, also subtract 1 from the 'start' position, because SEG uses 1-based indexing while BED and the UCSC RefSeq table use 0-based starts. The 'end' position stays as it is, since a BED end coordinate is exclusive.
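
As a rough illustration of the column selection described above, here is a minimal Python sketch. It assumes both inputs are tab-delimited with the header names shown in the question; the function names (`read_table`, `seg_to_bed`, `refseq_to_bed`) are made up for this example, and adding a "chr" prefix to SEG chromosomes is an assumption made so the names match the UCSC table. SEG files that encode sex chromosomes numerically are not handled here.

```python
import csv


def read_table(lines):
    """Yield one dict per data row of a tab-delimited table with a header line."""
    rows = csv.reader(lines, delimiter="\t")
    # UCSC exports often start the header with '#', so strip it if present.
    header = [name.lstrip("#") for name in next(rows)]
    for fields in rows:
        if fields:  # skip blank lines
            yield dict(zip(header, fields))


def seg_to_bed(lines):
    """Convert SEG rows to BED5 tuples: chrom, start, end, name, score."""
    for row in read_table(lines):
        chrom = row["Chromosome"]
        if not chrom.startswith("chr"):
            chrom = "chr" + chrom  # assumption: match UCSC-style names
        start = int(row["Start"]) - 1  # 1-based SEG start -> 0-based BED start
        end = int(row["End"])  # unchanged: BED end is exclusive
        yield (chrom, start, end, row["Sample"], row["Segment_Mean"])


def refseq_to_bed(lines):
    """Convert UCSC RefSeq table rows to BED6 tuples (starts are already 0-based)."""
    for row in read_table(lines):
        yield (row["chrom"], int(row["txStart"]), int(row["txEnd"]),
               row["name"], "0", row["strand"])


# Small demo using rows from the question.
seg_lines = [
    "Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean",
    "BREAD_p_TCGAb_430_431_NSP_GenomeWideSNP_6_D11_1538030\t2\t207696273\t207701151\t2\t-1.5215",
]
refseq_lines = [
    "bin\tname\tchrom\tstrand\ttxStart\ttxEnd",
    "3\tNR_130130\tchr1\t+\t150980866\t151008189",
]
for bed_row in list(seg_to_bed(seg_lines)) + list(refseq_to_bed(refseq_lines)):
    print("\t".join(map(str, bed_row)))
```

If each set of rows is written to its own tab-separated file, the two files could then be compared with something like `bedtools intersect -a genes.bed -b segments.bed -wa -wb`, where both file names are placeholders.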