Separating a dataset by a specific character
3.6 years ago
Meghan.T • 0

I downloaded a .bed file and converted it with UCSC liftOver (https://genome.ucsc.edu/cgi-bin/hgLiftOver). Now I want to load the converted file "datahg38.bed" in R using this code:

library(rtracklayer)
dataset <- import.bed("datahg38.bed")

and I get this error:

error : $ operator is invalid for atomic vectors

Any ideas how to fix this?

I found out that the converted file holds a single column of 300,000 entries (300,000 x 1), whereas it should be a table of 300,000 rows with 3 variables.

The format is something like this:

chr1:10142-10351
chr1:10453-10563
chr1:13044-13104
...

So basically I need to read it as a table with read.table and then convert it to a 300,000 x 3 matrix, using the characters : and - to split the data. I'm looking for an output like this:

 chr1  10142  10351
 chr1  10453  10563
 chr1  13044  13104
 ...

Can anyone please suggest an efficient way to separate this data?

R • 615 views

Why not convert the BED file to a more suitable format on the Linux command line? Something like:

tr ':-' '\t\t' < datahg38.bed > datahg38_format.bed
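
As a quick sanity check on this kind of conversion, here is a minimal sketch that wraps the `tr` call in a function and counts lines that do not end up with exactly three tab-separated fields. The function names `region_to_bed` and `bad_lines` are made up for illustration, and the sample lines are the ones from the question.

```shell
# Hypothetical helper: turn "chr:start-end" lines on stdin into
# tab-separated columns by mapping both ':' and '-' to a tab.
region_to_bed() {
    tr ':-' '\t\t'
}

# Hypothetical helper: print how many lines on stdin do NOT have
# exactly three tab-separated fields (0 means every line converted cleanly).
bad_lines() {
    awk -F'\t' 'NF != 3 { n++ } END { print n + 0 }'
}

# Sample input copied from the question.
printf 'chr1:10142-10351\nchr1:10453-10563\nchr1:13044-13104\n' | region_to_bed
```

On the real file this could be run as `region_to_bed < datahg38.bed | bad_lines`. A non-zero count would suggest that some lines contain extra `:` or `-` characters (for example in a contig name) and need a stricter conversion.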
3.6 years ago
JC 13k

Basically your BED file is not a BED file, so you need to convert it:

perl -pe 's/:/\t/; s/-/\t/' < datahg38.bed > datahg38_realBED.bed
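
If some sequence names might themselves contain a `-`, a stricter variant could convert only the lines that match the full `chr:start-end` pattern and report the rest. The following is a rough sketch along those lines, not a drop-in replacement for the one-liner above; the function name `region_to_bed_strict` is made up for illustration.

```shell
# Hypothetical helper: convert only lines of the exact form name:start-end.
# Anything else is reported on stderr and left out of the output.
region_to_bed_strict() {
    awk 'BEGIN { OFS = "\t" }
    /^[^:]+:[0-9]+-[0-9]+$/ {
        split($0, a, ":")      # a[1] = sequence name, a[2] = "start-end"
        split(a[2], b, "-")    # b[1] = start, b[2] = end
        print a[1], b[1], b[2]
        next
    }
    { print "skipped line " NR ": " $0 > "/dev/stderr" }'
}

# Sample input copied from the question.
printf 'chr1:10142-10351\nchr1:10453-10563\nchr1:13044-13104\n' | region_to_bed_strict
```

On the real file this would be `region_to_bed_strict < datahg38.bed > datahg38_realBED.bed`. Note that UCSC-style `chr:start-end` positions are usually 1-based, while BED start coordinates are 0-based, so it may be worth checking whether the start column needs to be decremented; the sketch leaves the numbers untouched.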