read.table loses columns when reading a BED file
1
0
Entering edit mode
7.1 years ago

I am trying to read data from a BED file into R, but I lose the last two columns when I use the read.table function.

My bedfile has 6 columns:

$ head CHX_clean_sorted_data_plus.bed 
chr1    630725  630752  SN1052:386:C9VFHACXX:1:1305:11494:98114#RB:GTGCC    50  +
chr1    630728  630752  SN1052:386:C9VFHACXX:1:1116:5611:86646#RB:GAGTG 50  +
chr1    630728  630751  SN1052:386:C9VFHACXX:1:1213:17960:21112#RB:TTAGA    50  +
chr1    630728  630752  SN1052:386:C9VFHACXX:1:1312:11292:52265#RB:TCAAA    50  +
chr1    634005  634030  SN1052:386:C9VFHACXX:1:1110:17705:92051#RB:GTGCG    50  +
chr1    634337  634367  SN1052:386:C9VFHACXX:1:2102:4448:4217#RB:TTGGA  50  +

The final two columns are lost when I use read.table to read the data into R:

a <- read.table("CHX_clean_sorted_data_plus.bed", sep="\t", blank.lines.skip=FALSE)
dim(a)
[1] 144712      4

head(a)
    V1     V2     V3                                      V4
1 chr1 630725 630752 SN1052:386:C9VFHACXX:1:1305:11494:98114
2 chr1 630728 630752  SN1052:386:C9VFHACXX:1:1116:5611:86646
3 chr1 630728 630751 SN1052:386:C9VFHACXX:1:1213:17960:21112
4 chr1 630728 630752 SN1052:386:C9VFHACXX:1:1312:11292:52265
5 chr1 634005 634030 SN1052:386:C9VFHACXX:1:1110:17705:92051
6 chr1 634337 634367   SN1052:386:C9VFHACXX:1:2102:4448:4217

I would appreciate any insight into what is causing this and how to fix it.

-Lauren


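See the post "read.table does not read in all rows!" for a potential solution. The problem in your case seems to be the '#' sign: by default read.table treats it as the start of a comment and ignores the rest of the line.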
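A minimal sketch of how the '#' produces the truncation, using two records copied from the question and written to a temporary file (the temporary file and variable names are only for illustration, not part of the original post). count.fields() is a quick way to see how many fields R finds on each line with and without comment handling:

```r
# Two records in the same layout as the question (tab-separated, 6 fields).
bed_lines <- c(
  "chr1\t630725\t630752\tSN1052:386:C9VFHACXX:1:1305:11494:98114#RB:GTGCC\t50\t+",
  "chr1\t630728\t630752\tSN1052:386:C9VFHACXX:1:1116:5611:86646#RB:GAGTG\t50\t+"
)
tmp <- tempfile(fileext = ".bed")
writeLines(bed_lines, tmp)

# Default comment.char = "#": everything from '#' onwards is discarded.
truncated <- read.table(tmp, sep = "\t")
ncol(truncated)

# count.fields() has the same default, so it reports the per-line field count
# as read.table sees it.
fields_default <- count.fields(tmp, sep = "\t")
fields_no_comment <- count.fields(tmp, sep = "\t", comment.char = "")
fields_default
fields_no_comment
```

With the default settings each line should report 4 fields, and 6 once comment handling is switched off.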


I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

[Screenshot: the 101010 button in the post editor toolbar]


Stealing the screenshot for future use since you have already done the effort :)


Don't forget to cite me then, every time you use it ;) Having this as a standard moderation answer would be convenient :-p


Since @Istvan has finally completed the ChIP-seq chapter in Biostars handbook he may have time to revisit our Biostars feature wish list.


Thanks so much for your help. This worked great!


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.


If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.

7.1 years ago

The solution is to change the comment.char parameter of the read.table function. By default, read.table treats # as the comment character and ignores everything after it on a line, which is the right behaviour in most situations. In your file the # is part of the read name in column 4, so the rest of each line, including the last two columns, is dropped. If you know that you need the text after the #, set comment.char to a character that does not occur in your file, for example the ampersand "&" or the dollar sign "$", but check that it is not in the file before using it.

read.table("CHX_clean_sorted_data_plus.bed", sep="\t", comment.char="&", blank.lines.skip=FALSE)
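The check mentioned above (making sure the replacement character is not in the file) could be sketched roughly as follows. char_in_file() is a hypothetical helper written for this example, not a base R function, and the one-line temporary file stands in for the real BED file:

```r
# Hypothetical helper: TRUE if `char` occurs anywhere in the file.
# fixed = TRUE matches the character literally, so "$" is not treated as a regex anchor.
char_in_file <- function(path, char) {
  any(grepl(char, readLines(path), fixed = TRUE))
}

# Stand-in for the real file: one record copied from the question.
tmp <- tempfile(fileext = ".bed")
writeLines(
  "chr1\t630725\t630752\tSN1052:386:C9VFHACXX:1:1305:11494:98114#RB:GTGCC\t50\t+",
  tmp
)

hash_present <- char_in_file(tmp, "#")   # the character causing the problem
amp_present  <- char_in_file(tmp, "&")   # candidate replacement

# Only use "&" as the comment character if it does not occur in the data.
if (!amp_present) {
  a <- read.table(tmp, sep = "\t", comment.char = "&")
}
dim(a)
```

readLines() loads the whole file into memory, which should be fine for a file of this size (about 145k lines) but may be worth reconsidering for much larger inputs.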

If comment handling is not needed (as in this case), it can also be turned off entirely by setting comment.char to an empty string: comment.char=""
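A small sketch of this variant on two records copied from the question. The temporary file is only for illustration, and the column names are my own labels loosely following the usual BED field order, not something given in the original post:

```r
# Two records in the same layout as the question, written to a temporary file.
bed_lines <- c(
  "chr1\t630725\t630752\tSN1052:386:C9VFHACXX:1:1305:11494:98114#RB:GTGCC\t50\t+",
  "chr1\t630728\t630752\tSN1052:386:C9VFHACXX:1:1116:5611:86646#RB:GAGTG\t50\t+"
)
tmp <- tempfile(fileext = ".bed")
writeLines(bed_lines, tmp)

# comment.char = "" disables comment handling, so '#' is read as ordinary text.
a <- read.table(
  tmp,
  sep = "\t",
  comment.char = "",
  col.names = c("chrom", "start", "end", "name", "score", "strand"),  # assumed labels
  stringsAsFactors = FALSE
)

dim(a)
a$name[1]   # full read name, including the part after '#'
```

Compared with picking an unused character such as "&", this avoids having to check the file contents first.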


Thanks, I didn't know that. I thought it was gonna screw up the parsing!
