Remove the second colon and everything after it in a data frame using R/Unix
3.0 years ago
salman_96 ▴ 70

Hi

I have a file with coordinates like this:

1:834573:A:AT
1:834830:G:A
1:835092:T:G
1:842388:T:TCCGCAGGA

I want to remove the second colon and everything after it, so that the file looks like this:

1:834573
1:834830
1:835092
1:842388

I have tried sed, but the fields vary in length from line to line.

Kindly suggest something.

coordinates SNP Unix R • 1.1k views
3.0 years ago
4galaxy77 2.8k

No need to use R for this - it's exactly what the cut command in Unix was designed for.

❯ cut -d':' -f1-2 file.txt
1:834573
1:834830
1:835092
1:842388
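
Since the question mentions sed, here is a minimal sketch of the same trimming with sed, which does not depend on how long each field is. The file names coords_example.txt and coords_trimmed.txt are made up for illustration, and the input is just the four lines from the question.

```shell
# Recreate the example input from the question (file names are illustrative).
printf '%s\n' '1:834573:A:AT' '1:834830:G:A' '1:835092:T:G' '1:842388:T:TCCGCAGGA' > coords_example.txt

# Capture "field1:field2" (each field = any run of non-colon characters),
# then drop the second colon and everything after it.
sed -E 's/^([^:]*:[^:]*):.*$/\1/' coords_example.txt > coords_trimmed.txt
cat coords_trimmed.txt
```

Lines that contain fewer than two colons do not match the pattern and are passed through unchanged.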
3.0 years ago
awk -F ":" -v OFS=":" '{print $1,$2}' test.txt
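
If the coordinates are one column of a larger table rather than the whole line, awk can trim just that column. The sketch below assumes a tab-separated file with the coordinate in column 1; the second column (a score) and the file names are hypothetical and only there to show that other columns are left alone.

```shell
# Hypothetical two-column, tab-separated table: coordinate ID, then a made-up score.
printf '1:834573:A:AT\t0.12\n1:842388:T:TCCGCAGGA\t0.57\n' > table_example.tsv

# Split column 1 on ":" and rebuild it from the first two pieces only;
# the trailing "1" prints every (modified) line.
awk 'BEGIN { FS = OFS = "\t" } { split($1, part, ":"); $1 = part[1] ":" part[2] } 1' table_example.tsv > table_trimmed.tsv
cat table_trimmed.tsv
```

Change $1 to the relevant column number if the coordinate sits elsewhere in the table.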
3.0 years ago

R code that assumes your coordinates are in file test.txt. I also kept your empty spacer lines.

gsub("(\\d+\\:\\d+)\\:[AGCT]+\\:[AGCT]+","\\1",readLines('test.txt'))
[1] "1:834573" ""         "1:834830" ""         "1:835092" ""         "1:842388"

If you want to save the result as a new file, just wrap it in writeLines(con="new_file_name.txt"):

writeLines(gsub("(\\d+\\:\\d+)\\:[AGCT]+\\:[AGCT]+","\\1",readLines('test.txt')),con='output_test.txt')

Actually you could even simplify it to:

gsub("(\\d+\\:\\d+).*","\\1",readLines('test.txt'))

We can use ":" as delimiter, and avoid regex:

write.table(
  read.table(text = "
             1:834573:A:AT
             1:834830:G:A
             1:835092:T:G
             1:842388:T:TCCGCAGGA", sep = ":")[, 1:2],
  file = "out.txt", col.names = FALSE, row.names = FALSE, 
  quote = FALSE, sep = ":")