Change ".." to a tab, or split a string of characters into separate columns
6.4 years ago
Ming Lu ▴ 30

I want to convert a CTCF ChIA-PET interaction file into a BED file listing its two ends separately. How can I change

       chrX:154145949..154146870-chrX:154314445..154315402,2
       chrX:154208778..154209800-chrX:154376894..154377812,3
       chrX:154208831..154209797-chrX:154285357..154286294,4

into

      chrX 154145949 154146870
      chrX 154314445 154315402
      chrX 154208778 154209800
      chrX 154376894 154377812
      chrX 154208831 154209797
      chrX 154285357 154286294
ChIP-Seq • 1.1k views
6.4 years ago

Hi,

Use this (the \n and \t escapes in the replacement assume GNU sed):

sed 's/-/\n/g;s/:/\t/g;s/\.\./\t/g;s/,[0-9]*$//' ./your_file > ./your_file_mod
6.4 years ago

To create an unsorted BED file:

$ awk -v OFS="\t" '{n=split($0, a, /[:.\-,]/); printf("%s\t%s\t%s\n%s\t%s\t%s\n", a[1],a[2],a[4],a[5],a[6],a[8]);}' chia-pet.txt > chia-pet.unsorted.bed

To create a sorted BED file:

$ awk -v OFS="\t" '{n=split($0, a, /[:.\-,]/); printf("%s\t%s\t%s\n%s\t%s\t%s\n", a[1],a[2],a[4],a[5],a[6],a[8]);}' chia-pet.txt | sort-bed - > chia-pet.bed

Sorted BED files allow BEDOPS binaries to do set operations on BED files correctly and efficiently.
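
If BEDOPS is not installed, a plain coreutils sort should give an equivalent ordering for simple three-column input: chromosome compared as text, then start and end compared as numbers. This is only a sketch; sort_bed_fallback is a hypothetical helper name, not a BEDOPS command, and it assumes tab-separated input with no header lines.

```shell
# Hypothetical fallback for sort-bed on a three-column, tab-separated BED stream.
sort_bed_fallback() {
  # LC_ALL=C forces byte-wise comparison of chromosome names.
  # -k1,1   chromosome as text
  # -k2,2n  start coordinate as a number
  # -k3,3n  end coordinate as a number
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n -k3,3n
}
```

It reads from standard input, so it can take the place of sort-bed in the pipeline above: awk '...' chia-pet.txt | sort_bed_fallback > chia-pet.bed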

6.4 years ago

output:

$ awk '{gsub("-","\n"); gsub(/:|\.\./," ")}1' test.txt | sed 's/,.*//'

or

$ sed 's/,.*//' test.txt | tr ":" " " | tr -s "." " " | tr "-" "\n"

chrX 154145949 154146870
chrX 154314445 154315402
chrX 154208778 154209800
chrX 154376894 154377812
chrX 154208831 154209797
chrX 154285357 154286294

input:

$ cat test.txt 
chrX:154145949..154146870-chrX:154314445..154315402,2
chrX:154208778..154209800-chrX:154376894..154377812,3
chrX:154208831..154209797-chrX:154285357..154286294,4
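
The commands in this answer print space-separated columns, while tools such as BEDOPS and bedtools generally expect tab-separated BED. A minimal sketch of the same sed and tr approach that emits tabs instead is shown here; pairs_to_bed is a hypothetical function name, and it assumes every line follows the chr:start..end-chr:start..end,count layout from the question.

```shell
# Hypothetical helper: convert ChIA-PET pair lines on stdin to tab-separated BED3.
pairs_to_bed() {
  # 1. Drop the trailing ",count" field (any number of digits).
  # 2. Turn ":" and "." into tabs; -s squeezes the two tabs produced by ".." into one.
  # 3. Split the two anchors onto separate lines at "-".
  sed 's/,[0-9]*$//' | tr -s ':.' '\t\t' | tr -- '-' '\n'
}
```

Usage: pairs_to_bed < test.txt > test.bed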