bash script
2.7 years ago
priyanka ▴ 20

Hello everyone, I have a file like this, with two columns (RSID1 and RSID2):

chr1_169894240_G_T_b38  chr1_169894240_G_T_b38
chr1_169894240_G_T_b38  chr1_169891332_G_A_b38
chr1_169891332_G_A_b38  chr1_169891332_G_A_b38
chr1_169661963_G_A_b38  chr1_169661963_G_A_b38
chr1_169661963_G_A_b38  chr1_169697456_A_T_b38
chr1_169697456_A_T_b38  chr1_169697456_A_T_b38
chr1_27636786_T_C_b38   chr1_27636786_T_C_b38
chr1_196651787_C_T_b38  chr1_196651787_C_T_b38
chr6_143501715_T_C_b38  chr6_143501715_T_C_b38

I want to extract just the chr_pos part, e.g. chr1_169894240 chr1_169894240, and nothing else. I am confused about how to do this because the length varies: in some cases it is 9 characters and in others 10. So if I use the cut command, some rows show the right value (chr_pos) but others show chr_pos_. Can anyone please help me with this?

info snp model substring • 1.4k views
2.7 years ago
alex.zaccaron ▴ 410

You can use cut or awk with "_" as the field separator, e.g., cut -f 1 yourfile.txt | awk -v FS="_" '{print $1"_"$2}'. If you have a 2-column TSV file, you can try:

paste <(cut -f 1 yourfile.txt  | awk -v FS="_" '{print $1"_"$2}') <(cut -f 2 yourfile.txt  | awk -v FS="_" '{print $1"_"$2}')
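The same field-separator idea can also be done in a single awk pass, without cut or paste. This is only a sketch: trim_ids is a hypothetical helper name, and it assumes every column holds an ID whose first two "_"-separated pieces are chr and pos. Columns are rejoined with a tab.

```shell
# Hypothetical helper (not from the thread): keep only chr_pos in every column.
# Assumes each whitespace-separated column looks like chr_pos_ref_alt_build.
trim_ids() {
  awk -v OFS='\t' '{
    for (i = 1; i <= NF; i++) {
      split($i, parts, "_")        # parts[1] = chromosome, parts[2] = position
      $i = parts[1] "_" parts[2]   # assigning to $i rebuilds the line with OFS
    }
    print
  }' "$@"
}

# Example on one row from the question; reads stdin when no file is given.
printf 'chr1_27636786_T_C_b38\tchr1_196651787_C_T_b38\n' | trim_ids
```

Because the loop runs over all fields, it works the same way for one column, two columns, or more. A column with no "_" in it would come out with a trailing "_", so it is worth checking the input first.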

Thank you so much. It worked. Can you also share a link where I can learn about the awk command in detail? I only know the basics.

2.7 years ago
$ sed -r 's/_\w_\w_\w{3}//g' test.txt

$ awk -v OFS="\t" -F '[_\t]' '{print $1"_"$2,$6"_"$7}' test.txt

$ parallel --colsep "_|\t" echo {1}_{2} {6}_{7} :::: test.txt  | sed 's/\s/\t/'

chr1_169894240  chr1_169894240
chr1_169894240  chr1_169891332
chr1_169891332  chr1_169891332
chr1_169661963  chr1_169661963
chr1_169661963  chr1_169697456
chr1_169697456  chr1_169697456
chr1_27636786   chr1_27636786
chr1_196651787  chr1_196651787
chr6_143501715  chr6_143501715
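If no external tool is wanted, the "strip the last three underscore fields" idea can also be sketched with plain shell parameter expansion. This is an alternative technique, not one of the commands above: strip_suffix is a hypothetical name, and the sketch assumes exactly two whitespace-separated columns, each ending in _ref_alt_build.

```shell
# Hypothetical helper (not from the thread): drop the trailing _ref_alt_build
# from each of two columns using the ${var%pattern} suffix-removal expansion.
strip_suffix() {
  # "|| [ -n "$a" ]" also handles a last line that lacks a trailing newline.
  while read -r a b || [ -n "$a" ]; do
    # %_*_*_* removes the shortest suffix that contains three underscores
    printf '%s\t%s\n' "${a%_*_*_*}" "${b%_*_*_*}"
  done
}

# Example on one row from the question.
printf 'chr6_143501715_T_C_b38\tchr1_27636786_T_C_b38\n' | strip_suffix
```

A shell read loop is much slower than sed or awk on large files, so this is mainly useful for small inputs or inside an existing script.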

Thank you. I was able to do it with this command as well.

2.7 years ago

For the win, you can even do a fancy regex with sed:

cat data.tsv 
chr1_169894240_G_T_b38  chr1_169894240_G_T_b38
chr1_169894240_G_T_b38  chr1_169891332_G_A_b38
chr1_169891332_G_A_b38  chr1_169891332_G_A_b38
chr1_169661963_G_A_b38  chr1_169661963_G_A_b38
chr1_169661963_G_A_b38  chr1_169697456_A_T_b38
chr1_169697456_A_T_b38  chr1_169697456_A_T_b38
chr1_27636786_T_C_b38   chr1_27636786_T_C_b38
chr1_196651787_C_T_b38  chr1_196651787_C_T_b38
chr6_143501715_T_C_b38  chr6_143501715_T_C_b38

sed 's/_[ATGC]_[ATGC]_[a-z][0-9]*//g' data.tsv 
chr1_169894240  chr1_169894240
chr1_169894240  chr1_169891332
chr1_169891332  chr1_169891332
chr1_169661963  chr1_169661963
chr1_169661963  chr1_169697456
chr1_169697456  chr1_169697456
chr1_27636786   chr1_27636786
chr1_196651787  chr1_196651787
chr6_143501715  chr6_143501715

Kevin
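One caveat with the pattern above is that [ATGC] matches a single base, so an ID with a multi-base allele (an indel, for example) would be left untouched. A possible variant, shown only as a sketch, captures the chr_pos prefix and drops whatever follows up to the next whitespace. It assumes every ID starts with "chr" and has a numeric position; keep_chr_pos is a hypothetical name and the indel ID in the example is made up for illustration.

```shell
# Hypothetical helper (not from the thread): keep chr_pos, drop the rest of each ID.
# Assumes IDs start with "chr" and the position is all digits.
keep_chr_pos() {
  # \1 is the captured chr_pos; the remainder of the ID (up to whitespace) is removed
  sed -E 's/(chr[^_]+_[0-9]+)_[^[:space:]]*/\1/g' "$@"
}

# First ID is from the question; the second is an invented indel-style ID.
printf 'chr1_27636786_T_C_b38\tchr1_1000_GA_G_b38\n' | keep_chr_pos
```

The original whitespace between columns is preserved, since only the ID suffixes are replaced.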


Thank you so much. This is quite easy to understand.
