string replacement from one file to another
4.0 years ago

Hi all, I need your help.

I am trying to convert from the UCSC chromosome naming convention to the Ensembl one. I have a .bed file with an annotation and a .txt file that lists the equivalent names in two columns. The files look like this:

  1. .bed
    chr2L 324 453
    chr3R 65433 73563
    chr4 5345 9854
    ... etc

  2. .txt equivalence
    chr2L 2L
    chr3L 3L
    chr4 4L
    ... etc

I know I can use sed 's/chr2L/2L/g' to replace a single pattern. However, writing one such command for each chromosome and scaffold (approximately 2000 different ones) is not feasible.

I am looking for a script (I don't mind the programming language) or a tool that works as follows:

Read the equivalence file and store the name pairs. Then read the .bed file and replace the string in the chromosome field.
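
The two steps described above could be sketched in Python roughly as follows. This is only a minimal illustration, not a tested tool: the function names `load_mapping` and `rename_chromosomes` are made up for this example, the equivalence file is assumed to hold the old name in column 1 and the new name in column 2, and chromosomes missing from the equivalence file are assumed to be left unchanged rather than dropped.

```python
import sys


def load_mapping(lines):
    """Build {old_name: new_name} from lines holding two whitespace-separated columns."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def rename_chromosomes(bed_lines, mapping):
    """Yield BED lines with the first column replaced when it is in the mapping.

    Assumption: names absent from the mapping are kept as they are.
    """
    for line in bed_lines:
        line = line.rstrip("\n")
        # Pass empty lines and comment/track/browser header lines through untouched.
        if not line or line.startswith(("#", "track ", "browser ")):
            yield line
            continue
        # Prefer tabs (the BED convention); fall back to any whitespace.
        fields = line.split("\t") if "\t" in line else line.split()
        fields[0] = mapping.get(fields[0], fields[0])
        yield "\t".join(fields)


# Only runs when called as: python script.py <equivalence.txt> <annotation.bed>
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as mapping_file:
        name_map = load_mapping(mapping_file)
    with open(sys.argv[2]) as bed_file:
        for renamed in rename_chromosomes(bed_file, name_map):
            print(renamed)
```

Saved under a hypothetical name such as rename_bed.py, it would be run as `python rename_bed.py equivalence.txt annotation.bed > renamed.bed`. Because the whole first field is looked up in a dictionary, a name such as chr2L cannot accidentally match inside a longer scaffold name, which is a risk with a chain of unanchored sed substitutions.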

Thank you in advance, have a great day! Best,

Jordi

bash python Assembly

Using tsv-utils:

tsv-join -f test.bed test.txt --key-fields 1 --append-fields 2,3 | awk -v OFS="\t" '{print $3,$4,$2}'
324    453    2L
5345    9854    4L

With awk:

awk -v OFS="\t" 'NR==FNR {a[$1]=$1"\t"$2;next} ($1 in a) {print $2,$3,a[$1]}' test.txt test.bed | awk '{print $1,$2,$4}'     

324 453 2L
5345 9854 4L

Thank you as well! AWK is awesome and terribly powerful. I will spend some time trying to master it.

4.0 years ago
join -t $'\t' -1 1 -2 1 <(sort -t $'\t' -k1,1 file2.tsv) <(sort -t $'\t' -k1,1 file1.tsv) | cut -f 2-

Otherwise, I wrote a tool to substitute chromosome names: http://lindenb.github.io/jvarkit/ConvertBedChromosomes.html


Thank you so much, I will give both the command and the tool a try!
