Extract hard-clipped and soft-clipped values from the fifth column
5.4 years ago
Ram ▴ 190

Hello all,

How can I extract the 105 hard-clipped bases from the fifth column and then add 105 to the end coordinate of the chr1 record?

I tried with awk, but without success.

chr1    21730812    21730857    M00758:777:BKR4B:1:2103:25646:13282  45M105H
chr1    179196680   179196716     M00758:777:BKR4B:1:2101:24687:24458     87H36M27H

Thanks a lot!

sequencing awk perl bash • 1.3k views

This is your input:

chr1    21730812    21730857    M00758:777:BKR4B:1:2103:25646:13282    45M105H

chr1    179196680    179196716    M00758:777:BKR4B:1:2101:24687:24458    87H36M27H

And you want a result like this:

chr1    21730812    21730962    M00758:777:BKR4B:1:2103:25646:13282    45M105H

chr1    179196680    179196743    M00758:777:BKR4B:1:2101:24687:24458    87H36M27H

?

I want a result like this:

chr1   21730857     21730962    M00758:777:BKR4B:1:2103:25646:13282    45M105H

chr1    179196716    179196743    M00758:777:BKR4B:1:2101:24687:24458    87H36M27H

Thanks !

You mean this for the first line:

chr1   21730812     21730962    M00758:777:BKR4B:1:2103:25646:13282    45M105H

Because 45M105H has no hard clipping at the start.

Yes! Any idea how I can achieve this?
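A minimal awk sketch of the transformation confirmed above. It assumes the file is tab-separated with the CIGAR string in column 5, and that only a trailing hard clip (digits followed by H at the very end of the CIGAR) should be added to the end coordinate in column 3. The function name `extend_end_by_trailing_hard_clip` is hypothetical, and the sketch ignores strand as well as any D or N operations:

```shell
# extend_end_by_trailing_hard_clip: hypothetical helper name, not from the thread.
# Reads tab-separated lines (chrom, start, end, read name, CIGAR) on stdin
# and adds the length of a trailing hard clip, if any, to column 3.
extend_end_by_trailing_hard_clip() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    clip = 0
    # A trailing hard clip is a run of digits followed by H at the very end.
    if (match($5, /[0-9]+H$/)) {
      # Drop the final "H" and force the digits to a number.
      clip = substr($5, RSTART, RLENGTH - 1) + 0
    }
    $3 = $3 + clip
    print
  }'
}

printf 'chr1\t21730812\t21730857\tM00758:777:BKR4B:1:2103:25646:13282\t45M105H\n' |
  extend_end_by_trailing_hard_clip
# prints: chr1  21730812  21730962  M00758:777:BKR4B:1:2103:25646:13282  45M105H (tab-separated)
```

For 87H36M27H the pattern only picks up the trailing 27H, so the leading 87H is left alone. To include soft clips as well, the pattern could be changed to `/[0-9]+[HS]$/`.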

Please explain your awk script as it does nothing to pick relevant parts of $5 or add to $2 or $3. I don't even see why a loop is necessary in your awk script.

Also, please edit your question and add your awk script in there.

You wish to add the number of matched bases to the start position and the number of hard-clipped bases to the end position?

Why?

"I tried with awk"

Show us the awk code.

"105 hard-clipped base from fifth column and then add value 105 to end coordinate of chr1 ?"

Do you really want to do this? How would you handle an 'N' or a 'D' in the CIGAR string?
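On the D/N point: per the SAM specification, the operations M, D, N, = and X consume reference bases, while I, S, H and P do not, so the span an alignment covers on the reference is the sum of the first group only. A rough awk sketch of that sum, reading one CIGAR string per line (the helper name `cigar_ref_span` is hypothetical):

```shell
# cigar_ref_span: hypothetical helper name, not from the thread.
# Prints the number of reference bases covered by each CIGAR string on stdin.
cigar_ref_span() {
  awk '{
    cigar = $1
    span = 0
    # Peel one <length><op> pair at a time off the front of the string.
    while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
      n   = RLENGTH
      len = substr(cigar, 1, n - 1) + 0
      op  = substr(cigar, n, 1)
      # Only M, D, N, = and X advance along the reference.
      if (op ~ /[MDN=X]/) span += len
      cigar = substr(cigar, n + 1)
    }
    print span
  }'
}

printf '87H36M27H\n10M5D10M\n' | cigar_ref_span
# prints 36, then 25
```

With a deletion or a skipped region in the CIGAR, end minus start in the BED line is larger than the M count, which is why adding clip lengths to BED coordinates needs care.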

Here is the code:

awk -F'\t' 'NR==FNR{arr[$1]++;next}{for(i=1; i<=NF; i++) if ($i in arr){a[i]++;}} { for (i in a) printf "%s\t", $i; printf "\n"}'  file.bed

Can you please explain this awk script?

What is the larger problem that you are attempting to solve, and why only hard clips, not soft clips?

Wouldn't it be easier to just use a utility such as gridss.ComputeSamTags SOFTEN_HARD_CLIPS=true to convert your hard clips into soft clips and then do your conversion to BED?

Hello Ram,

Please provide feedback to the comments here so we can get this discussion to closure.

Thank you!
