Question: Extract Hard-clipped and soft-clipped values from fifth column
Ram130 wrote:

Hello all,

How can I extract the number of hard-clipped bases (105 in the first row) from the fifth column and then add that value to the end coordinate of the interval?

I tried with awk, but without success.

chr1    21730812    21730857    M00758:777:BKR4B:1:2103:25646:13282  45M105H
chr1    179196680   179196716     M00758:777:BKR4B:1:2101:24687:24458     87H36M27H

Thanks a lot!

bash awk sequencing perl • 323 views
ADD COMMENTlink modified 7 months ago by Pierre Lindenbaum121k • written 7 months ago by Ram130

This is your input :

chr1    21730812    21730857    M00758:777:BKR4B:1:2103:25646:13282    45M105H

chr1    179196680    179196716    M00758:777:BKR4B:1:2101:24687:24458    87H36M27H

And you want a result like this :

chr1    21730812    21730962    M00758:777:BKR4B:1:2103:25646:13282    45M105H

chr1    179196680    179196743    M00758:777:BKR4B:1:2101:24687:24458    87H36M27H

?

ADD REPLYlink written 7 months ago by Bastien Hervé4.4k

I want a result like this :

chr1   21730857     21730962    M00758:777:BKR4B:1:2103:25646:13282    45M105H

chr1    179196716    179196743    M00758:777:BKR4B:1:2101:24687:24458    87H36M27H

Thanks !

ADD REPLYlink modified 7 months ago by genomax69k • written 7 months ago by Ram130

You mean for the first line :

chr1   21730812     21730962    M00758:777:BKR4B:1:2103:25646:13282    45M105H

Because 45M105H does not have hard clipping at the start.

ADD REPLYlink modified 7 months ago • written 7 months ago by Bastien Hervé4.4k

Yes! Any idea how I can achieve this?

ADD REPLYlink written 7 months ago by Ram130
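
For readers following the thread, here is a minimal awk sketch of the behaviour agreed on above: leave the start coordinate alone and add the trailing hard-clip length to the end coordinate. It assumes whitespace-separated input with the CIGAR string in column 5, and that only a clip at the very end of the CIGAR should be added. The function name extend_end_by_trailing_clip is made up for illustration; it is not the original poster's script.

```shell
# Hypothetical helper: add the trailing hard-clip length (e.g. the 105 in
# 45M105H) to the end coordinate in column 3. Assumes CIGAR is in column 5.
extend_end_by_trailing_clip() {
  awk 'BEGIN { OFS = "\t" }
  {
    clip = 0
    # match() sets RSTART/RLENGTH when the CIGAR ends in <digits>H
    if (match($5, /[0-9]+H$/)) {
      # drop the final "H" and force a numeric value
      clip = substr($5, RSTART, RLENGTH - 1) + 0
    }
    $3 = $3 + clip   # start ($2) is left untouched
    print
  }' "$@"
}

# Usage: extend_end_by_trailing_clip file.bed
```

On the two rows from the question this gives end coordinates 21730962 and 179196743. A leading clip such as the 87H in 87H36M27H is ignored here, which is an assumption based on the replies above.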

Please explain your awk script as it does nothing to pick relevant parts of $5 or add to $2 or $3. I don't even see why a loop is necessary in your awk script.

Also, please edit your question and add your awk script in there.

ADD REPLYlink written 7 months ago by RamRS22k

You wish to add the number of matched bases to the start position and the number of hard-clipped bases to the end position?

Why?

ADD REPLYlink written 7 months ago by RamRS22k

I tried with awk

show us the awk code

ADD REPLYlink written 7 months ago by Pierre Lindenbaum121k

105 hard-clipped base from fifth column and then add value 105 to end coordinate of chr1 ?

do you really want to do this ??? how would you handle a 'N' or a 'D' in the cigar string ???

ADD REPLYlink written 7 months ago by Pierre Lindenbaum121k
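
To make the point about 'N' and 'D' concrete: in the SAM specification the operations M, D, N, = and X consume reference bases, while I, S, H and P do not. The sketch below walks the CIGAR in column 5 and sums only the reference-consuming operations, so the result can be compared with end minus start. The function name cigar_ref_span and the column layout are assumptions for illustration.

```shell
# Hypothetical helper: print each CIGAR (column 5) with the number of
# reference bases it spans. M, D, N, = and X consume the reference.
cigar_ref_span() {
  awk '{
    cigar = $5
    span = 0
    # peel off one <length><op> pair at a time from the front
    while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
      len = substr(cigar, 1, RLENGTH - 1) + 0
      op = substr(cigar, RLENGTH, 1)
      if (op ~ /[MDN=X]/) span += len
      cigar = substr(cigar, RLENGTH + 1)
    }
    print $5 "\t" span
  }' "$@"
}

# Usage: cigar_ref_span file.bed
```

For the two rows in the question the spans are 45 and 36, which equal end minus start in each row. A CIGAR containing D or N would span more reference bases than its M count alone.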

Here is the code :

awk -F'\t' 'NR==FNR{arr[$1]++;next}{for(i=1; i<=NF; i++) if ($i in arr){a[i]++;}} { for (i in a) printf "%s\t", $i; printf "\n"}'  file.bed
ADD REPLYlink modified 7 months ago by genomax69k • written 7 months ago by Ram130

can you please explain this awk script ?

ADD REPLYlink written 7 months ago by Pierre Lindenbaum121k

What is the larger problem that you are attempting to solve, and why only hard clips, not soft clips?

Wouldn't it be easier to just use a utility such as gridss.ComputeSamTags SOFTEN_HARD_CLIPS=true to convert your hard clips into soft clips and then do your conversion to BED?

ADD REPLYlink written 7 months ago by d-cameron2.0k

Hello Ram,

Please provide feedback to the comments here so we can get this discussion to closure.

Thank you!

ADD REPLYlink written 7 months ago by RamRS22k