Question: Modify BED with multi-region rows
v82masae wrote, 8 months ago:

I have a somewhat tricky BED-like file that I need to convert to the standard BED format so that I can use it properly in further steps.

I have this unconventional BED format:

1   12349   12398   +
1   23523   23578   -
1   23550;23570;23590   23640;23689;23652   +
1   43533   43569   +
1   56021;56078   56099;56155   +

Assume that the rows with multiple positions represent fragmented non-coding regions.

What I would like to get is a canonical BED file such as:

1   12349   12398   +
1   23523   23578   -
1   23550   23640   +
1   23570   23689   +
1   23590   23652   +
1   43533   43569   +
1   56021   56099   +
1   56078   56155   +

where the regions that were combined in one row are split into separate rows, while keeping the chromosome number and strand.

I have been struggling with a proper way to do this for a while...

Could anyone help?

Thanks

Tags: awk, bash, bed

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 8 months ago:

awk '{N=split($2,a,/[;]/);split($3,b,/[;]/);for(i=1;i<=N;i++) printf("%s\t%s\t%s\t%s\n",$1,a[i],b[i],$4);}' input.bed

1   12349   12398   +
1   23523   23578   -
1   23550   23640   +
1   23570   23689   +
1   23590   23652   +
1   43533   43569   +
1   56021   56099   +
1   56078   56155   +
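
For anyone who wants to see what the one-liner is doing, here is a commented sketch of the same split-and-pair idea. The function name `split_regions` and the check that skips rows with unequal numbers of starts and ends are assumptions added for illustration; they are not part of the answer above.

```shell
# split_regions is a hypothetical helper name, not from the original answer.
# Input: whitespace-separated columns chrom, starts, ends, strand, where
# starts and ends may be ';'-separated lists. Output: one interval per line.
split_regions() {
  awk 'BEGIN { OFS = "\t" }
  {
    n = split($2, starts, ";")   # count of start coordinates in column 2
    m = split($3, ends, ";")     # count of end coordinates in column 3
    if (n != m) {
      # Assumption: a row with unequal counts is malformed, so report and skip it.
      printf("line %d: %d starts but %d ends, skipped\n", NR, n, m) > "/dev/stderr"
      next
    }
    # Pair the i-th start with the i-th end, keeping chromosome and strand.
    for (i = 1; i <= n; i++)
      print $1, starts[i], ends[i], $4
  }' "$@"
}
```

It could be called as `split_regions input.bed > output.bed`. Rows without any `;` pass through unchanged, because `split` then returns a single element.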

ATpoint (Germany) wrote, 8 months ago:

Good luck. In the last line, test.pseudobed is the input file. I called it pseudobed because your input is not in BED format: column 4 in BED is a name, and the strand is in column 6. The code snippet takes care of that, producing a standard BED with 6 columns and filling the 4th and 5th with a "." as placeholder. If you do not want that, simply remove the ".", "." part from the awk commands.

# Read the four whitespace-separated columns of each row.
while read -r CHR STARTS ENDS STR; do
  if [[ "$STARTS" == *";"* ]]; then
    # Multi-region row: pair the n-th start with the n-th end, one interval per line.
    paste \
      <(echo "$STARTS" | tr ";" "\n") \
      <(echo "$ENDS" | tr ";" "\n") | \
      awk -v chr="$CHR" -v str="$STR" 'BEGIN {OFS="\t"} {print chr, $1, $2, ".", ".", str}'
  else
    # Single-region row: only expand it to six columns.
    echo "$CHR $STARTS $ENDS $STR" | awk 'BEGIN {OFS="\t"} {print $1, $2, $3, ".", ".", $4}'
  fi
done < test.pseudobed

1   12349   12398   .   .   +
1   23523   23578   .   .   -
1   23550   23640   .   .   +
1   23570   23689   .   .   +
1   23590   23652   .   .   +
1   43533   43569   .   .   +
1   56021   56099   .   .   +
1   56078   56155   .   .   +
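
The six-column layout described above can probably also be produced in a single awk pass, without the shell loop. This is only a sketch that combines the awk `split` approach from the other answer with the placeholder columns; the function name `to_bed6` is hypothetical, and it assumes every row has the same number of starts and ends.

```shell
# to_bed6 is a hypothetical helper name. It reads the 4-column pseudo-BED
# (chrom, starts, ends, strand) and writes chrom, start, end, ".", ".", strand.
to_bed6() {
  awk 'BEGIN { OFS = "\t" }
  {
    n = split($2, s, ";")   # start coordinates
    split($3, e, ";")       # end coordinates (assumed to match the starts in count)
    for (i = 1; i <= n; i++)
      print $1, s[i], e[i], ".", ".", $4   # columns 4 and 5 are placeholders
  }' "$@"
}
```

Usage would be `to_bed6 test.pseudobed > test.bed`.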