Question: How to remove lines with unmatched columns
BehMah wrote (20 months ago):

Hi all, I have a BED (annotation) file with an error throughout: in some rows the last two columns do not have the same number of blocks ($5 has 4 blocks but $6 has 3). How can I remove the lines where these columns don't match? Thanks, guys!

  input:

  chr2   1627   4677   +     1,4,92,30    0,19,11
  chr2   2643   6698   +     10,42,9      0,14
  chr3   1327   4377   +     12,32        0,11
  chr4   4143   6698   +     64,43,23     0,24,51

  desired:

  chr3   1327   4377   +     12,32        0,11
  chr4   4143   6698   +     64,43,23     0,24,51
Tags: rna-seq, sequence

Dear BehMah, could you please share a few more lines of the original file, plus another snippet showing the desired result after the data is fixed (made manually)? That way we can understand exactly what you are looking for. Thank you.

(comment by Petr Ponomarenko, 20 months ago)

More explanation:

I want to extract the sequences for these coordinates, but because the number of exon sizes ($5) differs from the number of exon offsets ($6), bedtools doesn't give me all the sequences.

(reply by BehMah, 20 months ago)
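Before filtering, it can help to see which rows are affected. A minimal sketch, assuming the file is tab-separated with comma-separated lists in columns 5 and 6 (the function name report_mismatched is hypothetical): awk's split() returns how many pieces a field splits into, so the two counts can be compared row by row.

```shell
# Hypothetical helper: list rows whose block counts in columns 5 and 6 differ.
# Assumes tab-separated input with comma-separated lists in $5 and $6.
# Reads the files given as arguments, or stdin if none are given.
report_mismatched() {
  awk -F'\t' '{
    n5 = split($5, sizes, ",")     # number of exon sizes
    n6 = split($6, offsets, ",")   # number of exon offsets
    if (n5 != n6) printf "line %d: %d sizes vs %d offsets\n", NR, n5, n6
  }' "$@"
}
```

Usage: report_mismatched annotation.bed (file name hypothetical). On the four example rows from the question, saved with tabs, it reports line 1 (4 sizes vs 3 offsets) and line 2 (3 sizes vs 2 offsets).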

Thank you all, 5heikki, Petr and jmzeng1314, for your awesome code!

(reply by BehMah, 20 months ago)
Answer by jmzeng1314, 20 months ago:
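perl -lane 'print if ($F[4] =~ tr/,//) == ($F[5] =~ tr/,//)' your.input > output

tr/,// counts the commas in a field without changing it; comparing the counts for columns 5 and 6 ($F[4] and $F[5]) keeps only rows with the same number of blocks.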
Answer by Petr Ponomarenko (United States / Los Angeles / ALAPY.com), 20 months ago:
awk '{if(gsub(",","",$5)==gsub(",","",$6)){print $0}}' input.txt

gsub returns the number of substitutions it made.

(edited by genomax)
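A variant of the gsub approach above that leaves the fields alone: split() also returns a count (the number of pieces), but it writes those pieces to a scratch array instead of editing $5 and $6, so each kept line is printed exactly as read and OFS does not matter. This is a sketch assuming tab-separated input; keep_matched is a hypothetical name.

```shell
# Hypothetical helper: print only rows whose columns 5 and 6 hold the same
# number of comma-separated blocks. split() fills a scratch array and returns
# its length without modifying the field, so $0 is printed unchanged.
# A pattern with no action prints the whole line when it is true.
keep_matched() {
  awk -F'\t' 'split($5, sizes, ",") == split($6, offsets, ",")' "$@"
}
```

Usage: keep_matched input.bed > fixed.bed (file names hypothetical).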
Answer by 5heikki (Finland), 20 months ago:

gsub returns the number of substitutions, so:

awk 'BEGIN{OFS=FS="\t"}{if(gsub(",",",",$5) == gsub(",",",",$6)){print $0}}' inputFile

Edit: Petr Ponomarenko suggested the same approach; however, at least with my gawk, his version deletes the commas from $5 and $6 (it replaces each comma with the empty string).

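Why the BEGIN{OFS=FS="\t"} block in 5heikki's answer matters: when gsub changes a field (even when replacing "," with ","), awk rebuilds $0 by joining the fields with OFS, which defaults to a single space. A small demonstration on a made-up two-column line:

```shell
# Changing any field makes awk rebuild $0 with OFS between the fields.
# The default OFS is a single space, so the tab is lost:
printf 'a\tb,c\n' | awk -F'\t' '{ gsub(",", ",", $2); print }'
# prints: a b,c

# Setting OFS to a tab as well (as in the BEGIN block) keeps the layout:
printf 'a\tb,c\n' | awk 'BEGIN { OFS = FS = "\t" } { gsub(",", ",", $2); print }'
# prints: a<TAB>b,c
```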