Question: How to remove lines with unmatched columns
BehMah wrote (2.1 years ago):

Hi all, I have a BED file (annotation) with an error running through it: in some rows, the last two columns do NOT have the same number of blocks ($5 has 4 comma-separated values but $6 has only 3). How can I remove the lines where these two columns don't match? Thank you, guys.

  input:

  chr2   1627   4677   +     1,4,92,30    0,19,11
  chr2   2643   6698   +     10,42,9      0,14
  chr3   1327   4377   +     12,32        0,11
  chr4   4143   6698   +     64,43,23     0,24,51

  desired:

  chr3   1327   4377   +     12,32        0,11
  chr4   4143   6698   +     64,43,23     0,24,51
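A minimal sketch of the requested filter, assuming the columns are tab- or space-separated with the block sizes in $5 and the offsets in $6. It uses awk's split(), which returns the number of comma-separated pieces in a field and, unlike gsub, leaves the field untouched. The sample rows are built inline (tab-separated, which is an assumption) so the snippet runs on its own.

```shell
#!/bin/sh
# Sample rows from the question; tab separation is an assumption.
input=$(
  printf 'chr2\t1627\t4677\t+\t1,4,92,30\t0,19,11\n'
  printf 'chr2\t2643\t6698\t+\t10,42,9\t0,14\n'
  printf 'chr3\t1327\t4377\t+\t12,32\t0,11\n'
  printf 'chr4\t4143\t6698\t+\t64,43,23\t0,24,51\n'
)

# split() returns how many comma-separated pieces each field has and does not
# modify $5 or $6, so matching rows are printed exactly as they came in.
filtered=$(printf '%s\n' "$input" | awk 'split($5, sizes, ",") == split($6, offsets, ",")')

printf '%s\n' "$filtered"
```

On a real file the same pattern would be: awk 'split($5, a, ",") == split($6, b, ",")' input.bed > fixed.bed (both file names are placeholders).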

Dear BehMah, could you please share a few more lines of the original file, plus a snippet showing the desired result after the data is fixed (you can make it manually)? That way we can understand exactly what you are looking for. Thank you.

(Comment by Petr Ponomarenko, 2.1 years ago)

More explanation:

I want to extract the sequences for these coordinates, but because the number of exon sizes ($5) differs from the number of exon offsets ($6) in some rows, bedtools doesn't give me all the sequences.

(Reply by BehMah, 2.1 years ago)

Thank you all, 5heikki, Petr and jmzeng1314, for your awesome code.

(Reply by BehMah, 2.1 years ago)
jmzeng1314 wrote (2.1 years ago):
perl -lane 'print if ($F[4] =~ tr/,//) == ($F[5] =~ tr/,//)' your.input > output
Petr Ponomarenko wrote (2.1 years ago):
awk '{if(gsub(",","",$5)==gsub(",","",$6)){print $0}}' input.txt

gsub returns the number of substitutions it made.
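A small check of both effects of gsub on one of the sample rows (assumed tab-separated): the return value is the number of commas replaced, and $5 itself loses its commas, after which awk rebuilds $0 joined by the default single-space OFS.

```shell
#!/bin/sh
# One sample row from the question; tab separation is an assumption.
line=$(printf 'chr3\t1327\t4377\t+\t12,32\t0,11\n')

# gsub(regex, replacement, target) returns the replacement count and edits
# target in place; changing $5 makes awk rebuild $0 with OFS (a space here).
result=$(printf '%s\n' "$line" | awk '{ n = gsub(",", "", $5); print n; print $0 }')

printf '%s\n' "$result"
```

So the comparison in this answer picks the right rows, but the printed lines no longer carry the original commas or tabs.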

5heikki wrote (2.1 years ago):

gsub returns the number of substitutions, so:

awk 'BEGIN{OFS=FS="\t"}{if(gsub(",",",",$5) == gsub(",",",",$6)){print $0}}' inputFile

Edit: Petr Ponomarenko suggested the same approach; however, at least with my gawk, his version deletes the commas from $5 and $6, because gsub replaces each comma with an empty string in the field itself.
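The tab-preserving one-liner above can be run against the sample rows from the question (assumed tab-separated, as BED files normally are). Replacing "," with "," leaves the text unchanged but still returns the count; because gsub assigns to the field, awk rebuilds $0 joined by OFS, which is why OFS is set to a tab as well.

```shell
#!/bin/sh
# Sample rows from the question; tab separation is an assumption.
input=$(
  printf 'chr2\t1627\t4677\t+\t1,4,92,30\t0,19,11\n'
  printf 'chr2\t2643\t6698\t+\t10,42,9\t0,14\n'
  printf 'chr3\t1327\t4377\t+\t12,32\t0,11\n'
  printf 'chr4\t4143\t6698\t+\t64,43,23\t0,24,51\n'
)

# FS splits on tabs; OFS = tab so the rebuilt $0 keeps the original layout.
# gsub(",", ",", field) counts commas while writing back identical text.
filtered=$(printf '%s\n' "$input" | awk 'BEGIN { OFS = FS = "\t" } gsub(",", ",", $5) == gsub(",", ",", $6)')

printf '%s\n' "$filtered"
```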
