Question: How to grep words containing a hyphen/dash from file A in one column of file B
Ming Lu wrote (10 months ago):

Hi, I have a file A.bed that contains only gene names, like these:

  chr1-1
  chr1-10
  chr1-102
  chr1-106
  chr1-11
  chr1-2
  chr1-3

and I know they also appear in one column of B.bed:

chr1 startpos endpos chr1-1
chr1 startpos endpos chr1-10
chr1 startpos endpos chr1-102
chr1 startpos endpos chr1-106
chr1 startpos endpos chr1-11
chr1 startpos endpos chr1-2
chr1 startpos endpos chr1-3
chr2 startpos endpos chr2-234
chr12 startpos endpos chr12-23546

However, why does

  cut -f4 B.bed > C.bed # only use the gene name column
  comm -1 -2 A.bed C.bed

find all of them, while

  grep -w -f A.bed B.bed

finds only

  chr1-1
  chr1-2
  chr1-3

I cannot simply use comm, because it cannot show the whole rows of B.bed.

How can I use grep to pull out all the matching rows in B.bed?

Or, more generally, how can I extract all rows of B.bed whose value in one column matches a word listed in another file?


Are the A and C files sorted?

comm -1 -2 <(sort A.bed) <(sort C.bed)
written 10 months ago by michael.ante

Yes, both are sorted. comm gives the right result; grep does not return the right number of lines.

written 10 months ago by Ming Lu

Sorry, I should have read it more carefully. Are there special characters in one of the files?

head A.bed | sed -n 'l'
head B.bed | sed -n 'l'

Have you tried the join command?

join -1 4 -2 1 B.bed A.bed
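
One thing the special-character check above often reveals is an invisible carriage return at the end of each line (sed -n 'l' prints it as \r before the $), for example when a file was saved on Windows. A minimal sketch of how that breaks grep -f and how to strip it; the scratch directory and the file names ids_crlf.txt and ids_clean.txt are made up for illustration and are not the poster's data:

```shell
# Scratch directory with hypothetical example files.
tmp=$(mktemp -d)
printf 'chr1-1\r\nchr1-10\r\n' > "$tmp/ids_crlf.txt"   # Windows-style line endings
printf 'chr1\t100\t200\tchr1-1\nchr1\t300\t400\tchr1-10\n' > "$tmp/B.bed"

# With the carriage returns in place, no pattern matches any row.
before=$(grep -F -w -f "$tmp/ids_crlf.txt" "$tmp/B.bed" | wc -l | tr -d ' ')

# Strip the carriage returns, then match again.
tr -d '\r' < "$tmp/ids_crlf.txt" > "$tmp/ids_clean.txt"
after=$(grep -F -w -f "$tmp/ids_clean.txt" "$tmp/B.bed" | wc -l | tr -d ' ')

echo "matches before: $before, after: $after"
```

Whether this applies to the files in the question is not known; it is only one of the usual suspects when grep -f finds fewer lines than expected.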
written 10 months ago by michael.ante

Ming Lu wrote (9 months ago):

I found code that gets all 654 matched lines, without needing to sort first.

 awk -F '\t' 'NR==FNR{a[$1]=$1;next}; ($1 in a){print $0}' a.bed b.bed > new.bed

prints the matching rows in b.bed's order, with b.bed's columns.

 awk -F '\t' 'NR==FNR{a[$1]=$0;next}; ($1 in a){print a[$1]}' a.bed b.bed > new.bed

prints the matching rows in b.bed's order, with a.bed's columns.
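
The NR==FNR idiom above can be adapted to the layout in the original question, where the names sit in column 1 of the list and in column 4 of the BED file. A minimal sketch, assuming tab-separated input; the file names ids.txt, regions.bed, and matched.bed and the awk variable col are hypothetical:

```shell
tmp=$(mktemp -d)
printf 'chr1-1\nchr1-10\n' > "$tmp/ids.txt"
printf 'chr1\t1\t5\tchr1-1\nchr1\t6\t9\tchr1-10\nchr1\t10\t20\tchr1-102\n' > "$tmp/regions.bed"

# First file: remember every name. Second file: print the whole row
# when its name column (col) is one of the remembered names.
awk -F '\t' -v col=4 '
    NR == FNR { seen[$1]; next }   # still reading ids.txt
    ($col) in seen                 # reading regions.bed: exact field match
' "$tmp/ids.txt" "$tmp/regions.bed" > "$tmp/matched.bed"

cat "$tmp/matched.bed"
```

Because the comparison is an exact match on one field, a name such as chr1-1 cannot accidentally match chr1-10 or chr1-102, and neither file needs to be sorted.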

mittu1602 wrote (10 months ago):

If it's OK for you to use awk, use the following command:

awk 'FNR==NR{a[$1]=$4;next}{if(a[$1]==""){a[$1]=0};printf "%s%s%s%s%s%s%s%s%s\n",$1,FS,$2,FS,$3,FS,$4,FS,a[$1]}' B.bed A.bed  > result1
cpad0112 wrote (10 months ago):

output:

$ grep -w -f ids.txt test.txt 
chr1    startpos    endpos  chr1-1
chr1    startpos    endpos  chr1-10
chr1    startpos    endpos  chr1-102
chr1    startpos    endpos  chr1-106
chr1    startpos    endpos  chr1-11
chr1    startpos    endpos  chr1-2
chr1    startpos    endpos  chr1-3

$ join  -1 1 -2 4 ids.txt test.txt 
chr1-1 chr1 startpos endpos
chr1-10 chr1 startpos endpos
chr1-102 chr1 startpos endpos
chr1-106 chr1 startpos endpos
chr1-11 chr1 startpos endpos
chr1-2 chr1 startpos endpos
chr1-3 chr1 startpos endpos

input:

$ cat ids.txt 
chr1-1
chr1-10
chr1-102
chr1-106
chr1-11
chr1-2
chr1-3

$ cat test.txt 
chr1    startpos    endpos  chr1-1
chr1    startpos    endpos  chr1-10
chr1    startpos    endpos  chr1-102
chr1    startpos    endpos  chr1-106
chr1    startpos    endpos  chr1-11
chr1    startpos    endpos  chr1-2
chr1    startpos    endpos  chr1-3
chr2    startpos    endpos  chr2-234
chr12   startpos    endpos  chr12-23546

You can modify the join output with

join -1 1 -2 4 -o 2.1,2.2,2.3,0 ids.txt test.txt | tr ' ' '\t'

The tr command replaces each space in join's default output with a tab.

written 10 months ago by michael.ante

join supports tab-separated output natively. The output of

join -t $'\t' -1 1 -2 4 -o 2.1,2.2,2.3,0 ids.txt test.txt

is the same as that of

join -1 1 -2 4 -o 2.1,2.2,2.3,0 ids.txt test.txt | tr ' ' '\t'
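
One caveat on the join commands in this thread: join expects both inputs to be sorted on the join field, in the same collation order that join itself uses. A minimal sketch that sorts first, assuming tab-separated input; the example data and the names ids.sorted and test.sorted are made up for illustration:

```shell
tmp=$(mktemp -d)
TAB=$(printf '\t')
# Deliberately unsorted example data (not the poster's files).
printf 'chr1-2\nchr1-10\nchr1-1\n' > "$tmp/ids.txt"
printf 'chr1\t1\t5\tchr1-10\nchr2\t1\t5\tchr2-234\nchr1\t6\t9\tchr1-1\nchr1\t7\t8\tchr1-2\n' > "$tmp/test.txt"

# Sort both files on the join field under one fixed locale.
LC_ALL=C sort "$tmp/ids.txt" > "$tmp/ids.sorted"
LC_ALL=C sort -t "$TAB" -k4,4 "$tmp/test.txt" > "$tmp/test.sorted"

# -o 2.1,2.2,2.3,0 keeps the BED column order; 0 stands for the join field.
joined=$(LC_ALL=C join -t "$TAB" -1 1 -2 4 -o 2.1,2.2,2.3,0 "$tmp/ids.sorted" "$tmp/test.sorted")
printf '%s\n' "$joined"
```

Running sort and join under the same LC_ALL setting avoids cases where the two tools disagree about the ordering of names containing "-" or "_".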

written 10 months ago by cpad0112

Inquisitive8995 wrote (10 months ago):

Hi, is the number of rows equal in both files? Try grep -Fwf A.bed B.bed > Output.txt


Not equal: A.bed has 645 rows and B.bed has 33024 rows. But all entries in A.bed come from one column of B.bed.

I think maybe the "-" dash breaks the whole-word matching done by -w?

I tried your code, but grep -Fwf still does not find the rest of the genes.
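
On the question of whether the dash breaks -w: grep -w only requires that a match is not directly next to a letter, digit, or underscore, so "-" counts as a word boundary. That should not by itself lose matches, but it can add unwanted ones. A small sketch with made-up names, assuming GNU grep behaviour:

```shell
tmp=$(mktemp -d)
printf 'chr1-1\n' > "$tmp/ids.txt"
printf 'chr1-1\nchr1-10\nchr1-1-2\n' > "$tmp/names.txt"

# -w: chr1-10 is rejected (the match is followed by a digit), but
# chr1-1-2 is accepted because "-" is not a word character.
word_hits=$(grep -F -w -f "$tmp/ids.txt" "$tmp/names.txt")

# -x: the pattern must match the whole line, so only chr1-1 is kept.
line_hits=$(grep -F -x -f "$tmp/ids.txt" "$tmp/names.txt")

printf 'with -w:\n%s\nwith -x:\n%s\n' "$word_hits" "$line_hits"
```

For a single column of names, -x is therefore the stricter choice; for whole BED rows, an exact comparison on the name column does the same job.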

written 10 months ago by Ming Lu

In your command "comm -1 -3 A.bed C.bed":

-1 suppresses column 1 (lines unique to FILE 1)
-3 suppresses column 3 (lines that appear in both files)

With -3 you are actually suppressing the lines that are common to both files.

Please try "comm -1 -2 A.bed C.bed" instead.

written 10 months ago by Inquisitive8995

That was just a typo in my post, not the actual problem.

written 10 months ago by Ming Lu

EagleEye wrote (10 months ago):

grep -w -Ff File2.txt File1.txt > commonFile1File2.txt

Ming Lu wrote (10 months ago):

First, I changed all "-" to "_" and kept only the column I use for grep, but it made no difference.

All 654 rows of moVDR1220.txt should be among the 36551 rows of trytry.txt,

as moVDR1220.txt is the result of

#first transform enhancer.txt to enhancer.bed (move the name column, e.g. chr1-10, from column 1 to column 4)
#then
bedtools intersect -a enhancer.bed -b BBB.bed -wa | cut -f4 > moVDR1220.txt

and trytry.txt is the result of the following (wc -l gives 36551 for each of enhancer.txt, enhancer.bed, trytry.txt, and trytry.cdt):

annotatePeaks.pl enhancer.txt hg19 -size 2000 -hist 10 -ghist -d 24hvitd/ 24heth/ > trytry.txt.
more trytry.txt|cut -f1> trytry.txt

So the grep, join, and comm results should all be 654.

My data:

homer $ more moVDR1220.txt|head
chr1_1
chr1_10
chr1_102
chr1_106
chr1_11
chr1_1140
chr1_115
chr1_12
chr1_123
chr1_14
homer$ more trytry.txt|head
Gene
chr1_1
chr1_10
chr1_100
chr1_1000
chr1_10000
chr1_10025
chr1_10028
chr1_10031
chr1_10037
homer$ grep -w -f moVDR1220.txt trytry.txt | wc -l
 180 
homer$ grep -w -f moVDR1220.txt trytry.txt | head
chr1_1
chr1_2
chr1_3
chr1_4 
chr1_5
chr1_6
chr1_75
chr1_76
chr1_8
chr1_9
homer$ join -1 1 -2 1 moVDR1220.txt trytry.txt | wc -l
 389
homer$ join -1 1 -2 1 moVDR1220.txt trytry.txt | head
chr1_1
chr1_10
chr1_102
chr1_106
chr1_11
chr1_1140
chr1_115
chr1_12
chr1_123
chr1_14
homer$ comm -1 -2 moVDR1220.txt trytry.txt| wc -l
 389

I know the problem now: the "-" had no effect; the mistake was in the bedtools step.

But I still don't know why grep cannot do this kind of thing.
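
When grep, join, and comm report different counts, it can help to list which names are missing from the larger file and whether the name list contains duplicates. A sketch assuming one name per line; the file names wanted.txt and all.txt and their contents are made up for illustration:

```shell
tmp=$(mktemp -d)
printf 'chr1_1\nchr1_10\nchr1_999\n' > "$tmp/wanted.txt"
printf 'Gene\nchr1_1\nchr1_10\nchr1_100\n' > "$tmp/all.txt"

# Names requested but absent from the big list (-v inverts the match,
# -x compares whole lines, -F treats names as fixed strings).
missing=$(grep -F -x -v -f "$tmp/all.txt" "$tmp/wanted.txt")

# Duplicated names can make grep, join, and comm counts diverge,
# so check for them as well.
dups=$(sort "$tmp/wanted.txt" | uniq -d)

echo "missing: $missing"
echo "duplicates: ${dups:-none}"
```

This does not explain the counts shown above; it is only a way to narrow down where the two files actually differ.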
