Question: How to grep words containing a hyphen/dash from file A in one column of file B
Ming Lu0 wrote:

Hi, I have an A.bed file that contains only gene names, like these:

  chr1-1
  chr1-10
  chr1-102
  chr1-106
  chr1-11
  chr1-2
  chr1-3

and I know they also appear in one column of B.bed:

chr1 startpos endpos chr1-1
chr1 startpos endpos chr1-10
chr1 startpos endpos chr1-102
chr1 startpos endpos chr1-106
chr1 startpos endpos chr1-11
chr1 startpos endpos chr1-2
chr1 startpos endpos chr1-3
chr2 startpos endpos chr2-234
chr12 startpos endpos chr12-23546

However, why does

  cut -f4 B.bed > C.bed # only use the gene name column
  comm -1 -2 A.bed C.bed

find all of them, but

  grep -w -f A.bed B.bed

only find

  chr1-1
  chr1-2
  chr1-3

I can't just use comm, because it cannot show the whole rows of B.bed.

How can I use grep to extract all the matching rows of B.bed?

Or, more generally, how can I extract all rows of B.bed whose value in one column matches a word listed in another file?

chip-seq • 367 views
modified 15 days ago • written 4 weeks ago by Ming Lu0

Are the A and C files sorted?

comm -1 -2 <(sort A.bed) <(sort C.bed)
written 4 weeks ago by michael.ante2.0k

Yes, both are sorted. comm gives the right result, but grep does not return the right number of lines.

modified 4 weeks ago • written 4 weeks ago by Ming Lu0

Sorry, should have read it better. Are there special characters in one of the files?

head A.bed | sed -n 'l'
head B.bed | sed -n 'l'
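
One thing this check can reveal is a Windows-style carriage return at the end of each line, which `sed -n 'l'` prints as `\r$`. The sketch below uses made-up file names (`ids_crlf.txt`, `ids_clean.txt`) and toy data; whether carriage returns are actually present in your files is only an assumption.

```shell
# Hypothetical demo: an ID list saved with Windows (CRLF) line endings
tmp=$(mktemp -d)
printf 'chr1-1\r\nchr1-10\r\n' > "$tmp/ids_crlf.txt"

# sed -n 'l' makes the otherwise invisible carriage return visible
sed -n 'l' "$tmp/ids_crlf.txt"

# Strip the carriage returns so every line holds only the ID itself
tr -d '\r' < "$tmp/ids_crlf.txt" > "$tmp/ids_clean.txt"

# Count the lines that still contain a carriage return after cleaning
cr=$(printf '\r')
remaining=$(grep -c "$cr" "$tmp/ids_clean.txt" || true)
echo "lines with CR after cleaning: $remaining"
```

If the count is non-zero for one of the real files, cleaning it the same way before matching may be worth a try.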

Have you tried the join command?

join -1 4 -2 1 B.bed A.bed
written 4 weeks ago by michael.ante2.0k

Ming Lu0 wrote:

I found code that gets all 654 matched lines, without needing to sort first:

 awk -F '\t' 'NR==FNR{a[$1]=$1;next}; ($1 in a){print $0}' a.bed b.bed > new.bed

This prints the matches in b.bed's order, with b.bed's columns.

 awk -F '\t' 'NR==FNR{a[$1]=$0;next}; ($1 in a){print a[$1]}' a.bed b.bed > new.bed

This prints the matches in b.bed's order, with a.bed's columns.
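
For the layout in the original question, where the name sits in the fourth column of B.bed rather than the first, the same lookup idea might look like the sketch below. The file contents are toy data, `matched.bed` is a hypothetical output name, and the columns are assumed to be tab-separated.

```shell
tmp=$(mktemp -d)
# Toy inputs (assumed): A.bed holds one name per line, B.bed is tab-separated
printf 'chr1-1\nchr1-10\n' > "$tmp/A.bed"
printf 'chr1\t100\t200\tchr1-1\nchr1\t300\t400\tchr1-10\nchr1\t500\t600\tchr1-102\n' > "$tmp/B.bed"

# First file: remember every name. Second file: print rows whose 4th column
# is exactly one of those names (string equality, no regex or word boundaries).
awk -F '\t' 'NR==FNR { wanted[$1]; next } ($4 in wanted)' \
    "$tmp/A.bed" "$tmp/B.bed" > "$tmp/matched.bed"

cat "$tmp/matched.bed"
```

Because the comparison is an exact string match on one column, a name such as chr1-1 cannot accidentally select the chr1-10 or chr1-102 rows.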

modified 14 days ago • written 15 days ago by Ming Lu0

mittu1602130 wrote:

If it's OK for you to use awk, use the following command:

awk 'FNR==NR{a[$1]=$4;next}{if(a[$1]==""){a[$1]=0};printf "%s%s%s%s%s%s%s%s%s\n",$1,FS,$2,FS,$3,FS,$4,FS,a[$1]}' B.bed A.bed  > result1
written 4 weeks ago by mittu1602130

cpad01124.1k wrote:

output:

$ grep -w -f ids.txt test.txt 
chr1    startpos    endpos  chr1-1
chr1    startpos    endpos  chr1-10
chr1    startpos    endpos  chr1-102
chr1    startpos    endpos  chr1-106
chr1    startpos    endpos  chr1-11
chr1    startpos    endpos  chr1-2
chr1    startpos    endpos  chr1-3

$ join  -1 1 -2 4 ids.txt test.txt 
chr1-1 chr1 startpos endpos
chr1-10 chr1 startpos endpos
chr1-102 chr1 startpos endpos
chr1-106 chr1 startpos endpos
chr1-11 chr1 startpos endpos
chr1-2 chr1 startpos endpos
chr1-3 chr1 startpos endpos

input:

$ cat ids.txt 
chr1-1
chr1-10
chr1-102
chr1-106
chr1-11
chr1-2
chr1-3

$ cat test.txt 
chr1    startpos    endpos  chr1-1
chr1    startpos    endpos  chr1-10
chr1    startpos    endpos  chr1-102
chr1    startpos    endpos  chr1-106
chr1    startpos    endpos  chr1-11
chr1    startpos    endpos  chr1-2
chr1    startpos    endpos  chr1-3
chr2    startpos    endpos  chr2-234
chr12   startpos    endpos  chr12-23546
modified 4 weeks ago • written 4 weeks ago by cpad01124.1k

You can modify the join output with

join -1 1 -2 4 -o 2.1,2.2,2.3,0 ids.txt test.txt | tr ' ' '\t'

The tr command replaces each space that join writes as its default output separator with a tab.

written 4 weeks ago by michael.ante2.0k

join supports TSV output natively. For tab-delimited input, the output of

join -t $'\t' -1 1 -2 4 -o 2.1,2.2,2.3,0 ids.txt test.txt

is the same as the output of

join -1 1 -2 4 -o 2.1,2.2,2.3,0 ids.txt test.txt | tr ' ' '\t'
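
A small sketch comparing the two variants on toy data. It assumes tab-separated input, and it sorts both files on the join field first because join expects sorted input; the `*.sorted` and `*.tsv` names are placeholders.

```shell
tmp=$(mktemp -d)
TAB=$(printf '\t')   # a literal tab, a portable alternative to $'\t'
printf 'chr1-10\nchr1-1\n' > "$tmp/ids.txt"
printf 'chr1\t300\t400\tchr1-10\nchr2\t1\t2\tchr2-234\nchr1\t100\t200\tchr1-1\n' > "$tmp/test.txt"

# join expects both inputs sorted on the join field, using the same collation
LC_ALL=C sort "$tmp/ids.txt" > "$tmp/ids.sorted"
LC_ALL=C sort -t "$TAB" -k4,4 "$tmp/test.txt" > "$tmp/test.sorted"

# Variant 1: let join read and write tabs directly
LC_ALL=C join -t "$TAB" -1 1 -2 4 -o 2.1,2.2,2.3,0 \
    "$tmp/ids.sorted" "$tmp/test.sorted" > "$tmp/native.tsv"

# Variant 2: default space-separated output, converted to tabs afterwards
LC_ALL=C join -1 1 -2 4 -o 2.1,2.2,2.3,0 \
    "$tmp/ids.sorted" "$tmp/test.sorted" | tr ' ' '\t' > "$tmp/converted.tsv"

cmp "$tmp/native.tsv" "$tmp/converted.tsv" && echo "outputs are identical"
```

The two variants can differ if a field itself contains spaces, since tr would turn those into tabs as well.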

modified 4 weeks ago • written 4 weeks ago by cpad01124.1k

Inquisitive899510 wrote:

Hi, is the number of rows equal in both files? Try:

grep -Fwf A.bed B.bed > Output.txt

written 4 weeks ago by Inquisitive899510

No, they are not equal: A.bed has 645 rows and B.bed has 33024 rows. But all entries in A.bed come from one column of B.bed.

I think maybe the "-" dash breaks the string that -w treats as a word?

I tried your code, but grep -Fwf still cannot find the rest of the matching genes.

modified 4 weeks ago • written 4 weeks ago by Ming Lu0

In your command "comm -1 -3 A.bed C.bed":

-1 suppresses column 1 (lines unique to file 1)
-3 suppresses column 3 (lines that appear in both files)

So with -3 you are actually suppressing the lines that A.bed and C.bed have in common.

Please try "comm -1 -2 A.bed C.bed" instead.
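
To make the column logic concrete, here is a sketch on two tiny, already sorted files. The names `A.sorted` and `C.sorted` and their contents are made up for illustration.

```shell
tmp=$(mktemp -d)
printf 'chr1-1\nchr1-10\nchr9-9\n' > "$tmp/A.sorted"     # already sorted
printf 'chr1-1\nchr1-10\nchr2-234\n' > "$tmp/C.sorted"   # already sorted

# No flags: column 1 = only in A, column 2 = only in C, column 3 = in both
LC_ALL=C comm "$tmp/A.sorted" "$tmp/C.sorted"

# -1 -2 hides the two "unique" columns, leaving the shared lines
common=$(LC_ALL=C comm -1 -2 "$tmp/A.sorted" "$tmp/C.sorted")

# -1 -3 hides the shared lines instead, leaving lines found only in C
only_c=$(LC_ALL=C comm -1 -3 "$tmp/A.sorted" "$tmp/C.sorted")

echo "common:"; echo "$common"
echo "only in C:"; echo "$only_c"
```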

written 4 weeks ago by Inquisitive899510

That was just a typo in my post, not the actual problem.

written 4 weeks ago by Ming Lu0

EagleEye4.9k wrote:
grep -w -Ff File2.txt File1.txt > commonFile1File2.txt
written 4 weeks ago by EagleEye4.9k

Ming Lu0 wrote:

First, I changed all "-" to "_" and kept only the column I use for grep, but it made no difference.

All 654 rows of moVDR1220.txt should be among the 36551 rows of trytry.txt,

since moVDR1220.txt is the result of

#first transform enhancer.txt to enhancer.bed (move name column such as chr1-10 from 1 to 4 )
#then
bedtools intersect -a enhancer.bed -b BBB.bed -wa | cut -f4 > moVDR1220.txt

and trytry.txt is the result of the following (wc -l gives 36551 for each of enhancer.txt, enhancer.bed, trytry.txt, and trytry.cdt):

annotatePeaks.pl enhancer.txt hg19 -size 2000 -hist 10 -ghist -d 24hvitd/ 24heth/ > trytry.cdt
cut -f1 trytry.cdt > trytry.txt

So the grep, join, and comm results should all be 654.

my data is:

homer $ more moVDR1220.txt|head
chr1_1
chr1_10
chr1_102
chr1_106
chr1_11
chr1_1140
chr1_115
chr1_12
chr1_123
chr1_14
homer$ more trytry.txt|head
Gene
chr1_1
chr1_10
chr1_100
chr1_1000
chr1_10000
chr1_10025
chr1_10028
chr1_10031
chr1_10037
homer$ grep -w -f moVDR1220.txt trytry.txt | wc -l
 180 
homer$ grep -w -f moVDR1220.txt trytry.txt | head
chr1_1
chr1_2
chr1_3
chr1_4 
chr1_5
chr1_6
chr1_75
chr1_76
chr1_8
chr1_9
homer$ join -1 1 -2 1 moVDR1220.txt trytry.txt | wc -l
 389
homer$ join -1 1 -2 1 moVDR1220.txt trytry.txt | head
chr1_1
chr1_10
chr1_102
chr1_106
chr1_11
chr1_1140
chr1_115
chr1_12
chr1_123
chr1_14
homer$ comm -1 -2 moVDR1220.txt trytry.txt| wc -l
 389

I know the problem now: the "-" had no effect; there was a mistake in the bedtools step.

But I still don't know why grep cannot do this kind of thing.
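
For single-column files like the two shown above, one option that does not depend on how a particular grep handles `-w` is whole-line, fixed-string matching with `-F -x`. This is a sketch on toy data; `ids.txt`, `names.txt`, and `hits.txt` are placeholder names, and it assumes the lines carry no trailing whitespace.

```shell
tmp=$(mktemp -d)
printf 'chr1_1\nchr1_10\n' > "$tmp/ids.txt"
printf 'Gene\nchr1_1\nchr1_10\nchr1_100\nchr1_1000\n' > "$tmp/names.txt"

# -F: treat patterns as fixed strings, -x: the whole line must equal a pattern
grep -F -x -f "$tmp/ids.txt" "$tmp/names.txt" > "$tmp/hits.txt"

# If every ID occurs exactly once in names.txt, the two counts should agree
wc -l < "$tmp/ids.txt"
wc -l < "$tmp/hits.txt"
```

If the counts disagree on the real data, comparing them against the join and comm results may help narrow down which IDs are missing.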

modified 4 weeks ago • written 4 weeks ago by Ming Lu0