Question

Extract rows from a BED file based on the text content of one column

10.0 years ago

mujupas ▴ 70

Hi,

I am new to scripting, so I can't find an easy solution to this on my own and would like to ask for some help.

I have a long list of BED files. For each file, I want to scan it row by row, and if a given column contains the text I am looking for (for example, the gene name "A"), I want that full row copied into a new, separate BED file.

I'm looking forward to hearing your suggestions.

Thanks in advance!

Tags: extract, bedtools, bed, unix

Written 10.0 years ago by mujupas ▴ 70 (last updated 2.6 years ago by Ram 44k)

Thank you so much guys!

g.

Written 10.0 years ago by mujupas ▴ 70 (last updated 2.6 years ago by Ram 44k)

Great, thanks again to you all!

g.

Written 10.0 years ago by mujupas ▴ 70 (last updated 2.6 years ago by Ram 44k)

10.0 years ago

Cytosine ▴ 460

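I think grep is the simplest answer here. Note that it matches the text anywhere in the line, in any column:

grep "$geneName" "$inputFile" >> "$outputFile"

Or, to select rows containing either of two genes (use double quotes here; inside single quotes the shell would not expand the variables):

grep -E "$geneName1|$geneName2" "$inputFile" >> "$outputFile"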
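A small, self-contained sketch of the grep approach, run on a hypothetical three-row BED file (the names `input.bed`, `single.bed`, `several.bed` and `genes.txt` are made up for illustration). It shows two standard grep options that may help here: `-w` matches the name only as a whole word, so searching for `A` does not also pick up `AB`, and `-F -f genes.txt` reads many fixed-string patterns from a file, one per line. grep still searches every column, so a name that happens to appear in another field would match too.

```shell
#!/bin/sh
# Sketch only: hypothetical sample data in a throwaway directory.
cd "$(mktemp -d)" || exit 1

# Tab-separated BED rows: chrom, start, end, name.
printf 'chr1\t100\t200\tA\nchr1\t300\t400\tAB\nchr2\t500\t600\tB\n' > input.bed

geneName=A
# -w: match "A" only as a whole word, so the "AB" row is not selected.
grep -w "$geneName" input.bed > single.bed

# Many genes: one name per line in a (hypothetical) genes.txt.
# -F treats each line as a fixed string, -f reads the patterns from the file.
printf 'A\nB\n' > genes.txt
grep -w -F -f genes.txt input.bed > several.bed

cat single.bed several.bed
```

Here `single.bed` keeps only the `A` row, and `several.bed` keeps the `A` and `B` rows but not `AB`.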

Written 10.0 years ago by Cytosine ▴ 460 (last updated 2.6 years ago by Ram 44k)

Ram · Accepted Answer · 2014-08-04

10.0 years ago

Chirag Nepal ★ 2.4k

If the input file has the gene name in the fourth column (the BED name field), this prints only the matching lines:

awk 'BEGIN {FS="\t"} { if ($4 == "geneName") { print $0 } }' input.bed > outputFile
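Since the question asks for rows whose column *contains* some text, it may help to compare an exact match with a substring match. A minimal sketch on a hypothetical input file (all file names are illustrative): `$4 == "A"` keeps only rows whose name field is exactly `A`, while awk's built-in `index($4, "A") > 0` keeps rows whose name field contains `A` anywhere.

```shell
#!/bin/sh
# Sketch only: hypothetical sample data in a throwaway directory.
cd "$(mktemp -d)" || exit 1
printf 'chr1\t100\t200\tA\nchr1\t300\t400\tAB\nchr2\t500\t600\tB\n' > input.bed

# Exact match: the name field (column 4) must equal "A".
awk -F'\t' '$4 == "A"' input.bed > exact.bed

# Substring match: index() returns the position of "A" in column 4, or 0 if absent.
awk -F'\t' 'index($4, "A") > 0' input.bed > contains.bed

cat exact.bed
cat contains.bed
```

`exact.bed` ends up with the `A` row only; `contains.bed` has both the `A` and `AB` rows.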

If you have many BED files and want to loop over them:

for name in *.bed
do
  awk 'BEGIN {FS="\t"} { if ($4 == "geneName") { print $0 } }' "$name" >> outputFile
done
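The loop above appends every match to a single `outputFile`. If you instead want one filtered file per input, as the question describes, a sketch along these lines should work. The `filtered/` directory and the `.geneName.bed` suffix are arbitrary choices for illustration; writing into a subdirectory keeps the outputs out of the `*.bed` glob if the loop is run again.

```shell
#!/bin/sh
# Sketch only: two hypothetical BED files in a throwaway directory.
cd "$(mktemp -d)" || exit 1
printf 'chr1\t100\t200\tgeneName\nchr1\t300\t400\tother\n' > sample1.bed
printf 'chr2\t500\t600\tother\n' > sample2.bed

mkdir -p filtered
for name in *.bed
do
  # ${name%.bed} strips the extension: sample1.bed -> filtered/sample1.geneName.bed
  awk -F'\t' '$4 == "geneName"' "$name" > "filtered/${name%.bed}.geneName.bed"
done

ls filtered
```

Inputs with no matching rows (here `sample2.bed`) produce an empty output file.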

Written 10.0 years ago by Chirag Nepal ★ 2.4k (last updated 2.6 years ago by Ram 44k)

or just

awk '($4=="geneName")' *.bed > output
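If the gene name lives in a shell variable, awk's standard `-v` option passes it into the program, which avoids splicing quotes into the awk code. A sketch with hypothetical files `a.bed` and `b.bed`:

```shell
#!/bin/sh
# Sketch only: hypothetical sample data in a throwaway directory.
cd "$(mktemp -d)" || exit 1
printf 'chr1\t100\t200\tA\nchr1\t300\t400\tB\n' > a.bed
printf 'chr2\t500\t600\tA\n' > b.bed

geneName=A
# -v copies the shell variable into the awk variable "gene".
awk -F'\t' -v gene="$geneName" '$4 == gene' *.bed > output

cat output
```

`-v` also interprets backslash escapes in the value, which should not matter for ordinary gene names.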

Written 10.0 years ago by Pierre Lindenbaum 163k (last updated 2.6 years ago by Ram 44k)

Thanks a lot! What if I want to match either of two gene names instead of the single "geneName"?

g.

Written 10.0 years ago by mujupas ▴ 70

awk 'BEGIN {FS="\t"} { if ($4 == "geneName1" || $4 == "geneName2") { print $0 } }' "$name" >> outputFile

If you have hundreds of genes, you might want to loop over them instead.
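Rather than looping, a common awk idiom reads the gene names from a file into an array and then keeps BED rows whose fourth column is one of them. A sketch, assuming a hypothetical `genes.txt` with one name per line: while awk reads the first file, `NR == FNR` is true, so each name is stored as an array key; for the BED file(s) that follow, `($4 in genes)` selects the matching rows.

```shell
#!/bin/sh
# Sketch only: hypothetical sample data in a throwaway directory.
cd "$(mktemp -d)" || exit 1
printf 'chr1\t100\t200\tgeneName1\nchr1\t300\t400\tgeneName2\nchr2\t500\t600\tgeneName3\n' > input.bed
# Hypothetical gene list, one name per line.
printf 'geneName1\ngeneName2\n' > genes.txt

# First file (NR == FNR): remember each name as an array key, then skip to the next line.
# Later files: print rows whose column 4 is one of the stored names.
awk -F'\t' 'NR == FNR { genes[$1]; next } ($4 in genes)' genes.txt input.bed > outputFile

cat outputFile
```

More BED files can follow the gene list on the command line (e.g. `genes.txt *.bed`). One caveat: if `genes.txt` is empty, `NR == FNR` also holds for the first BED file, so its rows would be read as gene names instead of being filtered.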

Written 10.0 years ago by Chirag Nepal ★ 2.4k (last updated 2.6 years ago by Ram 44k)