Extract rows from a BED file based on the text content of one column
10.1 years ago
mujupas ▴ 80

Hi,

I am new to scripting, so I can't find an easy solution to this by myself and would like to ask for some help.

I have a long list of BED files. I want to scan each file row by row, and if a given column contains the text I am looking for (for example, gene name "A"), I want the full row copied into a new, separate BED file.

I'm looking forward to hearing your suggestions,

thanks in advance!

Thank you so much guys!

g.


Great, thanks again to all of you!

g.

10.1 years ago
Chirag Nepal ★ 2.4k

If the gene name is in the fourth column of the input file, this prints only the rows where that column is exactly geneName:

awk '$4 == "geneName"' input.bed > outputFile

If you have many BED files and want to loop over them, appending all matches to one output file:

for name in *.bed
do
  awk '$4 == "geneName"' "$name" >> outputFile
done
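
The question mentions a new, separate BED file, so in case one output per input file is wanted rather than a single combined file, the loop above can be adapted. This is only a minimal sketch: it assumes the gene name sits in column 4, and the `filtered/` directory, the sample file names, and the demo rows are made up for illustration.

```shell
# Demo data: two small BED files in a scratch directory (hypothetical content).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr1\t100\t200\tgeneA\nchr1\t300\t400\tgeneB\n' > sample1.bed
printf 'chr2\t500\t600\tgeneA\nchr2\t700\t800\tgeneC\n' > sample2.bed

gene="geneA"
# Outputs go into a separate directory so a later *.bed glob never picks them up.
mkdir -p filtered

for name in *.bed
do
  # -F'\t' splits on tabs only; -v passes the shell variable into awk as g.
  awk -F'\t' -v g="$gene" '$4 == g' "$name" > "filtered/${name%.bed}.$gene.bed"
done
```

Writing with `>` instead of `>>` means rerunning the loop overwrites each output instead of appending duplicate rows.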

or just

awk '($4=="geneName")' *.bed > output

Thanks a lot. What if I want to specify two gene names in the "geneName" field?

g.

awk '$4 == "geneName1" || $4 == "geneName2"' "$name" >> outputFile

If you have hundreds of genes, you might want to loop over them.
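
For a long gene list, one alternative to looping is to let awk read the names from a file first and then filter the BED rows in a single pass. This is a sketch under assumptions: `genes.txt` (one gene name per line) is a hypothetical file, the names are assumed to be in column 4, and the demo rows are invented.

```shell
# Demo data in a scratch directory (hypothetical content).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'geneA\ngeneC\n' > genes.txt
printf 'chr1\t100\t200\tgeneA\nchr1\t300\t400\tgeneB\nchr2\t700\t800\tgeneC\n' > input.bed

# While reading the first file (NR == FNR), store each gene name as an array key.
# For the second file, print rows whose 4th column is one of the stored keys.
awk -F'\t' 'NR == FNR { genes[$1]; next } $4 in genes' genes.txt input.bed > outputFile
```

The `NR == FNR` test assumes `genes.txt` is not empty; with an empty list, the BED file would be treated as the gene list and nothing would be printed.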

10.1 years ago
Cytosine ▴ 460

I think grep is the simplest answer here:

grep "$geneName" "$inputFile" >> "$outputFile"

Or to match either of two gene names:

grep -E "$geneName1|$geneName2" "$inputFile" >> "$outputFile"
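
One thing to keep in mind is that grep matches the text anywhere on the line, so a hit in another column (a chromosome name, for instance) also counts. If the match should be "contains" but limited to one column, as the question describes, something along these lines might work. It is a sketch only: column 4, the search text `BRCA`, and the demo rows are assumptions for illustration.

```shell
# Demo data in a scratch directory (hypothetical content).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr1\t100\t200\tBRCA1\nchr1\t300\t400\tBRCA1-AS\nchrBRCA\t1\t2\tTP53\n' > input.bed

# index(s, t) returns the position of t in s, or 0 if t is absent, so this keeps
# rows whose 4th column contains the text. The text is matched literally, not as a regex.
awk -F'\t' -v pat="BRCA" 'index($4, pat) > 0' input.bed > outputFile
```

Here the third demo row is skipped even though `BRCA` appears in its first column, whereas a plain `grep BRCA` would report it.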