Question: Extract rows from a BED file based on the text content of one column
Asked by mujupas (Japan), 2.6 years ago:

Hi,

I am new to scripting, so I can't find an easy solution to this by myself and would like to ask for some help.

I have a long list of BED files. For each file, I want to scan it row by row and, if a given column contains the text I am looking for (for example, the gene name "A"), copy that full row into a new, separate BED file.

I'm looking forward to hearing your suggestions,

thanks in advance!


Tags: unix, extract, bed, bedtools

Thank you so much guys!

g.

(reply by mujupas, 2.6 years ago)

Great, thanks again to you all!

g.

(reply by mujupas, 2.6 years ago)
Answer by Chirag Nepal (Copenhagen), 2.6 years ago:

If the input file has the gene name in the fourth column, this prints only the matching lines:

awk 'BEGIN {FS = OFS = "\t"} $4 == "geneName"' input.bed > outputFile
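A self-contained way to try this, assuming a made-up three-row BED file with gene names in column 4 (demo.bed and the gene names are hypothetical). Since the question asks for rows whose column *contains* the text, it also shows a "contains" test with awk's index() next to the exact == match:

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical tab-separated BED rows: chrom, start, end, name.
printf 'chr1\t100\t200\tA\nchr1\t300\t400\tAB\nchr2\t500\t600\tC\n' > demo.bed

# Exact match: column 4 must be exactly "A" (first row only).
awk 'BEGIN { FS = OFS = "\t" } $4 == "A"' demo.bed > exact.bed

# "Contains" match: column 4 only has to include "A" (first and second rows).
awk 'BEGIN { FS = OFS = "\t" } index($4, "A") > 0' demo.bed > contains.bed
```

For gene names the exact match is usually what you want, since "contains A" also catches AB, and so on.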

If you have many BED files and want to loop over them:

for name in *.bed
do
    awk 'BEGIN {FS = OFS = "\t"} $4 == "geneName"' "$name" >> outputFile
done
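The loop above collects all matches into one combined outputFile. If each BED file should instead get its own filtered copy, as the question describes, one possible variant writes each result into a subdirectory under the same name. The directory name filtered, the sample files, and "geneName" are placeholders:

```shell
# Scratch directory with two hypothetical BED files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr1\t10\t20\tgeneName\nchr1\t30\t40\tother\n' > a.bed
printf 'chr2\t50\t60\tother\n' > b.bed

# Outputs go into a subdirectory, so the *.bed glob never picks them up on a rerun.
mkdir -p filtered
for name in *.bed
do
    # Same filter as above: a.bed -> filtered/a.bed, b.bed -> filtered/b.bed
    awk 'BEGIN { FS = OFS = "\t" } $4 == "geneName"' "$name" > "filtered/$name"
done
```

Files without a match still get an (empty) output file, which makes it easy to see that they were processed.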


or just


awk -F '\t' '$4 == "geneName"' *.bed > output
(reply by Pierre Lindenbaum, 2.6 years ago)
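A small extension of the one-liner, assuming you want to pass the gene name from the shell instead of editing the awk program each time: awk's -v option sets an awk variable before the program runs. The variable name gene, the value TP53 and sample.bed are made up for illustration:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical input; every *.bed file in the directory is scanned.
printf 'chr1\t10\t20\tBRCA1\nchr1\t30\t40\tTP53\n' > sample.bed

gene="TP53"   # hypothetical gene of interest
# -F '\t' splits on tabs only; -v copies the shell variable into the awk variable "gene".
awk -F '\t' -v gene="$gene" '$4 == gene' *.bed > output
```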

Thanks a lot! What if I want to match two gene names instead of a single "geneName"?

g.

(reply by mujupas, 2.6 years ago)

Change the awk line inside the loop to:

awk 'BEGIN {FS = OFS = "\t"} $4 == "geneName1" || $4 == "geneName2"' "$name" >> outputFile

If you have hundreds of genes, you might want to loop over them.

(reply by Chirag Nepal, 2.6 years ago)
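Instead of looping once per gene (which rereads every BED file for each name), a common awk idiom is to load the names from a file into an array first and then check each row against it in a single pass. This is a sketch; genes.txt (one name per line) and the sample BED contents are hypothetical:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical gene list (one name per line) and BED file.
printf 'geneName1\ngeneName2\n' > genes.txt
printf 'chr1\t10\t20\tgeneName1\nchr1\t30\t40\tgeneName3\nchr2\t50\t60\tgeneName2\n' > input.bed

# NR == FNR is true only while reading the first file (genes.txt):
# store each name as a key of the array "want" and skip to the next line.
# For input.bed, print the row when column 4 is one of the stored keys.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { want[$1]; next }
     $4 in want' genes.txt input.bed > outputFile
```

Several BED files can follow genes.txt on the same command line (for example genes.txt *.bed). The trick assumes genes.txt is not empty; otherwise the first BED file would be mistaken for the gene list.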
Answer by Cytosine (Ljubljana, Slovenia), 2.6 years ago:

I think grep is the simplest answer here (-w matches whole words only, so "A" does not also match "AB"):

grep -w "$geneName" "$inputFile" >> "$outputFile"
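A quick check of the difference, using a made-up file: plain grep matches the text as a substring anywhere in the line, while -w requires a whole word. Neither restricts the search to column 4, which is what the awk answers do:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr1\t100\t200\tA\nchr1\t300\t400\tAB\nchr2\t500\t600\tC\n' > demo.bed

geneName="A"   # hypothetical gene name
grep "$geneName" demo.bed > plain.txt    # substring match: the A and AB rows
grep -w "$geneName" demo.bed > word.txt  # whole-word match: the A row only
```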


Or, to match either of two genes:

grep -w -e "$geneName1" -e "$geneName2" "$inputFile" >> "$outputFile"
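For a longer list, grep can read its patterns from a file with -f; adding -F treats each line as a fixed string rather than a regular expression, and -w keeps whole-word matching. genes.txt (one name per line) and input.bed are hypothetical. As with the single-gene version, this matches anywhere on the line, not only in the name column:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'geneName1\ngeneName2\n' > genes.txt
printf 'chr1\t10\t20\tgeneName1\nchr1\t30\t40\tgeneName3\nchr2\t50\t60\tgeneName2\n' > input.bed

# -f genes.txt: one pattern per line; -F: patterns are fixed strings;
# -w: whole words only, so geneName1 would not match geneName10.
grep -wFf genes.txt input.bed > output.bed
```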

Powered by Biostar version 2.3.0