Question

Is there a way to automate or pipline bedtofasta or similar script?

0

Entering edit mode

7.4 years ago

alexjfortuna • 0

Hi,

This is my first post so please bear with me. I have about 2000 bed files that I would like to extract FASTA sequences from. I have been able to extract a single set of bed sequences using the following command:

.narrowPeak is a BED file.

bedtools getfasta -fi TAIR10_Chr.fasta -bed /copy_peaks/ABI5_col_v3h.narrowPeak > ABI5_col_v3h.fasta

within my copy_peaks folder there are 2000 other files I would like to perform this operation on. Any ideas?

The results will then be used in fimo, where again I will need to perform this operation on the extracted fasta files.

Thanks,

genome sequencing Assembly • 2.6k views

ADD COMMENT • link updated 7.4 years ago by WouterDeCoster 47k • written 7.4 years ago by alexjfortuna • 0

score 3 · Accepted Answer · 2016-12-19

3

Entering edit mode

7.4 years ago

Alex Reynolds 35k

I wrote a Perl script a while back that uses samtools to automate this conversion, using indexed FASTA files:

If you use bash, you could write a for loop to automate conversion of a batch of files.

ADD COMMENT • link 7.4 years ago by Alex Reynolds 35k

WouterDeCoster · Accepted Answer · 2016-12-19

3

Entering edit mode

7.4 years ago

WouterDeCoster 47k

I think a bash loop could help you, or for more advanced options you could use gnu-parallel.

If all your files have the same .narrowPeak extension you could use the following example:

for f in /copy_peaks/*.narrowPeak
do
r=`echo $f | cut -f 3 -d '/' | sed s/narrowPeak/fasta/`
bedtools getfasta -fi TAIR10_Chr.fasta -bed $f > $r
done

ADD COMMENT • link 7.4 years ago by WouterDeCoster 47k

0

Entering edit mode

I'll give this a shot, all of files have the same extension. Does this also create a new file? and I was hoping for the new file to keep the name of the narrowpeak file but with .fasta as the suffix. Thanks!

ADD REPLY • link 7.4 years ago by alexjfortuna • 0

0

Entering edit mode

The first thing the loop does is create a $r variable, which is based on the file name of the input as you can see. That $r variable is then used as output name and indeed I switch the narrowPeak to fasta and write to the current directory. Changing this should be pretty straightforward, right?

You can test the output names first using the following loop:

for f in /copy_peaks/*.narrowPeak
do
r=`echo $f | cut -f 3 -d '/' | sed s/narrowPeak/fasta/`
echo $r
done

ADD REPLY • link 7.4 years ago by WouterDeCoster 47k

0

Entering edit mode

I ran this :

for f in /copy_peaks/*.narrowPeak
do
r=`echo $f | cut -f 3 -d '/' | sed s/narrowPeak/fasta/`
echo $r
done

The result was: *.fasta I then attempted to run the loop as you had written it, the results were:

for f in /copy_peaks/*.narrowPeak; do r=`echo $f | cut -f 3 -d '/' | sed s/narrowPeak/fasta/`; bedtools getfasta -fi /mnt/c/Users/Desktop/FIMO_Project/TAIR10_seq_chr/TAIR10_Chr.fasta -bed $f > $r; done

bash: *.fasta: No such file or directory

Please pardon all my questions, I've been stuck on this for a while now.

ADD REPLY • link updated 7.4 years ago by WouterDeCoster 47k • written 7.4 years ago by alexjfortuna • 0

0

Entering edit mode

I assume based on this that there is no path /copy_peaks/. Are you sure of that path? Is it perhaps just copy_peaks in the current directory, and not on the root directory?

ADD REPLY • link 7.4 years ago by WouterDeCoster 47k

0

Entering edit mode

Yes, copy_peaks is a folder in the current working directory, now I'm getting $r is an ambiguous redirect? Thanks - Alex

ADD REPLY • link 7.4 years ago by alexjfortuna • 0

0

Entering edit mode

Does this look okay? Note that I changed the first line to reflect the directory in which the bed files can be found.

for f in copy_peaks/*.narrowPeak
do
r=`echo $f | cut -f 2 -d '/' | sed s/narrowPeak/fasta/`
echo $r
done

Your example bedtools getfasta -fi TAIR10_Chr.fasta -bed /copy_peaks/ABI5_col_v3h.narrowPeak > ABI5_col_v3h.fasta was then incorrect because the path to the narrowPeak file doesn't exist as such. Should have been bedtools getfasta -fi TAIR10_Chr.fasta -bed copy_peaks/ABI5_col_v3h.narrowPeak > ABI5_col_v3h.fasta