Is there a way to automate or pipline bedtofasta or similar script?
2
0
Entering edit mode
7.4 years ago

Hi,

This is my first post so please bear with me. I have about 2000 bed files that I would like to extract FASTA sequences from. I have been able to extract a single set of bed sequences using the following command:

.narrowPeak is a BED file.

bedtools getfasta -fi TAIR10_Chr.fasta -bed /copy_peaks/ABI5_col_v3h.narrowPeak > ABI5_col_v3h.fasta

within my copy_peaks folder there are 2000 other files I would like to perform this operation on. Any ideas?

The results will then be used in fimo, where again I will need to perform this operation on the extracted fasta files.

Thanks,

genome sequencing Assembly • 2.6k views
ADD COMMENT
3
Entering edit mode
7.4 years ago

I wrote a Perl script a while back that uses samtools to automate this conversion, using indexed FASTA files:

If you use bash, you could write a for loop to automate conversion of a batch of files.

ADD COMMENT
3
Entering edit mode
7.4 years ago

I think a bash loop could help you, or for more advanced options you could use gnu-parallel.

If all your files have the same .narrowPeak extension you could use the following example:

for f in /copy_peaks/*.narrowPeak
do
r=`echo $f | cut -f 3 -d '/' | sed s/narrowPeak/fasta/`
bedtools getfasta -fi TAIR10_Chr.fasta -bed $f > $r
done
ADD COMMENT
0
Entering edit mode

I'll give this a shot, all of files have the same extension. Does this also create a new file? and I was hoping for the new file to keep the name of the narrowpeak file but with .fasta as the suffix. Thanks!

ADD REPLY
0
Entering edit mode

The first thing the loop does is create a $r variable, which is based on the file name of the input as you can see. That $r variable is then used as output name and indeed I switch the narrowPeak to fasta and write to the current directory. Changing this should be pretty straightforward, right?

You can test the output names first using the following loop:

for f in /copy_peaks/*.narrowPeak
do
r=`echo $f | cut -f 3 -d '/' | sed s/narrowPeak/fasta/`
echo $r
done
ADD REPLY
0
Entering edit mode

I ran this :

for f in /copy_peaks/*.narrowPeak
do
r=`echo $f | cut -f 3 -d '/' | sed s/narrowPeak/fasta/`
echo $r
done

The result was: *.fasta I then attempted to run the loop as you had written it, the results were:

for f in /copy_peaks/*.narrowPeak; do r=`echo $f | cut -f 3 -d '/' | sed s/narrowPeak/fasta/`; bedtools getfasta -fi /mnt/c/Users/Desktop/FIMO_Project/TAIR10_seq_chr/TAIR10_Chr.fasta -bed $f > $r; done

bash: *.fasta: No such file or directory

Please pardon all my questions, I've been stuck on this for a while now.

ADD REPLY
0
Entering edit mode

I assume based on this that there is no path /copy_peaks/. Are you sure of that path? Is it perhaps just copy_peaks in the current directory, and not on the root directory?

ADD REPLY
0
Entering edit mode

Yes, copy_peaks is a folder in the current working directory, now I'm getting $r is an ambiguous redirect? Thanks - Alex

ADD REPLY
0
Entering edit mode

Does this look okay? Note that I changed the first line to reflect the directory in which the bed files can be found.

for f in copy_peaks/*.narrowPeak
do
r=`echo $f | cut -f 2 -d '/' | sed s/narrowPeak/fasta/`
echo $r
done

Your example bedtools getfasta -fi TAIR10_Chr.fasta -bed /copy_peaks/ABI5_col_v3h.narrowPeak > ABI5_col_v3h.fasta was then incorrect because the path to the narrowPeak file doesn't exist as such. Should have been bedtools getfasta -fi TAIR10_Chr.fasta -bed copy_peaks/ABI5_col_v3h.narrowPeak > ABI5_col_v3h.fasta

ADD REPLY
0
Entering edit mode

It's worked! Thank you for your help

ADD REPLY

Login before adding your answer.

Traffic: 1779 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6