Question: Bedtools getfasta outputting a blank file
mshumph2 (United States) wrote 4.6 years ago:

Hi,

I'm using bedtools getfasta to extract a set of sequences from chromosome 1. My input FASTA file is "chr1.fa" (from the UCSC Genome Browser), and my BED file has chromosome, start, stop, and name columns. I pass -name because I'd like to organize the sequences by name. My command looks like this:

bedtools getfasta -fi chr1.fa -bed bedfile.bed -fo testing.fa.out -name

The problem is this: when I run the command I don't get any errors; it just writes an empty file with whatever output name I gave it (in this case testing.fa.out).

The cause may be how I made the BED file. I was given an Excel spreadsheet with coordinates on it. I copied out the three relevant columns (chrom, start, and stop), put them into a new spreadsheet, and saved that as a tab-delimited text file. I also gave each column a name. In the text file the "tabbing" looks different for the first 100 or so lines: the gap between columns is narrower there, and further down it becomes wider. If this is the problem, how can I fix it? I'm on a Mac, if that's relevant.

Thanks

modified 4.6 years ago by Matt Shirley • written 4.6 years ago by mshumph2

Can you run head on your BED file and show us what it looks like?

written 4.6 years ago by komal.rathi

I'm not sure what a head is, but this is the format it's in. As you can see the format changes a few coordinates down. Also, copying and pasting changes the spacing between the columns.

 

chr1    9885764    9885814    chr1:9885764-9885814
chr1    9903769    9903819    chr1:9903769-9903819
chr1    9903769    9903819    chr1:9903769-9903819
chr1    10040879    10040929    chr1:10040879-10040929
chr1    10040879    10040929    chr1:10040879-10040929
chr1    10105721    10105771    chr1:10105721-10105771
chr1    10105721    10105771    chr1:10105721-10105771
chr1    10105721    10105771    chr1:10105721-10105771
chr1    10105721    10105771    chr1:10105721-10105771
chr1    10511188    10511238    chr1:10511188-10511238
chr1    10511188    10511238    chr1:10511188-10511238
chr1    10511188    10511238    chr1:10511188-10511238

written 4.6 years ago by mshumph2

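One issue I can see immediately is that your "start" column may be off by one, if the spreadsheet coordinates were 1-based. BED coordinates are 0-based and half-open, [start, end): the start is counted from 0 and the end is exclusive (which equals the 1-based position of the last base). For example, the first 100 bases of chr1 would be: chr1 0 100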
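
As a rough illustration of that coordinate convention, here is a minimal Python sketch. It assumes the source coordinates are 1-based and inclusive, which this thread does not confirm, and the function names are made up for the example.

```python
def one_based_to_bed(chrom, start, end):
    """Convert a 1-based, inclusive interval to a BED-style 0-based, half-open one.

    Assumes the input really is 1-based inclusive (hypothetical for this thread).
    """
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based interval: {start}-{end}")
    # Only the start shifts: the 1-based inclusive end has the same value
    # as the 0-based exclusive end.
    return chrom, start - 1, end


def bed_length(start, end):
    """Number of bases covered by a BED interval [start, end)."""
    return end - start


# The first 100 bases of chr1, written 1-based as 1-100, become chr1 0 100.
print(one_based_to_bed("chr1", 1, 100))
print(bed_length(0, 100))
```

If the spreadsheet coordinates are already 0-based and half-open, no conversion is needed and the start column should be left alone.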

 

written 4.6 years ago by Matt Shirley

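Don't worry about how the TAB characters are displayed. Each TAB advances to the next tab stop, so the gap between columns changes with the width of the values and the columns can look unevenly spaced. The important thing is that the file does not mix TAB and SPACE characters as separators.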
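
One way to check this without relying on how an editor draws tabs is a short script. The following is only a sketch: `check_bed_lines` is a hypothetical helper, and it flags just three things mentioned in this thread (space-separated lines, spaces inside the first three fields, and a column-name row whose start/stop are not integers). It makes no claim about how bedtools itself treats such lines.

```python
def check_bed_lines(lines):
    """Return (line_number, problem) tuples for BED-like text lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore empty lines
        fields = line.split("\t")
        if len(fields) < 3:
            # Typical when columns are separated by spaces instead of tabs.
            problems.append((number, "fewer than 3 tab-separated fields"))
            continue
        if any(" " in field for field in fields[:3]):
            problems.append((number, "space inside one of the first three fields"))
        if not (fields[1].strip().isdigit() and fields[2].strip().isdigit()):
            # A row of column names would end up here.
            problems.append((number, "start or end is not an integer"))
    return problems


sample = [
    "chrom\tstart\tstop\tname\n",                          # column-name row
    "chr1\t9885764\t9885814\tchr1:9885764-9885814\n",      # well-formed
    "chr1 9903769 9903819\n",                              # spaces, not tabs
]
for number, problem in check_bed_lines(sample):
    print(number, problem)
```

For a real file, the lines could come from `open("bedfile.bed")`, using the file name from the question.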

written 4.6 years ago by Matt Shirley

Also, "head" is a program on Unix systems that displays the first n lines of a file.
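
If head is not at hand, a similar preview can be sketched in Python. This version prints each line with repr so that tabs and line endings show up as \t, \r, and \n. The helper name `head_raw` is made up for the example, and the bare \r line endings in the sample data are an assumption about what a spreadsheet export on a Mac might produce, not something confirmed in this thread.

```python
def head_raw(data, n=5):
    """Return the first n lines of raw file bytes as repr strings."""
    # bytes.splitlines splits on \n, \r\n and a bare \r; keepends=True
    # keeps the line ending on each line so it stays visible in the repr.
    return [repr(line) for line in data.splitlines(keepends=True)[:n]]


# For a real file the bytes could come from: open("bedfile.bed", "rb").read()
# (this reads the whole file, which is fine for a small BED file).
example = b"chr1\t9885764\t9885814\rchr1\t9903769\t9903819\r"
for shown in head_raw(example):
    print(shown)
```

Separators that print as \t are real tabs; runs of plain spaces between columns would indicate the mixture described above.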

written 4.6 years ago by Matt Shirley
Matt Shirley (Cambridge, MA) wrote 4.6 years ago:

I'm not quite sure what your issue might be, but you can also do this using the "--bed" option of the "faidx" utility included in the pyfaidx module. 

Install:

pip install pyfaidx OR easy_install pyfaidx

Then:

faidx chr1.fa -b bedfile.bed > regions.fa

As the author of pyfaidx I can tell you that I've tried to add helpful error messages that might give you a hint about any issues with your files.

modified 4.6 years ago • written 4.6 years ago by Matt Shirley

Thanks for the help!

I downloaded pyfaidx and now I'm getting text, but only one line. So it looks something like:

>chr1

AATCCCCAAAAGTTT

And then it stops after that first line. Any suggestions?

written 4.6 years ago by mshumph2