Question: Bedtools getfasta outputting a blank file
mshumph2 (United States) wrote 5.0 years ago:

Hi,

I'm using bedtools getfasta to get a bunch of sequences from chromosome 1. I have "chr1.fa" (from the UCSC Genome Browser) as the input FASTA file, and a BED file with chromosome, start, stop, and name columns. My command looks like this:

bedtools getfasta -fi chr1.fa -bed bedfile.bed -fo testing.fa.out -name

I use -name because I'd like to organize the sequences by name.

The problem is this: when I run this command I don't get any errors; it just outputs a blank file with whatever name I gave it (in this case testing.fa.out). It may come down to this: I was given an Excel spreadsheet with coordinates in it and I simply saved the file in tab-delimited text format. I copied out the three relevant columns (chrom, start, and stop) and put them into a new spreadsheet before saving it as a tab-delimited text file. Then I gave each column a name. In the tab-delimited text file the "tabbing" looks different for the first 100 or so lines: the distance between columns is shorter. Further down, the gaps between the columns become wider. If this is the problem, how can I fix it? I'm on a Mac, if that's relevant.
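A quick way to test the suspicions in this post (a header row with column names, or columns that are not really tab-separated) is to check each line of the BED file before handing it to bedtools. The following is only a minimal sketch: the function name `check_bed_lines` and the example rows are made up for illustration, and the checks are a plain reading of the BED layout (chrom, start, end, optional name), not what bedtools itself does internally.

```python
def check_bed_lines(lines):
    """Return (line_number, message) pairs for lines that do not look like BED."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Skip blank lines and the header styles that BED parsers commonly ignore.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            problems.append((number, "fewer than 3 tab-separated columns"))
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            problems.append((number, "start/end are not integers (header row?)"))
        elif int(fields[1]) > int(fields[2]):
            problems.append((number, "start is greater than end"))
    return problems


# Hypothetical example rows, not taken from the poster's file.
example = [
    "chrom\tstart\tstop\tname\n",         # spreadsheet header row
    "chr1\t9885764\t9885814\tregionA\n",  # well-formed BED line
    "chr1 9903769 9903819 regionB\n",     # spaces instead of tabs
]
report = check_bed_lines(example)
for number, message in report:
    print(f"line {number}: {message}")
```

On a real file, something like `check_bed_lines(open("bedfile.bed"))` would apply the same checks; an empty report means the basic layout looks fine and the cause is probably elsewhere.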

Thanks

modified 7 days ago by ann-katrin.llarena • written 5.0 years ago by mshumph2

Can you run head on your BED file and show us what it looks like?

written 5.0 years ago by komal.rathi

I'm not sure what a head is, but this is the format it's in. As you can see the format changes a few coordinates down. Also, copying and pasting changes the spacing between the columns.

 

chr1    9885764    9885814    chr1:9885764-9885814
chr1    9903769    9903819    chr1:9903769-9903819
chr1    9903769    9903819    chr1:9903769-9903819
chr1    10040879    10040929    chr1:10040879-10040929
chr1    10040879    10040929    chr1:10040879-10040929
chr1    10105721    10105771    chr1:10105721-10105771
chr1    10105721    10105771    chr1:10105721-10105771
chr1    10105721    10105771    chr1:10105721-10105771
chr1    10105721    10105771    chr1:10105721-10105771
chr1    10511188    10511238    chr1:10511188-10511238
chr1    10511188    10511238    chr1:10511188-10511238
chr1    10511188    10511238    chr1:10511188-10511238

written 5.0 years ago by mshumph2

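One issue I can see immediately is that your "start" column is probably off by one. BED coordinates are 0-based and half-open: the start is 0-based and inclusive, and the end is exclusive (which makes it equal to the 1-based position of the last base). For example, the first 100 bases of chr1 would be: chr1 0 100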
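To make the BED convention described above concrete, here is a small sketch of converting a 1-based, inclusive interval (the kind often found in spreadsheets) into BED coordinates. The function names are hypothetical, and whether the poster's spreadsheet is actually 1-based is an assumption that cannot be confirmed from the thread.

```python
def one_based_to_bed(start, end):
    """Convert a 1-based, inclusive interval to BED (0-based start, exclusive end)."""
    return start - 1, end


def bed_length(bed_start, bed_end):
    """Number of bases covered by a BED interval."""
    return bed_end - bed_start


# The first 100 bases of a chromosome: positions 1..100 in 1-based terms.
first_100 = one_based_to_bed(1, 100)
print(first_100)               # (0, 100)
print(bed_length(*first_100))  # 100

# The first row shown in the thread, read two ways.
as_bed = bed_length(9885764, 9885814)                          # read as BED: 50 bases
converted = one_based_to_bed(9885764, 9885814)                 # if it was 1-based inclusive
print(as_bed, converted, bed_length(*converted))               # 50 (9885763, 9885814) 51
```

The only change is subtracting one from the start; the end stays the same, which is why an unconverted file comes out one base short at the start of each region.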

 

written 5.0 years ago by Matt Shirley

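Don't worry about how the TAB characters are displayed. A TAB advances to the next tab stop, so the gap between columns changes width when the values get longer (in your file, where the coordinates go from 7 to 8 digits). The important thing is that there is not a mixture of TAB and SPACE.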

written 5.0 years ago by Matt Shirley

Also, "head" is a program on Unix systems (including the macOS Terminal) that displays the first n lines of a file, 10 by default. For example: head bedfile.bed
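Because tabs, spaces and carriage returns all look alike on screen, it can help to print the first lines in an escaped form where they are spelled out. Below is a rough Python equivalent of head that does this; `head_repr` is a made-up helper name and the demo file is a temporary file created only for the example.

```python
import os
import tempfile
from itertools import islice


def head_repr(path, n=10):
    """Return the first n lines of a file as repr() strings, so tabs and
    carriage returns show up as escape sequences instead of blank space."""
    # newline="" keeps line endings exactly as they are stored in the file.
    with open(path, newline="") as handle:
        return [repr(line) for line in islice(handle, n)]


# Demo: a tiny tab-delimited file that uses carriage returns as line breaks.
with tempfile.NamedTemporaryFile("w", suffix=".bed", delete=False, newline="") as tmp:
    tmp.write("chr1\t0\t100\tfirst\rchr1\t100\t200\tsecond\r")
    demo_path = tmp.name

shown = head_repr(demo_path, n=2)
for text in shown:
    print(text)
os.remove(demo_path)
```

In the printed output a real tab appears as `\t`, a space stays a space, and a carriage return appears as `\r`, which makes a mixed-delimiter or wrong-line-ending file easy to spot.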

written 5.0 years ago by Matt Shirley

Hi, I can see that this post is older than wood, but I have the exact same issue, even down to making the file in Excel on a Mac. Did you figure out a solution for this?

written 7 days ago by ann-katrin.llarena

Dear ann-katrin, as there's no solution and OP hasn't been active ever since, you'll be better off creating a new question with your detailed problem. In case this thread has the exact same problem, you can reference it.

To offer at least some help: Mac, Windows and Unix have historically used different characters to encode a line break. Classic Mac OS used a carriage return (\r), Windows uses a carriage return followed by a newline (\r\n), and Unix (including modern macOS) uses a newline (\n). Excel on a Mac may still save tab-delimited text with \r line endings, depending on the version and the format chosen when saving. Many Unix tools expect Unix line breaks, and if they get something different, they fail with what seem to be bizarre warnings or results. To the software it can look as if the entire input is a single line.
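If line endings are the suspect, they can be counted and rewritten with a few lines of Python. This is a minimal sketch working on raw bytes; the function names and the sample data are invented for illustration, and it assumes the file is small enough to read into memory.

```python
def count_line_endings(data: bytes) -> dict:
    """Count Windows (CRLF), bare carriage return (CR) and Unix (LF) line endings."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "cr": data.count(b"\r") - crlf,  # carriage returns not followed by a newline
        "lf": data.count(b"\n") - crlf,  # newlines not preceded by a carriage return
    }


def normalize_newlines(data: bytes) -> bytes:
    """Rewrite CRLF and bare CR line endings as Unix LF."""
    # Replace CRLF first so its CR is not converted a second time.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Sample data with carriage-return-only line breaks.
mac_style = b"chr1\t0\t100\ta\rchr1\t100\t200\tb\r"
before = count_line_endings(mac_style)
fixed = normalize_newlines(mac_style)
after = count_line_endings(fixed)
print(before)  # {'crlf': 0, 'cr': 2, 'lf': 0}
print(after)   # {'crlf': 0, 'cr': 0, 'lf': 2}
```

For a real file, reading it with `pathlib.Path("bedfile.bed").read_bytes()`, passing the result through `normalize_newlines`, and writing it to a new file with `write_bytes` would apply the same fix without touching the original.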

modified 7 days ago • written 7 days ago by Carambakaracho
Matt Shirley (Cambridge, MA) wrote 5.0 years ago:

I'm not quite sure what your issue might be, but you can also do this using the "--bed" option of the "faidx" utility included in the pyfaidx module. 

Install:

pip install pyfaidx OR easy_install pyfaidx

Then:

faidx chr1.fa -b bedfile.bed > regions.fa

As the author of pyfaidx I can tell you that I've tried to add helpful error messages that might give you a hint about any issues with your files.
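For sanity-checking on a tiny test file, it can be useful to know what a region extraction like the one above should return. The sketch below does the same job in plain Python, under clear assumptions: it is not how pyfaidx or bedtools work internally (both use an index rather than loading whole sequences), the function names are made up, and the fallback label format is only an illustration.

```python
def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}; everything is held in memory."""
    sequences, name, chunks = {}, None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            # Use the first word of the header line as the sequence name.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def extract_regions(sequences, bed_lines):
    """Yield (label, subsequence) per BED line, slicing 0-based and half-open."""
    for raw in bed_lines:
        fields = raw.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # not enough tab-separated columns to be a BED record
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        label = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        yield label, sequences[chrom][start:end]


# Hypothetical miniature inputs.
fasta = [">chr1 demo\n", "ACGTACGTAC\n", "GGGGCCCCTT\n"]
bed = ["chr1\t0\t4\tfirst\n", "chr1\t8\t12\n"]
regions = list(extract_regions(read_fasta(fasta), bed))
for label, sequence in regions:
    print(f">{label}\n{sequence}")
```

Note that a header row such as "chrom start stop" would make `int()` raise an error here, which at least fails loudly instead of producing an empty output file.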

modified 5.0 years ago • written 5.0 years ago by Matt Shirley

Thanks for the help!

I downloaded pyfaidx and now I'm getting text, but only one line. So it looks something like:

>chr1

AATCCCCAAAAGTTT

And then it stops after that first line. Any suggestions?

written 5.0 years ago by mshumph2