Bedtools getfasta outputting a blank file
9.6 years ago
mshumph2 • 0

Hi,

I'm using bedtools getfasta to extract a set of sequences from chromosome 1. My input FASTA file is "chr1.fa" (from the UCSC Genome Browser), and I have a BED file with chromosome, start, stop, and name columns. My command looks like this:

bedtools getfasta -fi chr1.fa -bed bedfile.bed -fo testing.fa.out -name

I use -name because I'd like to organize the sequences by name.

The problem is this: when I run the command I don't get any errors; it just outputs a blank file with whatever name I gave it (in this case testing.fa.out).

I suspect it comes down to how I made the BED file. I was given an Excel spreadsheet with coordinates in it and simply saved it in tab-delimited text format. I copied the three relevant columns (chrom, start, and stop) into a new spreadsheet and saved that as a tab-delimited text file, then gave each column a name. In the resulting text file the "tabbing" looks different for the first 100 or so lines: the gap between columns is shorter, and further down the gaps become wider. If this is the problem, how can I fix it? I'm on a Mac, if that's relevant.

Thanks


Can you run head on your BED file and show us what it looks like?


I'm not sure what head is, but this is the format the file is in. As you can see, the format changes a few coordinates down. Also, copying and pasting changes the spacing between the columns.

chr1    9885764    9885814    chr1:9885764-9885814
chr1    9903769    9903819    chr1:9903769-9903819
chr1    9903769    9903819    chr1:9903769-9903819
chr1    10040879    10040929    chr1:10040879-10040929
chr1    10040879    10040929    chr1:10040879-10040929
chr1    10105721    10105771    chr1:10105721-10105771
chr1    10105721    10105771    chr1:10105721-10105771
chr1    10105721    10105771    chr1:10105721-10105771
chr1    10105721    10105771    chr1:10105721-10105771
chr1    10511188    10511238    chr1:10511188-10511238
chr1    10511188    10511238    chr1:10511188-10511238
chr1    10511188    10511238    chr1:10511188-10511238

One issue I can see immediately is that your "start" column is off by one. BED coordinates are 0-based and half-open, [start, end): the start is 0-based and the end is exclusive, which is the same number as a 1-based inclusive end. For example, the first 100 bases of chr1 would be: chr1 0 100
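If the coordinates in the spreadsheet really are 1-based and inclusive (an assumption worth checking against wherever they came from, since shifting coordinates that are already correct BED would introduce the error instead), a minimal sketch is to decrement only the start column with awk while keeping the file TAB-delimited. The file names and the example region are hypothetical.

```shell
# Hypothetical input: one region written 1-based, inclusive (the first 100 bases of chr1)
printf 'chr1\t1\t100\tfirst100\n' > regions_1based.txt

# Subtract 1 from column 2 (start) only; FS and OFS keep the output TAB-delimited
awk 'BEGIN { FS = OFS = "\t" } { $2 = $2 - 1; print }' regions_1based.txt > regions.bed

# prints: chr1 0 100 first100 (TAB-separated)
cat regions.bed
```

Only the start changes; the end stays as it is because a 1-based inclusive end equals a 0-based exclusive end. Apply it once to the original export, not repeatedly.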


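Don't worry about how the TAB characters are displayed. A TAB jumps to the next fixed tab stop, so columns of different widths (e.g. 7-digit vs. 8-digit coordinates) will look unevenly spaced. The important thing is that the columns are separated by TABs only, not by a mixture of TABs and spaces.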
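One way to check for a TAB/space mixture, assuming a 4-column BED file like the one above: count the TAB-separated fields on every line with awk and list any lines that contain a space. The file check.bed is made up for illustration, with a stray space deliberately placed on its second line.

```shell
# Hypothetical BED file: line 2 has a space instead of a TAB between start and end
printf 'chr1\t9885764\t9885814\tchr1:9885764-9885814\nchr1\t9903769 9903819\tchr1:9903769-9903819\n' > check.bed

# Count TAB-separated fields per line; a clean 4-column file reports only "4"
awk -F '\t' '{ print NF }' check.bed | sort | uniq -c

# Show the line numbers of any lines containing a space (BED columns should be TAB-only)
grep -n ' ' check.bed
```

On a clean file every line reports the same field count and grep prints nothing (and exits with a non-zero status, which is the good outcome here).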


Also, head is a program on Unix systems that displays the first n lines of a file (10 by default).
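A quick way to see what the file really contains is to pipe head into cat -vet, which makes TABs show up as ^I, carriage returns as ^M, and the end of each newline-terminated line as $; these options exist in both the BSD cat on macOS and GNU cat. The sketch below assumes a CR-only export, and the file name from_excel.txt and its contents are made up for illustration.

```shell
# Hypothetical export whose records end in CR (\r) only, with no Unix newline
printf 'chr1\t9885764\t9885814\tchr1:9885764-9885814\rchr1\t9903769\t9903819\tchr1:9903769-9903819\r' > from_excel.txt

# First 5 lines with invisible characters made visible:
# TAB -> ^I, CR -> ^M, end of a \n-terminated line -> $
head -n 5 from_excel.txt | cat -vet
```

A well-formed BED file shows ^I between columns and a $ at the end of every line. Here both records come out on one line separated by ^M with no $, which is exactly the "entire input is a single line" situation.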


Hi, I know this post is very old, but I have the exact same issue, even down to making the file in Excel on a Mac. Did you find a solution?


Dear ann-katrin, since there's no solution here and the OP hasn't been active since, you'll be better off creating a new question that describes your problem in detail. If it is the exact same problem as this thread, you can link to it.

To provide minimal help: Mac, Windows and Unix have historically used different characters to encode a line break. Classic Mac OS used carriage returns (\r), Windows uses a carriage return followed by a newline (\r\n), and Unix, including modern macOS, uses newlines (\n). Excel does not always write Unix line breaks; Excel for Mac in particular is known to save "Tab delimited Text" files with the old CR-only endings. Many Unix tools expect Unix line breaks, and when they get something different they fail with seemingly bizarre warnings or results. To the software, it can look as if the entire input is a single line.
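A minimal sketch of the usual fix, assuming the file really has CR-only line endings: translate every carriage return into a newline with tr, which ships with both macOS and Linux. The file names are hypothetical, and the printf line only creates a made-up two-record example so the snippet runs on its own.

```shell
# Hypothetical Excel export with CR-only line endings
printf 'chr1\t9885764\t9885814\tchr1:9885764-9885814\rchr1\t9903769\t9903819\tchr1:9903769-9903819\r' > excel_export.txt

# wc -l counts \n characters, so a CR-only file appears to have 0 lines
wc -l < excel_export.txt

# Replace every CR with LF to get Unix line endings
tr '\r' '\n' < excel_export.txt > bedfile.bed

# Now each record is counted as its own line
wc -l < bedfile.bed
```

If the file turns out to have Windows-style \r\n endings instead, tr -d '\r' < excel_export.txt > bedfile.bed strips the carriage returns without creating blank lines. Either way, rerun the original bedtools getfasta command on the converted bedfile.bed afterwards.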

9.6 years ago

I'm not quite sure what your issue might be, but you can also do this using the --bed option of the faidx utility included in the pyfaidx module.

Install:

pip install pyfaidx

or

easy_install pyfaidx

Then:

faidx chr1.fa -b bedfile.bed > regions.fa

As the author of pyfaidx, I can tell you that I've tried to add helpful error messages that might give you a hint about any problems with your files.


Thanks for the help!

I downloaded pyfaidx and now I'm getting text, but only one line. So it looks something like:

>chr1
AATCCCCAAAAGTTT

And then it stops after that first line. Any suggestions?
