Hi,
I'm using bedtools getfasta to extract a set of sequences from chromosome 1. My input FASTA file is "chr1.fa" (from the UCSC Genome Browser), and I have a BED file with chromosome, start, stop, and name columns. My command looks like this:
bedtools getfasta -fi chr1.fa -bed bedfile.bed -fo testing.fa.out -name
I use -name because I'd like to organize the sequences by name.
The problem is this: when I run this command I don't get any errors; it just writes an empty file with whatever name I gave it (in this case testing.fa.out).
The problem may come down to how I made the BED file. I was given an Excel spreadsheet with the coordinates, copied the three relevant columns (chrom, start, and stop) into a new spreadsheet, gave each column a name, and saved it as a tab-delimited text file. In that text file the "tabbing" looks different for the first 100 or so lines: the distance between columns is shorter, and further down the gaps between the columns become wider. If this is the problem, how can I fix it? I'm on a Mac, if that's relevant.
Thanks
Can you do a head on your bedfile and show us how it looks?
I'm not sure what a head is, but this is the format it's in. As you can see the format changes a few coordinates down. Also, copying and pasting changes the spacing between the columns.
One issue I can see immediately is that your "start" column is off by one. BED coordinates are half-open intervals, [start, end): the start is 0-based and the end is 1-based, so the end position itself is not included. For example, the first 100 bases of chr1 would be:
chr1 0 100
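If the spreadsheet coordinates really are 1-based and inclusive (an assumption you should check against where they came from), the fix is to subtract 1 from every start and leave the end alone. A minimal sketch with awk follows. The function name to_bed_coords is made up for illustration, and it assumes tab-separated columns in the order chrom, start, end.

```shell
# Hypothetical helper: convert 1-based inclusive starts to 0-based BED starts.
# Assumes a tab-separated file with columns chrom, start, end[, name].
# Lines whose second column is not a plain integer (e.g. a header row with
# column names) are dropped, since a plain BED file should not contain them.
to_bed_coords() {
    awk 'BEGIN { FS = OFS = "\t" }
         $2 ~ /^[0-9]+$/ { $2 = $2 - 1; print }' "$1"
}

# Usage (file names are placeholders):
#   to_bed_coords bedfile.txt > bedfile.bed
```

With this, an input line "chr1 1 100" (first 100 bases, 1-based) becomes "chr1 0 100".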
Don't worry about how the TAB characters are displayed. A TAB is drawn with a variable width depending on the text before it, so the columns will not look evenly spaced. The important thing is that there is no mixture of TABs and SPACEs.
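To check what is actually in the file rather than how it is drawn, you can make the invisible characters visible. The sketch below pipes the first few lines through cat -et, which on both macOS and Linux prints each TAB as ^I, each carriage return as ^M, and a $ at the end of every line. The function name show_invisibles is just for illustration.

```shell
# Hypothetical helper: print the first N lines (default 5) of a file with
# invisible characters made visible.
#   ^I = TAB, ^M = carriage return, $ = end of line
show_invisibles() {
    head -n "${2:-5}" "$1" | cat -et
}

# Usage (file name is a placeholder):
#   show_invisibles bedfile.bed
```

A clean BED line should show ^I between every column, no runs of plain spaces, and a $ directly after the last column.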
Also, head is a program on Unix systems that displays the first n lines of a file.
Hi, I can see that this post is older than wood, but I have the exact same issue, even down to the file being made with Excel on a Mac. Did you figure out a solution for this?
Dear ann-katrin, as there's no solution and OP hasn't been active ever since, you'll be better off creating a new question with your detailed problem. In case this thread has the exact same problem, you can reference it.
To provide some minimal help: Mac, Windows and Unix have historically used different characters to encode a line break. Classic Mac OS used a carriage return (\r), Windows uses a carriage return followed by a newline (\r\n), and Unix, including modern macOS itself, uses a newline (\n). Excel on a Mac may still save text files with the old \r-only line endings, depending on the version and the format chosen. Many Unix tools expect Unix line breaks, and if they get something different they fail with seemingly bizarre warnings or results. To the software it can look as if the entire input is a single line.
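If the line endings turn out to be the culprit, converting them to Unix newlines before running bedtools is one way out. Here is a minimal sketch using only tr and awk. The function name fix_line_endings is made up, and note that as a side effect it also drops empty lines, which a BED file should not contain anyway.

```shell
# Hypothetical helper: rewrite a text file with Unix (\n) line endings.
# Every carriage return becomes a newline, so old-Mac (\r) files are split
# into proper lines; for Windows (\r\n) files this creates empty lines,
# which awk 'NF' then removes (it keeps only lines with at least one field).
fix_line_endings() {
    tr '\r' '\n' < "$1" | awk 'NF'
}

# Usage (file names are placeholders):
#   fix_line_endings bedfile.bed > bedfile.unix.bed
#   bedtools getfasta -fi chr1.fa -bed bedfile.unix.bed -fo testing.fa.out -name
```

Running wc -l on the file before and after is a quick sanity check: a \r-only file typically reports 0 lines before the conversion.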