Question: bedtools getfasta only reads the first line of my BED file
10 days ago, ann-katrin.llarena wrote:

Hi all, hope you can help.

I have a multi-FASTA file containing roughly 730 prokaryotic genomes (5-6 Mb); all contigs/chromosomes are named in the headers, like so:

>CP009335.1 Bacillus thuringiensis strain HD1011, complete genome
TCCTGATGGAACTTTAATTGATGAAAAGAGTCGTGTAAACTTTTTCCATCTTTCAACCCATCAATCATGC
GCTGCAATTGTACTTTCTTTTCTAAAGGTAATTGAAACCGTAAAAATTCTAATGCCTGCAAAAGGGAGTA
TCCTTTTTCTAATAGTTCTCCTAATCGTTTCAGTAATATGACTTGATCACTTAAACTCCATATTTCCTTA
AACATAAACATCTTCTTCTAAAAACCCTAAAGCGTATCCTTTTCGTATCGAAGATTGTAATGTTTCGTGC
TTGTATGTGACACATTCCCCGTTTGCTTCTTTAATCGCTTGTTTTAACTCATATCCATATAACAACTCAT
AAATACTCGCTTGCCTTACTTGCCTCATTGATTT

I have a BED file that was originally created in Excel, saved as tab-delimited text, and converted using dos2unix. It contains one line for each genome, like so:

CP009335.1  1984592 1992438 CP009335.1_genome.tsv   B_thuringesis
CP009720.1  3944559 3952406 CP009720.1_genome.tsv   B_thuringesis
ABDL02000007.1  228801  234535  GCA_000171035.2_ASM17103v2_genomic.tsv  B_cereus
CP026376.1  1520664 1528500 GCA_002952815.1_ASM295281v1_genomic.tsv B_cereus
NZ_CM000714.1   757305  765101  GCF_000003645.1_ASM364v1_genomic.tsv    B_cereus

I ran the following command on the BED file to make sure it was tab-delimited:

awk '{ for(i=1;i<=NF;i++){if(i==NF){printf("%s\n",$NF);}else {printf("%s\t",$i)}}}' "mybedfile"

And when I manually check it, it looks ok.
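A less manual check, as a minimal sketch: split each line strictly on tabs and report any line with fewer than three fields. The demo file and its contents are hypothetical stand-ins, not the file from this thread.

```shell
# Hypothetical demo file: line 2 uses spaces instead of tabs.
bed=$(mktemp)
printf 'CP009335.1\t1984592\t1992438\n' >  "$bed"
printf 'CP009720.1 3944559 3952406\n'   >> "$bed"

# Split strictly on tabs and print the number of every line
# that has fewer than 3 tab-separated fields.
bad_lines=$(awk -F'\t' 'NF < 3 { print NR }' "$bed")
echo "lines with fewer than 3 tab-separated fields: $bad_lines"
```

Unlike the default whitespace splitting, `-F'\t'` treats a run of spaces as part of a field, so space-separated lines stand out.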

When I run bedtools getfasta, I only get results for the top row. There are no error messages, just one result. I tried copy-pasting some rows from the long BED file into a new file, and then it worked after manually editing in the tabs. But I am hoping to avoid that. So, can anybody help me make bedtools read the whole BED file? And if there is something wrong with the BED file (which seems to be the issue), how can I make it properly tab-delimited?

THANK YOU

modified 10 days ago by h.mon • written 10 days ago by ann-katrin.llarena

Hello ann-katrin.llarena and welcome to biostars.

"I tried to copy paste some rows from the long bed file to a new file, and then it worked after manually editing in tabs and such."

That's good evidence that your original BED file isn't completely tab-separated. You can convert all runs of spaces to tabs using sed:

sed 's/ \+/\t/g' input.bed > fixed.bed
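Note that `\+` and `\t` are GNU sed extensions, so the command above may not behave the same with the BSD sed shipped on macOS. A sketch of an alternative that relies only on standard awk behaviour follows; the file names are hypothetical, and it assumes no field is supposed to contain spaces.

```shell
# Hypothetical demo input: columns separated by runs of spaces.
in_bed=$(mktemp)
fixed_bed=$(mktemp)
printf 'CP009335.1  1984592 1992438 CP009335.1_genome.tsv   B_thuringesis\n' > "$in_bed"

# Assigning $1 to itself forces awk to rebuild the record,
# joining the whitespace-split fields with OFS (a tab).
awk -v OFS='\t' '{ $1 = $1; print }' "$in_bed" > "$fixed_bed"

# Count the fields of the first line when splitting strictly on tabs.
n_fields=$(awk -F'\t' '{ print NF; exit }' "$fixed_bed")
echo "tab-separated fields: $n_fields"
```

Because awk splits on any run of spaces or tabs by default, this also normalises lines that mix the two.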

fin swimmer

written 10 days ago by finswimmer

Thank you for helping, but it still only outputs the hit for the first row in the BED file and ignores the rest of the file. Any other ideas?

written 10 days ago by ann-katrin.llarena

Could you please upload an extract of your BED file that doesn't work somewhere, so we can take a closer look at it?

Also, please show the exact bedtools command you were using.

Thanks.

written 9 days ago by finswimmer

Did you prepare the file in Windows? If so, the line terminators might be wrong, and you can use dos2unix to fix your file.
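A related case worth ruling out is classic Mac line endings, where each line ends in a bare carriage return (CR) with no newline. Line-based tools then see the whole file as a single line, which would match the symptom of only the first row being read. Whether that is the cause here is an assumption; the thread does not confirm it. The sketch below uses a hypothetical demo file and plain `tr`, so it does not depend on dos2unix being installed.

```shell
# Hypothetical demo file with classic Mac (CR-only) line endings.
cr_bed=$(mktemp)
lf_bed=$(mktemp)
printf 'chrA\t1\t10\rchrB\t5\t20\r' > "$cr_bed"

# wc -l counts newline characters, so a CR-only file reports 0 lines.
lines_before=$(wc -l < "$cr_bed" | tr -d ' ')

# Turn every CR into LF.
# For Windows CRLF files, delete the CR instead: tr -d '\r'
tr '\r' '\n' < "$cr_bed" > "$lf_bed"
lines_after=$(wc -l < "$lf_bed" | tr -d ' ')

echo "newline-terminated lines before: $lines_before, after: $lines_after"
```

To inspect a real file, `od -c file | head` prints carriage returns as `\r` and newlines as `\n`, which shows which terminator is actually in use.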

written 10 days ago by WouterDeCoster

Thank you, I already did. I should add that I made the file in Excel on a Mac, and for the file it says "converted from mac format".

written 10 days ago by ann-katrin.llarena