Question: How Do I Convert Fa Files To Bed Format?
1
Keziah70 wrote:

How do I convert FASTA (.fa) files to BED format?

fasta bed • 21k views
modified 8 months ago by john.major0 • written 8.6 years ago by Keziah70
2

@michael My guess would be that it is due to a confusion of formats and their purposes. Happens to the best!

written 8.6 years ago by Deniz140

Noahaus, your script gives different results than faidx does. I trust faidx, as it is a community tool that has been around a while, so you may want to double-check your script (or maybe file a bug with faidx).

written 8 months ago by john.major0
8
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum129k wrote:

You can't.

A FASTA (.fa) file describes a DNA or protein sequence:

>Name
ATAGCTACGATTACGACGTACG
ATCGATCGATGCATCAGCTACT
AACTAGTCGATGATGCATACG...

A BED file describes features mapped onto a genome or sequence:

chr1  786 9879 gene1
chr2  486 979  gene2

The only thing you can do is say that the BED file contains a single feature, your sequence itself:

Name  0  1098 Name
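
As a minimal sketch of that idea (not a standard tool), the Python below reads FASTA text and emits one BED line per record that spans the whole sequence. The function name `fasta_to_bed` and the demo record are made up for illustration. It assumes the first word of each header is the sequence name, and it uses BED's 0-based start and exclusive end, so the end column equals the sequence length.

```python
import io

def fasta_to_bed(handle):
    """Yield one BED line (name, 0, length, name) per FASTA record."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield f"{name}\t0\t{length}\t{name}"
            # Assumption: the first word of the header is the sequence name.
            words = line[1:].split()
            name, length = (words[0] if words else ""), 0
        elif name is not None:
            # Sequence lines may wrap, so lengths are summed across lines.
            length += len(line)
    if name is not None:
        yield f"{name}\t0\t{length}\t{name}"

# Hypothetical two-line record, only to show the call.
demo = io.StringIO(">Name\nATAGCTACGATTACGACGTACG\nATCGATCGATGCATCAGCTACT\n")
bed_lines = list(fasta_to_bed(demo))
print("\n".join(bed_lines))
```

For a real file, pass an open file object instead of the `io.StringIO` demo.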
written 8.6 years ago by Pierre Lindenbaum129k
3
Cambridge, MA
Matt Shirley9.3k wrote:

This is a good, though slightly misguided question. If you want to make a BED file from a FASTA sequence, you might do something like this:

  1. Find your FASTA sequence. We'll use human hemoglobin beta as an example.
  2. Use BLAST to align your sequence to the human reference genome.
  3. In the alignment example above, you would pick the genomic alignment, not transcript, and choose "subject" start and end positions. Note that this example has 3 exons, so your BED file start would be 5186957 and end would be 5188159.
  4. Create a BED file using the chromosome (11 in above example), chromstart (5186957), chromend (5188159), and a name for your gene (HBB). You can then upload this as a custom track in the UCSC genome browser.
  5. You will probably want to add introns to your gene structure, so at this point you can use the genePred format instead of BED. This allows you to specify the strand, as well as introns, exons, and transcription factor binding sites.
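
Step 4 can be sketched in Python as below. This is only an illustration: the function name `blast_hit_to_bed` is made up, the chromosome is written as `chr11` on the assumption that the track is meant for UCSC-style naming, and the positions are assumed to be 1-based inclusive, which is how BLAST reports them. BED starts are 0-based and BED ends are exclusive, so the start is reduced by one and the end is kept.

```python
def blast_hit_to_bed(chrom, subject_start, subject_end, name):
    """Convert 1-based inclusive BLAST subject coordinates to a BED6 line."""
    # BLAST reports minus-strand hits with start > end, so order them first.
    lo, hi = sorted((subject_start, subject_end))
    strand = "+" if subject_start <= subject_end else "-"
    # BED start is 0-based; BED end is exclusive, so it equals the 1-based end.
    return f"{chrom}\t{lo - 1}\t{hi}\t{name}\t0\t{strand}"

# Coordinates taken from the HBB example in the steps above.
hbb_line = blast_hit_to_bed("chr11", 5186957, 5188159, "HBB")
print(hbb_line)
```

The score column is set to 0 as a placeholder, since the steps above do not define one.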
written 8.6 years ago by Matt Shirley9.3k
2
Kansas City
Madelaine Gogol5.1k wrote:

Uh, align it to a genome?

written 8.6 years ago by Madelaine Gogol5.1k
1

Align with bowtie or bwa to produce a BAM file, then use bedtools bamToBed to generate a BED file.

Or use BLAT and this Perl script to convert the output to BED: https://github.com/mmarchin/utilities/blob/master/parseBlat.pl
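
For readers who prefer not to use Perl, here is a rough Python sketch of the same kind of conversion. It is independent of the linked script and makes no claim to reproduce its output. The function name `psl_to_bed` and the demo alignment are made up. It assumes a standard 21-column PSL data line from a nucleotide alignment (single-character strand), and it relies on PSL target coordinates already being 0-based with an exclusive end, like BED.

```python
def psl_to_bed(psl_line):
    """Convert one PSL data line to a BED6 line on the target sequence."""
    fields = psl_line.rstrip("\n").split("\t")
    if len(fields) != 21:
        raise ValueError(f"expected 21 PSL columns, got {len(fields)}")
    # PSL columns (0-based index): 8 strand, 9 qName, 13 tName, 15 tStart, 16 tEnd.
    strand, q_name = fields[8], fields[9]
    t_name, t_start, t_end = fields[13], int(fields[15]), int(fields[16])
    # Score is a placeholder; the query name becomes the BED feature name.
    return f"{t_name}\t{t_start}\t{t_end}\t{q_name}\t0\t{strand}"

# Hypothetical PSL record (values invented purely for illustration).
demo_psl = "\t".join([
    "100", "0", "0", "0", "0", "0", "0", "0",    # match counts and gap counts
    "+", "HBB", "100", "0", "100",               # strand, query name/size/start/end
    "chr11", "135086622", "5000", "5100",        # target name/size/start/end
    "1", "100,", "0,", "5000,",                  # block count, sizes, starts
])
print(psl_to_bed(demo_psl))
```

PSL header lines, if present, would need to be skipped before calling the function.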

written 8.6 years ago by Madelaine Gogol5.1k

And some aligners such as BLAT will output alignments in BED format...

written 8.6 years ago by Malachi Griffith18k

@malachig - I don't think BLAT outputs BED. The -out=type is one of:

  psl - Default. Tab separated format, no sequence
  pslx - Tab separated format with sequence
  axt - blastz-associated axt format
  maf - multiz-associated maf format
  sim4 - similar to sim4 format
  wublast - similar to wublast format
  blast - similar to NCBI blast format
  blast8 - NCBI blast tabular format
  blast9 - NCBI blast tabul

written 8.6 years ago by Casey Bergman18k
1
Spain
cpcantalapiedra140 wrote:
cat $fastafile | awk '$0 ~ "^>" {name=substr($0, 2); printf name"\t1\t"} $0 !~ "^>" {printf length($0)"\t"name"\n"}'

If each FASTA sequence spans several lines, replace the awk script with:

cat $fastafile | awk 'BEGIN{totallen=-1;} $0 ~ "^>" {if (totallen!=-1) print totallen"\t"name; name=substr($0, 2); printf name"\t1\t"; totallen=0} $0 !~ "^>" {totallen=totallen+length($0);} END{if (totallen!=-1) print totallen"\t"name;}'
modified 4.2 years ago • written 4.2 years ago by cpcantalapiedra140

Doesn't BED use zero-based start coordinates, so that the first base starts at 0 rather than 1?

written 3.4 years ago by KevinL20
1
noahaus10 wrote:

Old question, but in the spirit of good science I will post a script that takes any genome FASTA file and creates a genome BED file. Very niche, but I don't think there is a good converter out there quite yet.

https://github.com/noahaus/Micellaneous-Tools/blob/master/genome2bed.py

I'd read the comments in the script before beginning.

written 2.0 years ago by noahaus10
1

Many thanks for this. Strange as it may seem, I've been looking for a simple way to do this for a while. There were a couple of minor issues with the script (the annotation length includes line feeds, and the last line fails to be included unless there's an extra line feed at the end - these may be OS specific) but it does the job perfectly.

written 24 months ago by mah110
2

You can accomplish this using faidx -i bed genome.fa > out.bed. For more details you can check out the documentation: https://github.com/mdshw5/pyfaidx#cli-script-faidx
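
If you already have a samtools-style FASTA index (the `.fai` file that sits next to the FASTA), the same information can be read from it directly. The sketch below is an illustration only and is not guaranteed to match the faidx output byte for byte. The function name `fai_to_bed` and the demo index lines are made up. It relies on the first two tab-separated `.fai` columns being the sequence name and its length.

```python
def fai_to_bed(fai_lines):
    """Turn FASTA index (.fai) lines into whole-sequence BED3 lines."""
    bed = []
    for line in fai_lines:
        if not line.strip():
            continue
        # .fai columns: name, length, offset, line bases, line width.
        name, length = line.rstrip("\n").split("\t")[:2]
        # BED start is 0-based and the end is exclusive, so end == length.
        bed.append(f"{name}\t0\t{int(length)}")
    return bed

# Hypothetical index lines, only to show the call.
demo_fai = ["chr1\t100\t6\t60\t61\n", "chr2\t50\t114\t60\t61\n"]
print("\n".join(fai_to_bed(demo_fai)))
```

With a real index, pass the open `.fai` file object in place of `demo_fai`.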

written 24 months ago by Matt Shirley9.3k