Question: Where can I find information about the .fai file format?
10.5 years ago by
Biomed4.7k
Bethesda, MD, USA
Biomed4.7k wrote:

I am experimenting with GATK. Its web site lists .fai as the reference file format, but my reference alignments are in MAF files. Can you help me with this?

maf gatk • 21k views
modified 16 months ago by John Marshall2.1k • written 10.5 years ago by Biomed4.7k
7.8 years ago by
Madelaine Gogol5.2k
Kansas City
Madelaine Gogol5.2k wrote:

Purely for my future self if I google this again, the columns of a .fai file appear to be:

  • chromosome name
  • chromosome length (in bases)
  • byte offset of the first base of the chromosome sequence in the file
  • number of bases on each FASTA line
  • number of bytes on each FASTA line, newline included; called "line_blen" in the source code. Typically (for me) this is the FASTA line length + 1.

ETA: Oh, Pierre already answered this over here. blen is the number of bytes in each FASTA line.
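
To make the five columns concrete, here is a minimal awk sketch that derives them from a tiny FASTA file. It is not how samtools builds its index; it assumes Unix (LF) line endings and equal-length lines within each record, and `build_fai`, `emit`, and `demo.fa` are names made up for this illustration.

```shell
#!/bin/sh
# Sketch only: derive the five .fai columns from a small FASTA with awk.
# Assumes LF line endings and uniform line lengths within each record.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '>chr1 demo\nACGTACGTAC\nGTACGTACGT\nACGT\n>chr2\nTTTTGGGG\nCC\n' > "$tmp/demo.fa"

# build_fai FASTA: print name, length, offset, bases per line, bytes per line
build_fai() {
  awk '
    function emit() {
      if (name != "")
        printf "%s\t%d\t%d\t%d\t%d\n", name, len, offset, linebases, linewidth
    }
    /^>/ {
      emit()                        # finish the previous record, if any
      name = substr($1, 2)          # header text up to the first whitespace
      pos += length($0) + 1         # header bytes plus the newline
      offset = pos                  # the first base starts right after the header
      len = 0; linebases = 0; linewidth = 0
      next
    }
    {
      if (linebases == 0) {         # first sequence line sets the line geometry
        linebases = length($0)
        linewidth = linebases + 1   # +1 for the LF terminator
      }
      len += length($0)
      pos += length($0) + 1
    }
    END { emit() }
  ' "$1"
}

build_fai "$tmp/demo.fa" > "$tmp/demo.fa.fai"
cat "$tmp/demo.fa.fai"
```

For the demo file this prints two tab-separated rows, `chr1 24 11 10 11` and `chr2 10 44 8 9`: chr1 starts at byte 11 because its header line occupies bytes 0 to 10, and each of its full lines holds 10 bases in 11 bytes.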

modified 2.2 years ago by _r_am30k • written 7.8 years ago by Madelaine Gogol5.2k
10.5 years ago by
brentp23k
Salt Lake City, UT
brentp23k wrote:

That is an index of your FASTA file. Have a look at samtools:

Once installed, you can create an index of some.fasta as

samtools faidx some.fasta

This will create some.fasta.fai.

Also have a look here, where it describes how to set up your data for GATK.

modified 2.2 years ago by _r_am30k • written 10.5 years ago by brentp23k

Is there a way to do this with picard or GATK itself? I'd love to stay away from samtools if I could.

Reply written 8.3 years ago by mylons130
10.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum131k wrote:

The FAIDX file is created by samtools faidx. The FAIDX file contains, among other things, the

  • name of the reference sequence (chr1, chr2...)
  • the offset of the first base of this sequence in the file
  • the length of the FASTA lines

With this information, samtools can seek directly to any region of the genome instead of reading the whole file.
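
The seek arithmetic can be sketched in plain shell. This is an illustration of the idea under stated assumptions, not samtools' own code: `fetch` is a hypothetical helper, coordinates are taken as 1-based and inclusive, the .fai sits next to the FASTA as `<fasta>.fai`, and no bounds checking is done.

```shell
#!/bin/sh
# Sketch only: fetch a region by seeking with the .fai columns
# instead of scanning the FASTA from the top.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '>chr1 demo\nACGTACGTAC\nGTACGTACGT\nACGT\n>chr2\nTTTTGGGG\nCC\n' > "$tmp/demo.fa"
printf 'chr1\t24\t11\t10\t11\nchr2\t10\t44\t8\t9\n' > "$tmp/demo.fa.fai"

# fetch FASTA NAME START END  (hypothetical helper; 1-based, inclusive)
fetch() {
  fa=$1
  # Map base coordinates to byte positions: each full line before the
  # target base costs "bytes per line" (column 5), not "bases per line" (column 4).
  range=$(awk -F '\t' -v n="$2" -v s="$3" -v e="$4" '
    $1 == n {
      first = $3 + int((s - 1) / $4) * $5 + (s - 1) % $4
      last  = $3 + int((e - 1) / $4) * $5 + (e - 1) % $4
      print first, last - first + 1
      exit
    }' "$fa.fai")
  [ -n "$range" ] || return 1       # sequence name not present in the index
  first=${range% *}
  count=${range#* }
  # Skip straight to the first byte, read the span, drop embedded newlines.
  dd if="$fa" bs=1 skip="$first" count="$count" 2>/dev/null | tr -d '\n'
  echo
}

fetch "$tmp/demo.fa" chr1 9 14
fetch "$tmp/demo.fa" chr2 1 10
```

The two calls print `ACGTAC` and `TTTTGGGGCC`. The first region crosses a line break in the file, which is why both line-length columns are needed.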

See also this post I wrote about faidx

modified 2.2 years ago by _r_am30k • written 10.5 years ago by Pierre Lindenbaum131k

Dear Pierre,

So if I understand correctly, if I need a BED file from my fa.fai, can I just run

awk 'OFS="\t" {print $1,$3,$2}' in.fai

? Thank you so much.
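
A caution on the one-liner above, going by the column order listed earlier in this thread: $3 is the byte offset of the sequence within the FASTA file, not a genomic coordinate, so `print $1,$3,$2` would put a file offset in the BED start column. If the goal is one BED interval spanning each whole sequence, a sketch might look like the following. BED starts are 0-based, and `fai_to_bed` and the demo index are illustrative names, not part of any tool.

```shell
#!/bin/sh
# Sketch only: turn a .fai index into whole-sequence BED intervals.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'chr1\t24\t11\t10\t11\nchr2\t10\t44\t8\t9\n' > "$tmp/demo.fa.fai"

# fai_to_bed FAI: emit name, 0, length for every indexed sequence
fai_to_bed() {
  # BED uses 0-based, half-open intervals, so a full sequence is [0, length).
  awk 'BEGIN { FS = OFS = "\t" } { print $1, 0, $2 }' "$1"
}

fai_to_bed "$tmp/demo.fa.fai"
```

Setting OFS in a BEGIN block is the more conventional form; writing `OFS="\t"` as a pattern also works, but only because the assignment happens to evaluate as true for every line.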

Reply modified 2.2 years ago by _r_am30k • written 6.0 years ago by Paul1.4k
16 months ago by
John Marshall2.1k
Glasgow, Scotland
John Marshall2.1k wrote:

HTSlib has provided a manual page describing this format for a few years now. See man 5 faidx, also on the web at faidx(5) manual page.
