Purely for my future self if I google this again, the columns of a .fai file appear to be:
- chromosome name
- chromosome length
- offset of the first base of the chromosome sequence in the file
- length of the fasta lines, in bases
- a second line length, called "line_blen" in the source code. For me it is typically the length of the fasta line + 1.
ETA: Oh, Pierre already answered this over here. blen is the number of bytes in each fasta line, newline included, which is why it usually comes out one more than the number of bases.
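For reference, here is a minimal sketch of reading one line of a .fai file into the five columns listed above. It assumes the columns are tab-separated; the names FaiRecord and parse_fai_line and the example line are made up for illustration and are not part of samtools.

```python
from typing import NamedTuple


class FaiRecord(NamedTuple):
    name: str        # sequence (chromosome) name
    length: int      # total number of bases in the sequence
    offset: int      # byte offset of the first base in the FASTA file
    line_bases: int  # number of bases on each FASTA line
    line_bytes: int  # number of bytes on each FASTA line, newline included


def parse_fai_line(line: str) -> FaiRecord:
    """Split one tab-separated .fai line into its first five columns."""
    fields = line.rstrip("\n").split("\t")
    length, offset, line_bases, line_bytes = (int(x) for x in fields[1:5])
    return FaiRecord(fields[0], length, offset, line_bases, line_bytes)


# Hypothetical index line: a 100-base sequence, 60 bases per line, "\n" endings.
record = parse_fai_line("chrTest\t100\t9\t60\t61\n")
newline_bytes = record.line_bytes - record.line_bases
```

With Unix line endings newline_bytes is 1, which matches the "+ 1" observed above; a file with Windows line endings would give 2.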
That is an index of your fasta file. Have a look at samtools.
Once installed, you can create an index of your fasta file with
samtools faidx some.fasta
This will create some.fasta.fai next to the fasta file.
Also have a look here, where it describes how to set up your data for GATK.
The FAIDX file is created by samtools faidx. It contains, among other things:
- the name of the reference sequence (chr1, chr2...)
- the offset of the first base of this sequence in the file
- the length of the FASTA lines
With this information, samtools can jump straight to any region of the genome without reading the file from the start.
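To make the "quick access" concrete, here is a rough sketch of the arithmetic those fields allow. It is not samtools' own code; byte_offset and fetch are hypothetical helper names, and the toy FASTA and its index values (offset 5, 10 bases per line, 11 bytes per line) are made up for the demonstration.

```python
import os
import tempfile


def byte_offset(offset, line_bases, line_bytes, pos0):
    """Byte position in the FASTA file of the 0-based base pos0."""
    return offset + (pos0 // line_bases) * line_bytes + (pos0 % line_bases)


def fetch(fasta_path, offset, line_bases, line_bytes, start0, end0):
    """Return bases [start0, end0) of one sequence using its index values."""
    first = byte_offset(offset, line_bases, line_bytes, start0)
    last = byte_offset(offset, line_bases, line_bytes, end0 - 1)
    with open(fasta_path, "rb") as handle:
        handle.seek(first)                  # jump straight to the region
        raw = handle.read(last - first + 1)
    # Drop the line breaks that fall inside the requested span.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")


# Toy FASTA: the header ">toy\n" is 5 bytes, so the first base is at offset 5.
with tempfile.NamedTemporaryFile(delete=False, suffix=".fasta") as tmp:
    tmp.write(b">toy\nACGTACGTAC\nGTACGTACGT\nACGT\n")
    toy_path = tmp.name

region = fetch(toy_path, 5, 10, 11, 8, 12)  # bases 8..11, spanning a line break
os.remove(toy_path)
print(region)
```

Only the bytes covering the requested bases are read, so the cost depends on the size of the region rather than on the size of the genome. The sketch relies on every line of a sequence (except the last) having the same length, which is what the index assumes.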
See also this post I wrote about faidx