Question

How to read .fai and .gz files

0

5.3 years ago

tommaso.gastaldi ▴ 20

Hello everyone, I'm new to the big world of bioinformatics. I downloaded two files sent by the Core Facility, with which I will perform an RNA-seq analysis. They are in two formats I've never seen before: .FAI and .GZ. Is there any software (preferably free or online) that can open these files?

Thanks a lot for the help!

RNA-Seq • 5.2k views

updated 5.3 years ago by WouterDeCoster 47k • written 5.3 years ago by tommaso.gastaldi ▴ 20

2

Curious as to why you received those two types of files. While .gz is normal for sequence data (and some derived data), there is no need to supply .fai files unless the data was already aligned (in which case there should be a .fa sequence file to go with the index as well). If the data was already aligned, then they should also provide .bam files for the actual alignments and .bam.bai files for the indexes that go with those alignments.

5.3 years ago by GenoMax 141k

score 3 · Answer 1 · 2019-01-10

FAI is the index of a fasta file. It is unlikely to contain any data of interest but some tools need it.

GZ means the file is gzipped: gzip is a compression format that makes files take up less space on disk. The extension therefore says nothing about what is inside. You have likely received .fastq.gz files, which are compressed fastq files.

Please elaborate on which files you got and what you aim to do with those.

score 3 · Answer 2 · 2019-01-10

.fai is probably the index file for the corresponding fasta file. You can open it in any text editor you like. This file alone is useless, but lots of programs that deal with fasta files use it for faster access to specific sequences.
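To illustrate why the index is only useful together with its fasta file, here is a minimal Python sketch. It assumes the .fai follows the usual samtools faidx layout of five tab-separated columns (name, length, byte offset, bases per line, bytes per line) and that the fasta file is uncompressed. The names FaiEntry, read_fai and fetch are made up for this example and are not part of any library.

```python
from typing import NamedTuple


class FaiEntry(NamedTuple):
    name: str       # sequence name (first word of the FASTA header)
    length: int     # total number of bases in the sequence
    offset: int     # byte offset of the first base in the FASTA file
    linebases: int  # number of bases on each full sequence line
    linewidth: int  # number of bytes per line, including the newline


def read_fai(path):
    """Parse a .fai file into a dict keyed by sequence name."""
    entries = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip blank or malformed lines
            length, offset, linebases, linewidth = map(int, fields[1:5])
            entries[fields[0]] = FaiEntry(fields[0], length, offset,
                                          linebases, linewidth)
    return entries


def fetch(fasta_path, entry, start, end):
    """Return bases [start, end) (0-based) by seeking, not by reading the whole file."""
    def byte_pos(i):
        # Whole lines before base i, plus the position within its line.
        return (entry.offset
                + (i // entry.linebases) * entry.linewidth
                + i % entry.linebases)

    with open(fasta_path, "rb") as handle:
        handle.seek(byte_pos(start))
        raw = handle.read(byte_pos(end) - byte_pos(start))
    # Drop the line breaks that fall inside the requested range.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")
```

With hypothetical file names, something like `idx = read_fai("genome.fa.fai")` followed by `fetch("genome.fa", idx["chr1"], 0, 50)` would return the first 50 bases of chr1. In practice an established tool such as samtools faidx does the same job.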

.gz is a compression format. In bioinformatics it is used to compress many different file formats, such as vcf, fastq, fasta and others. This saves space, and programs that take the file as input can still have random access if it was compressed with blocked gzip (bgzip), which is commonly used in bioinformatics. Without knowing the exact filename, no one can predict what the content of the file is. If you can work with the shell, just type zless myfile.gz to have a look inside the file.
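If no shell with zless is at hand, the same quick look can be done with Python's standard gzip module. The sketch below reads only the first few lines and makes a rough guess at the content from them. The function names peek_gz and guess_format are invented for this example, and the guess is a heuristic, not a validation of the format.

```python
import gzip
from itertools import islice


def peek_gz(path, n_lines=8):
    """Return the first n_lines of a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, n_lines)]


def guess_format(lines):
    """Rough guess at the file type from its first lines (heuristic only)."""
    if not lines:
        return "empty"
    first = lines[0]
    if first.startswith("##fileformat=VCF"):
        return "vcf"
    if first.startswith(">"):
        return "fasta"
    # A fastq record is four lines: @name, sequence, +, qualities.
    if first.startswith("@") and len(lines) >= 3 and lines[2].startswith("+"):
        return "fastq"
    return "unknown"
```

For example, `guess_format(peek_gz("myfile.gz"))` returns one of the strings above without decompressing the whole file to disk.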

fin swimmer