Question: How to read .fai and .gz files
7 days ago, tommaso.gastaldi (0) wrote:

Hello everyone, I'm new to the big world of bioinformatics. I downloaded two files sent by the Core Facility, with which I will perform an RNA-seq analysis. They are in two formats I've never seen before: .fai and .gz. Is there any software (preferably free or online) that can open these files?

Thanks a lot for the help!

rna-seq • 86 views
Modified 7 days ago by WouterDeCoster (35k) • written 7 days ago by tommaso.gastaldi (0)

Curious as to why you received those two types of files. While .gz is normal for sequence data (and some derived data), there is no need to supply .fai files unless the data was already aligned (and then there should be a .fa sequence file to go with it as well). If the data was already aligned, they should also provide .bam files for the actual alignments and .bam.bai files for the indexes that go with them.

Comment written 7 days ago by genomax (60k) • modified 7 days ago
7 days ago, WouterDeCoster (35k) wrote:

FAI is the index of a FASTA file. It is unlikely to contain any data of interest, but some tools need it.
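To make "index" a bit more concrete: a .fai file, as written by `samtools faidx`, is plain tab-separated text with one line per sequence, giving the sequence name, its length in bases, the byte offset of its first base in the FASTA file, the number of bases per line, and the number of bytes per line. Below is a minimal Python sketch that reads those five columns. The names `FaiRecord` and `parse_fai` are made up for this illustration and are not part of any library.

```python
from typing import Iterable, List, NamedTuple


class FaiRecord(NamedTuple):
    """One line of a .fai index (hypothetical helper type for this sketch)."""
    name: str        # sequence name, e.g. "chr1"
    length: int      # number of bases in the sequence
    offset: int      # byte offset of the first base in the FASTA file
    line_bases: int  # bases per FASTA line
    line_width: int  # bytes per FASTA line, including the newline


def parse_fai(lines: Iterable[str]) -> List[FaiRecord]:
    """Parse the five tab-separated columns of a FASTA index."""
    records = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Only the first five columns are used here.
        name, length, offset, line_bases, line_width = line.rstrip("\n").split("\t")[:5]
        records.append(
            FaiRecord(name, int(length), int(offset), int(line_bases), int(line_width))
        )
    return records
```

An open file handle works as the `lines` argument, so `parse_fai(open("genome.fa.fai"))` would list the sequences in the index, where `genome.fa.fai` is a placeholder file name.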

GZ means the file is gzipped: gzip is a compression format that makes files take up less space on disk. The extension therefore says nothing about what is inside. You have likely received .fastq.gz files, which are compressed FASTQ files.
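Since the extension alone does not reveal the content, one option is to decompress just the first few lines and look at them. Here is a rough Python sketch using the standard `gzip` module. The functions `peek_gz` and `guess_format` are hypothetical helpers written for this example, and the guess relies only on the first character of the file, so treat it as a hint rather than a real format check.

```python
import gzip
from itertools import islice


def peek_gz(path, n_lines=4):
    """Return the first n_lines of a gzip-compressed text file."""
    with gzip.open(path, "rt") as handle:
        return [line.rstrip("\n") for line in islice(handle, n_lines)]


def guess_format(first_line):
    """Very rough guess of the file type from its first line."""
    if first_line.startswith("##fileformat=VCF"):
        return "VCF"
    if first_line.startswith(">"):
        return "FASTA"
    if first_line.startswith("@"):
        # FASTQ records start with '@', but so do SAM header lines.
        return "FASTQ (or a SAM header)"
    return "unknown"
```

With a placeholder file name, `peek_gz("sample_R1.fastq.gz")` would return the first read (four lines) of a FASTQ file, and `guess_format` can be applied to the first of those lines.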

Please elaborate on which files you got and what you aim to do with those.

7 days ago, finswimmer (8.9k) wrote:

.fai is probably the index file for the corresponding FASTA file. You can open it in any text editor you like. On its own, this file is useless. Lots of programs that deal with FASTA files use this additional file for faster access to specific sequences.
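To illustrate why the index gives faster access: each .fai line stores the byte offset of a sequence's first base plus the number of bases and bytes per line, so a program can compute where any base sits in the FASTA file and jump straight there instead of reading everything before it. The sketch below shows that arithmetic in Python for an uncompressed FASTA. The function names are invented for the example, coordinates are assumed to be 0-based and half-open, and real tools such as `samtools faidx` do considerably more checking.

```python
def base_to_byte(offset, line_bases, line_width, i):
    """Byte position in the FASTA file of base i (0-based) of a sequence."""
    return offset + (i // line_bases) * line_width + i % line_bases


def fetch(fasta, offset, line_bases, line_width, start, end):
    """Read bases [start, end) of one sequence from a binary FASTA handle.

    offset, line_bases and line_width come from that sequence's .fai line.
    """
    if end <= start:
        return ""
    first = base_to_byte(offset, line_bases, line_width, start)
    last = base_to_byte(offset, line_bases, line_width, end - 1)
    fasta.seek(first)                     # jump directly to the first base
    chunk = fasta.read(last - first + 1)  # may span several lines
    # Drop the line breaks that fall inside the requested region.
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")
```

The handle has to be opened in binary mode (for example `open("genome.fa", "rb")`, a placeholder name) so that `seek` works on byte offsets.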

.gz is a compression format. In bioinformatics it is used to compress many file formats, such as VCF, FASTQ, FASTA and others. This saves space, and programs that take the file as input can still have random access if the file is block-compressed gzip (BGZF, as written by bgzip), which is commonly used in bioinformatics. Without knowing the exact filename, no one can predict the content of the file. If you can work with the shell, just type zless myfile.gz to have a look into the file.
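If you want to know whether a .gz file is plain gzip or the blocked variant, the first bytes of the file tell them apart. As far as I understand the BGZF section of the SAM/BAM specification, a blocked gzip file is a valid gzip file whose header carries an extra field with the two-byte identifier "BC". Below is a small Python sketch of that check. The function names are made up for this example, and only the first block of the file is inspected.

```python
import struct


def is_bgzf(header: bytes) -> bool:
    """Check whether the given leading bytes look like a BGZF block header."""
    if len(header) < 12:
        return False
    # gzip magic number (1f 8b) and compression method 8 (deflate)
    if header[0:2] != b"\x1f\x8b" or header[2] != 8:
        return False
    # FEXTRA flag (bit 2) must be set for an extra field to be present
    if not header[3] & 0x04:
        return False
    (xlen,) = struct.unpack("<H", header[10:12])
    extra = header[12:12 + xlen]
    pos = 0
    # Walk the extra subfields looking for SI1='B', SI2='C', SLEN=2
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack("<H", extra[pos + 2:pos + 4])
        if si1 == ord("B") and si2 == ord("C") and slen == 2:
            return True
        pos += 4 + slen
    return False


def file_is_bgzf(path) -> bool:
    """Read just enough of the file to run is_bgzf on it."""
    with open(path, "rb") as handle:
        head = handle.read(12)
        if len(head) < 12:
            return False
        (xlen,) = struct.unpack("<H", head[10:12])
        return is_bgzf(head + handle.read(xlen))
```

Either way, zless and other ordinary gzip tools can read both kinds of file. The distinction only matters for tools that need random access.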

fin swimmer
