Question: how to read .fai and .gz file
3 months ago by
tommaso.gastaldi wrote:

Hello everyone, I'm new to the big world of bioinformatics. I downloaded two files sent by the Core Facility, with which I will perform an RNA-seq analysis. They are in two formats I've never seen before: .fai and .gz. Is there any software available (ideally free or online) to open these files?

Thanks a lot for the help!

rna-seq • 207 views

Curious as to why you received those two types of files. While .gz is normal for sequence data (and some derived data), there is no need to supply .fai files unless the data was already aligned (in which case there should be a .fa sequence file to go with it as well). If the data was already aligned, they should also provide .bam files for the actual alignments and .bam.bai files for the indexes that go with those alignments.

modified 3 months ago • written 3 months ago by genomax
3 months ago by
Belgium
WouterDeCoster wrote:

FAI is the index of a fasta file. It is unlikely to contain any data of interest but some tools need it.

GZ means gzipped: gzip is a compression format that makes files take up less space on disk. The extension therefore says nothing about what is inside. You have likely received .fastq.gz files, which are compressed FASTQ files.
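Since the .gz extension alone does not reveal the content, one way to find out is to decompress only the first few lines and look at them. The following is a minimal Python sketch using the standard gzip module. The function names `peek_gz` and `guess_content` and the file name in the usage comment are made up for illustration, and the first-character check is only a rough heuristic (other formats can also start with `@` or `>`).

```python
import gzip
from itertools import islice


def peek_gz(path, n_lines=4):
    """Return the first n_lines of a gzip-compressed text file."""
    with gzip.open(path, "rt") as handle:
        return [line.rstrip("\n") for line in islice(handle, n_lines)]


def guess_content(first_line):
    """Rough guess at the format, based only on how the first line starts."""
    if first_line.startswith("##fileformat=VCF"):
        return "vcf"
    if first_line.startswith("@"):
        return "fastq"  # heuristic: FASTQ records begin with '@'
    if first_line.startswith(">"):
        return "fasta"  # heuristic: FASTA headers begin with '>'
    return "unknown"


# Hypothetical usage (replace the file name with your own):
# lines = peek_gz("myfile.gz")
# print(guess_content(lines[0]))
```

Reading only the first lines keeps this fast even for very large files, because gzip streams can be decompressed from the start without unpacking the whole file.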

Please elaborate on which files you got and what you aim to do with those.

3 months ago by
Germany
finswimmer wrote:

.fai is probably the index file for the corresponding FASTA file. You can open it in any text editor you like, but on its own this file is useless. Lots of programs that deal with FASTA files use this additional file for faster access to specific sequences.
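To illustrate what such an index holds: a .fai file as written by samtools faidx is plain tab-separated text, with one line per sequence giving its name, its length, the byte offset of its first base in the FASTA file, the number of bases per line, and the number of bytes per line. A minimal Python sketch for reading it follows. The names `FaiRecord`, `parse_fai_line` and `read_fai` are made up for illustration, and it assumes the file follows that five-column layout.

```python
from typing import NamedTuple


class FaiRecord(NamedTuple):
    name: str       # sequence name, e.g. a chromosome
    length: int     # number of bases in the sequence
    offset: int     # byte offset of the first base in the FASTA file
    linebases: int  # bases per line
    linewidth: int  # bytes per line, including the newline


def parse_fai_line(line):
    """Parse one tab-separated .fai line into a FaiRecord."""
    fields = line.rstrip("\n").split("\t")
    # Only the first five columns are used; extra columns are ignored.
    length, offset, linebases, linewidth = (int(x) for x in fields[1:5])
    return FaiRecord(fields[0], length, offset, linebases, linewidth)


def read_fai(path):
    """Read a whole .fai file, skipping blank lines."""
    with open(path) as handle:
        return [parse_fai_line(line) for line in handle if line.strip()]
```

With the offset and line widths, a program can jump straight to any position in the FASTA file without reading it from the start, which is why the index is only useful together with the FASTA file it was built from.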

.gz is a compression format. In bioinformatics it is used to compress different file formats such as VCF, FASTQ, FASTA and others. This saves space, and programs that take the file as input can still have random access to it if it was compressed with blocked gzip (bgzip), which is commonly used in bioinformatics. Without knowing the exact filename, no one can predict what the content of this file is. If you can work with the shell, just type zless myfile.gz to have a look into the file.
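Whether a given .gz file is blocked gzip or ordinary gzip can be checked from its first few bytes. The sketch below is a rough Python check based on the BGZF layout described in the SAM specification: a gzip header with the extra-field flag set and a "BC" subfield identifier at bytes 12 and 13. The function names are made up for illustration, and it assumes the "BC" subfield comes first in the extra field, as in files written by bgzip.

```python
def looks_like_bgzf(header):
    """Return True if the given leading bytes look like a BGZF block header."""
    if len(header) < 14:
        return False
    is_gzip = header[0:2] == b"\x1f\x8b" and header[2] == 8  # magic + deflate
    has_extra = bool(header[3] & 0x04)                       # FEXTRA flag set
    # Assumption: the 'BC' subfield is the first entry in the extra field.
    return is_gzip and has_extra and header[12:14] == b"BC"


def is_bgzf_file(path):
    """Check the first 14 bytes of a file for a BGZF header."""
    with open(path, "rb") as handle:
        return looks_like_bgzf(handle.read(14))


# Hypothetical usage (replace the file name with your own):
# print(is_bgzf_file("myfile.gz"))
```

A file that fails this check can still be read normally with zless or any gzip tool; it just does not support the indexed random access mentioned above.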

fin swimmer
