Hi everybody, I have a problem with two files downloaded from the Ensembl FTP site:
Homo_sapiens.GRCh38.pep.all.fa.gz - http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/pep/
Homo_sapiens.GRCh38.104.chr.gtf.gz - http://ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/
I need to open them on a Windows system. I tried Word and WordPad, but the encoding does not seem to be recognized. Word suggests a list of possible encodings when I try to open the files, but none of them renders the files in a readable format.
I also tried to open them with a Python script, but I always get the same error:
    def file_head(file_name, number_of_lines, encode="utf8"):
        file_hand = open(file_name, 'r', encoding=encode)
        for i, line in enumerate(file_hand):
            print(line)
            if i > number_of_lines:
                break
        file_hand.close()

    # ------------ MAIN --------------
    filename = 'myfasta.fasta'
    file_head(filename, 50)

The error message is always like this:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
I think these Ensembl files are widely used by researchers, but I did not find any valid solution on the web. I do not know what I am doing wrong.
Thank you in advance for your help.
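A possible lead on the error above: byte 0x8b at position 1 matches the second byte of the gzip signature (0x1f 0x8b), and both file names end in .gz, so the files are presumably still compressed rather than in an unusual text encoding. Below is a minimal sketch of how the script could check for this and read the compressed file directly. It assumes the content is UTF-8 text once decompressed; the helper name `is_gzip` and the constant `GZIP_MAGIC` are hypothetical, introduced only for illustration.

```python
import gzip
from itertools import islice

# First two bytes of any gzip stream (assumption label: constant name is ours).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(path):
    """Return True if the file starts with the gzip signature."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def file_head(path, number_of_lines, encoding="utf-8"):
    """Return the first `number_of_lines` lines, decompressing if needed."""
    # gzip.open in "rt" mode decompresses and decodes on the fly;
    # plain open is used for files that are already uncompressed.
    opener = gzip.open if is_gzip(path) else open
    with opener(path, "rt", encoding=encoding) as fh:
        return [line.rstrip("\n") for line in islice(fh, number_of_lines)]
```

With this, `file_head('Homo_sapiens.GRCh38.pep.all.fa.gz', 50)` would return the first 50 lines as a list that can be printed one by one. Word and WordPad would likely need the file to be decompressed first with an archive tool, and a plain-text editor is probably a better fit than a word processor for files of this size.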