Developing a genomics ML model using bytes?

15 hours ago

Pranava ▴ 10

I was looking into training a machine learning / deep learning model on bytes. Recently I was working on a way to decrease the size of a .fasta file using _bit shifting_ (i.e., packing each nucleotide, which normally takes 8 bits as one text character, into 4 bits).
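For readers unfamiliar with the idea, here is a minimal sketch of 4-bit packing in Python. The code table, the function names `pack` and `unpack`, and the choice to put the first base in the high nibble are all assumptions made for illustration; the post does not say which layout was actually used.

```python
# Hypothetical 4-bit codes (one bit per base, so N has all four bits set).
CODES = {"A": 0b0001, "C": 0b0010, "G": 0b0100, "T": 0b1000, "N": 0b1111}
BASES = {code: base for base, code in CODES.items()}


def pack(seq: str) -> bytes:
    """Pack two nucleotides into each byte, first base in the high nibble."""
    codes = [CODES[base] for base in seq.upper()]
    if len(codes) % 2:
        codes.append(0)  # pad the last low nibble; 0 is not a valid base code
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))


def unpack(packed: bytes, length: int) -> str:
    """Reverse pack(); `length` is needed to drop the padding nibble."""
    codes = []
    for byte in packed:
        codes.append(byte >> 4)    # high nibble
        codes.append(byte & 0x0F)  # low nibble
    return "".join(BASES[code] for code in codes[:length])


seq = "ACGTNAC"
packed = pack(seq)
print(len(seq), "bases ->", len(packed), "bytes")
assert unpack(packed, len(seq)) == seq
```

Note that the sequence length has to be stored somewhere alongside the packed bytes, otherwise an odd-length sequence cannot be told apart from its padded form.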

Now that machine learning and artificial intelligence are dominating the industry, or at least trending that way, it got me thinking: what if we can use the bytes to develop a model? The problem I can currently think of is that it might not be biologically relevant? I am not sure; this is where I started getting confused and wanted to reach out here.
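One way to make the question concrete: a packed byte holds two adjacent bases, so its integer value has no biological meaning by itself, but the nibbles can be expanded back into per-base features before they reach a model. The sketch below assumes a hypothetical encoding where each base owns one bit (A=1, C=2, G=4, T=8, N=15) and the first base sits in the high nibble; `nibbles_to_features` is an illustrative name, not part of any library.

```python
def nibbles_to_features(packed: bytes, length: int) -> list[list[int]]:
    """Expand each 4-bit code into four 0/1 features ordered [A, C, G, T]."""
    rows = []
    for byte in packed:
        for code in (byte >> 4, byte & 0x0F):  # high nibble, then low nibble
            rows.append([(code >> bit) & 1 for bit in range(4)])
    return rows[:length]  # drop any padding nibble at the end


# 0x12 0x4F holds A, C, G, N under the assumed codes above.
features = nibbles_to_features(bytes([0x12, 0x4F]), 4)
for row in features:
    print(row)
```

Under this assumed encoding the unpacked input is the same one-hot representation (with N marked in all four columns) that a model would get from the plain text sequence, so the packing changes storage rather than the information available for training.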

Learning Machine Genomics Bytes • 1.2k views

updated 4 hours ago by dariober 15k • written 15 hours ago by Pranava ▴ 10

how about fastq.gz?

12 hours ago by 1769mkc ★ 1.3k

So essentially you are working on a compression algorithm, is that right? If so, be sure to benchmark your idea against the hundreds of existing, fast compression methods, whether standard tools such as gzip, bzip2, and zstd, or genomics-centered formats such as CRAM.
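A rough sketch of what such a benchmark could look like, using only compressors from the Python standard library (`gzip`, `bz2`, `lzma`). The input is a synthetic random sequence, which is an assumption made so the example is self-contained; zstd and CRAM need tools outside the modules used here, and any real comparison should run on actual FASTA data.

```python
import bz2
import gzip
import lzma
import random

random.seed(0)
# Synthetic stand-in for real data: uniform random bases are close to a
# worst case for general-purpose compressors; real genomes contain repeats.
seq = "".join(random.choice("ACGT") for _ in range(100_000)).encode("ascii")

sizes = {
    "raw (1 byte/base)": len(seq),
    "4-bit packing": (len(seq) + 1) // 2,  # two bases per byte, no entropy coding
    "gzip": len(gzip.compress(seq)),
    "bz2": len(bz2.compress(seq)),
    "lzma": len(lzma.compress(seq)),
}

for name, size in sizes.items():
    print(f"{name:>18}: {size:>7} bytes ({size / len(seq):.2f} bytes/base)")
```

Size is only one axis; compression and decompression time, and whether the format allows random access, would also belong in a fair comparison.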

might not be biologically relevant?

What does compression have to do with biology? Please explain further if I am missing the point.

9 hours ago by ATpoint 89k

How is this different from "Tried building a compact sequence format with 4-bit storage"?

8 hours ago by ATpoint 89k

what if we can use the bytes to develop a model?

My understanding is that the OP is considering building a model using the sequences in fastq files, so the compression is just an intermediate step. However, it is not clear to me what model s/he has in mind... Some sort of LLM using fastq data...?

4 hours ago by dariober 15k
