Gabor Wavelet Transform on DNA sequence in python
1
1
Entering edit mode
2 days ago

Hello, I am new to signal processing and I wanted to perform gabor wavelet transform on DNA sequence to detect 3 base periodicity in coding regions using python and pytorch to perform the convolution operation and here is the code :

def gabor_transform(x,window, omega0, sigma,t1):
theta = math.pi /3
for i in range(x.shape[0]) :
if i == 0:
d = math.cos(theta) + 1j * math.sin(theta)
elif i == 1:
d = -math.cos(theta) - 1j * math.sin(theta)
elif i == 2:
d = -math.cos(theta) + 1j * math.sin(theta)
elif i == 3:
d = math.cos(theta) - 1j * math.sin(theta)
x1 = x[i,:]
t = torch.arange(-window / 2, window / 2,requires_grad=False)
g = torch.exp(-(t**2) / (2 * sigma**2)) * torch.exp(1j * omega0 * t)
X = torch.fft.fft(x1.to(dtype=torch.complex64)).unsqueeze(0).unsqueeze(0)
G = torch.fft.fft(g.to(dtype=torch.complex64)).unsqueeze(0).unsqueeze(0)
Conv = torch.nn.Conv1d(1, 1, window, stride=1, padding=int(window//2), bias=False)
Conv.weight = G
C = Conv(X).view(-1)
C = torch.abs(C)**2
total += C


with x being one hot encoded dna sequence. I wanted those who has experience with signal processing if I am doing it right

python signal-processing DNA • 209 views
0
Entering edit mode
1 day ago
LauferVA 4.4k

Cool post! Disclaimer here that I do not work directly in this field - I just have some familiarity with it due to contributions to others' projects. So please verify and experiment. Anyhow, I think you are overall on the right track, but wanted to address one conceptual change that might improve performance. Also wanted to mention a possible (non-)issue with PyTorch.

1) Adapt the fundamental unit used for encoding the nucleotide sequence to the periodicity you wish to detect.

You are using one-hot encoding to represent nucleotides (which in and of itself is valid for categorical data, but may neverthelss be suboptimal for sensitivity to any given target periodicity). Because you are looking for 3-base periodicity, the absence of inherent connectivity between nucleotides in the encoding scheme introduces a risk that the Gabor wavelet may enhance sensitivity to periodicity not of interest presently instead of maximizing sensitivity for triplet periodicity specifically unless specific steps are taken elsewhere in the algorithm... So, it may be worthwhile to explore an alternative to the 4xN matrix of a one-hot scheme. For instance, you test whether use of a sliding feature vector (where each unit corresponds to a trinucleotide) improves your results.

It is also possible that varying other parameters in coordination with use of alternative (higher order) encoding could further boost sensitivity to the target periodicity. As examples of questions I might ask myself:

• How are you making sure that your padding and stride are appropriately orchestrating alignment of your convolutions with the periodicity in your sequence?
• Are you sure your omega0 is tuned such that it will detect periodicity at the right frequency (in this case, corresponding to the triplet codon structure)? What happens if you specify 2pi/3 for omega0?

2) Issues with input expected by dependencies (pytorch):

I learned something there. By default, PyTorch tends not to support complex-valued tensors. This concerned me because the code appears to contain a complex-valued Gabor filter. However, per the torch.nn.conv1d documentation, the conv1d module specifically is an exception to this

This module supports complex data types i.e. complex32, complex64, complex128.

Having said that, if you have not done so already, it may be worth checking if any additional arguments need to be provided to ensure appropriate handling of the complex part of each value is being processed rightly (rather than, say, being mishandled by default at some step).

SciPy also has ways of supporting signal processing that can handle Gabor transforms & complex numbers, though looks like this may be fine.