Need advice on a noisy signal classification project

Right now I’m working on a signal classification project, and I feel that I need advice from the deep learning community, as I’m not entirely sure about my “strategy”.

A short explanation of the task:

Data structure
Each input sample consists of two 1D time-dependent components of the signal, for instance x(t) and y(t). We can record several channels of such a signal at the same time (x1(t), y1(t), x2(t), y2(t), ...). The number of channels is fixed for a given classification model: if we know that we will work with 3 channels, the model should take x1, y1, x2, y2, x3, y3 as input.
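To make the input layout concrete, here is a minimal sketch of how one sample could be packed into a single array. The helper name `stack_channels`, the row order (x1, y1, x2, y2, ...), the 100 time steps and the random data are all assumptions for illustration, not the actual pipeline.

```python
import numpy as np

def stack_channels(xs, ys):
    """Interleave per-channel x(t), y(t) into one array of shape (2*N, T)."""
    if len(xs) != len(ys):
        raise ValueError("need one y(t) for every x(t)")
    rows = []
    for x, y in zip(xs, ys):
        rows.append(np.asarray(x, dtype=np.float32))
        rows.append(np.asarray(y, dtype=np.float32))
    return np.stack(rows)  # row order: x1, y1, x2, y2, ...

# Hypothetical sizes: 3 channels, 100 time steps per measurement.
N_CHANNELS, T = 3, 100
rng = np.random.default_rng(0)
xs = [rng.normal(size=T) for _ in range(N_CHANNELS)]  # placeholder data
ys = [rng.normal(size=T) for _ in range(N_CHANNELS)]  # placeholder data

sample = stack_channels(xs, ys)
print(sample.shape)  # (6, 100)
```

With this layout a 1D convolution sees 2*N input feature maps, and the model has to be rebuilt if the number of channels changes.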

Deeper explanation of the data
For each channel we initially have 2 possible classes, say 0 and 1.
In the right picture you can see the validation dataset for a particular channel in the (x, y) plane, with each sample averaged to a single point (x, y). The two most clearly separated samples from the 2 classes are circled in black. In the left picture you can see that these two samples look pretty much the same visually, but in fact they differ drastically.

Here is where this becomes interesting
As you can see, even though the data is labeled with two classes, some of our samples cross over into the other class, and this is not an error! The thing is, our signals can “transform” over time from one class to the other, but the initial dataset is labeled only with respect to the beginning of the measurement.

What I want to do
I want to build a “tracking system” that predicts the class at the beginning and at the end of the measurement. For each sample, each channel can be one of these: (0, 0), (0, 1), (1, 0), (1, 1), read as (class at the beginning, class at the end). For example, for a 4-channel system we would get something like this:

[(1, 1), (0, 1), (1, 1), (0, 0)]

This looks like a 4^N-class classification task, where N is the number of channels. This scares me, so I want to make it simpler. (Maybe use some sequence model and predict the expected class (0 or 1 in each channel) several times per forward pass?)
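One way to look at the size of the problem: a single softmax over all joint outcomes needs 4^N classes, while 2*N independent binary outputs (one "start" bit and one "end" bit per channel) cover the same label space. The sketch below only compares the counts and decodes such a vector. The output order [start_1, end_1, start_2, end_2, ...] and the 0.5 threshold are assumptions, and treating the bits as independent given shared features is a modelling choice that may or may not hold for this data.

```python
def joint_class_count(n_channels):
    """Number of classes if every joint outcome is its own class."""
    return 4 ** n_channels

def binary_output_count(n_channels):
    """Number of outputs if each channel gets a start bit and an end bit."""
    return 2 * n_channels

def decode(probs, threshold=0.5):
    """Turn 2*N probabilities into N (start, end) pairs.

    Assumed order: [start_1, end_1, start_2, end_2, ...].
    """
    if len(probs) % 2 != 0:
        raise ValueError("expected an even number of probabilities")
    bits = [int(p >= threshold) for p in probs]
    return [(bits[i], bits[i + 1]) for i in range(0, len(bits), 2)]

print(joint_class_count(3), binary_output_count(3))  # 64 6
print(decode([0.9, 0.8, 0.2, 0.7, 0.6, 0.1]))  # [(1, 1), (0, 1), (1, 0)]
```

The number of outputs then grows linearly with N instead of exponentially, at the cost of not modelling correlations between channels explicitly.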

It is important to understand how our system changes over time because this affects our next measurements. If the first measurement shows that the signal in the second channel has changed its class, we need to adjust the settings of our device to run the second measurement properly.
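As a small illustration of how the predicted pairs could drive that adjustment, the sketch below lists the channels whose class differs between the beginning and the end of a measurement. The function name `changed_channels` and the 1-based channel numbering are assumptions; what the device actually does with the result is outside this sketch.

```python
def changed_channels(pairs):
    """Return 1-based indices of channels whose start and end classes differ."""
    return [i for i, (start, end) in enumerate(pairs, start=1) if start != end]

# Predicted (start, end) pairs, one per channel.
pairs = [(1, 1), (0, 1), (1, 1), (0, 0)]
print(changed_channels(pairs))  # [2]
```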

This model should be fast, so we plan to implement it directly on an FPGA. This restricts the size of the model.

What I’ve built so far
First I decided to build a simple classification model that predicts the initial signal class of each channel (just to make sure that we can distinguish such noisy data). I’ve built a 1D convolutional model that takes channelized sequences as input (x1(t), y1(t), x2(t), y2(t), …) and outputs something like (1, 0, …).
I found that this approach detects the classes pretty accurately, but now I need to take a step forward toward the desired model.
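For reference, here is a rough NumPy sketch of the forward pass of a very small 1D CNN of this kind, mainly to show the shapes and the parameter count that matters for an FPGA. It is not my actual model: the layer sizes, the single conv layer, the global average pooling and the random untrained weights are all assumptions. The head is shown with 2*N sigmoid outputs (a hypothetical start/end version); with N outputs it would correspond to predicting only the initial class.

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1D cross-correlation. x: (C_in, T), w: (C_out, C_in, K), b: (C_out,)."""
    k = w.shape[2]
    t_out = x.shape[1] - k + 1
    cols = [np.tensordot(w, x[:, t:t + k], axes=([1, 2], [0, 1])) for t in range(t_out)]
    return np.stack(cols, axis=1) + b[:, None]  # (C_out, T_out)

def forward(x, params):
    h = np.maximum(conv1d(x, params["w1"], params["b1"]), 0.0)  # conv + ReLU
    pooled = h.mean(axis=1)                         # global average pooling over time
    logits = params["w2"] @ pooled + params["b2"]   # one logit per output bit
    return 1.0 / (1.0 + np.exp(-logits))            # independent sigmoids

# Hypothetical sizes: 3 channels, 100 time steps, 8 filters of width 5.
n_channels, t, hidden, k = 3, 100, 8, 5
n_outputs = 2 * n_channels  # use n_channels for "initial class only"
rng = np.random.default_rng(0)
params = {
    "w1": rng.normal(scale=0.1, size=(hidden, 2 * n_channels, k)),
    "b1": np.zeros(hidden),
    "w2": rng.normal(scale=0.1, size=(n_outputs, hidden)),
    "b2": np.zeros(n_outputs),
}

x = rng.normal(size=(2 * n_channels, t))  # placeholder input: x1, y1, x2, y2, x3, y3
probs = forward(x, params)
n_params = sum(p.size for p in params.values())
print(probs.shape, n_params)  # (6,) 302
```

Since the weights are random, the probabilities themselves mean nothing here; the point is only that the output size and parameter count stay small as N grows.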

What I’m looking for
I need to figure out what strategy to use to solve this problem, so I’m looking for advice. Maybe you have worked on a similar type of problem, or something similar has come up in signal processing or even with LLMs.
Anything would be helpful:

  • architectural advice (should I move to sequence models, or is a CNN my best pick?)
  • how to avoid 4^N classification?
  • just your opinion would be great too