How to create a network that has a varying output size

Hello,

As in the title: what is the simplest way, reusable across different cases, to make a network whose output size can differ based on its input?

For example, I wrote a network which detects the X and Y coordinates of the vertices of a randomized four-edged shape, and I want to do the same thing for different shapes, which obviously have a different number of vertices.

Things that come to mind:

  • Assume there is an upper limit on the number of vertices, for example 12, and the unused fields would simply return 0. (Coordinate (0, 0) is not possible anyway.) See the sketch after this list.
  • I know some basics of the functional API, but I don’t know if it has any neat features to make this work.
  • Should I review object detection or instance segmentation code to look for a solution? As I remember, they return outputs with a varying number of objects.
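For the first option, this is roughly what I mean by padding the labels (a minimal sketch; the cap of 12 vertices is just an assumption):

```python
import numpy as np

MAX_VERTICES = 12  # assumed upper limit on the vertex count

def pad_label(vertices):
    """Flatten a list of (x, y) vertices into a fixed-length label,
    padding unused slots with 0 (coordinate (0, 0) never occurs)."""
    label = np.zeros(MAX_VERTICES * 2, dtype=np.float32)
    flat = np.asarray(vertices, dtype=np.float32).ravel()
    label[:flat.size] = flat
    return label

# e.g. a triangle fills 6 of the 24 slots; the rest stay 0
print(pad_label([(12, 30), (55, 8), (40, 60)]))
```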

Hey @ErniW,
An interesting question!

Your first idea indeed seems the way to go. However, you will find that in this case, there is no probability measure for the predicted vertices.

On the other hand, probability measures are very well covered by the way object detection algorithms work. When Prof Andrew teaches Object Detection, he also teaches the concept of Anchor Boxes, which can definitely be useful for your use case. So, do take a look at that.

I don’t think you need to exploit the functional API much for your use case. A careful design of your labels and your inputs is all you need, and once again, that is pretty much something that is well covered in Object Detection. Let me know if this helps.

Cheers,
Elemento

Thanks for your answer.

So should I first classify the type of shape, or rather the number of edges, and then somehow calculate the coordinates?

Of course I will share the results of my experiments, but right now I’m working on code for a better dataset. I will probably have more questions later.

Currently, to create an image I randomly choose an X and Y from each quarter of the target image and connect them into a shape, but many times the angle between two edges is so small that the shape looks like a triangle. I know how to fix that. Important note: it’s fascinating that the network somehow handles even very small angles, but I think it affects the overall accuracy, which is around 94% on the test dataset.
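For reference, the generation idea is roughly this (a simplified sketch, not my exact code; the image size and margin are arbitrary):

```python
import numpy as np

def random_quad(size=64, margin=4, rng=None):
    """Sample one vertex from each quadrant of a size x size image.
    Walking the quadrants in cyclic order keeps the quadrilateral simple
    (no self-intersection), since each vertex stays inside its own quadrant."""
    rng = rng or np.random.default_rng()
    half = size // 2
    # (x_low, x_high, y_low, y_high) for each quadrant, in cyclic order
    quadrants = [
        (margin, half, margin, half),                 # top-left
        (half, size - margin, margin, half),          # top-right
        (half, size - margin, half, size - margin),   # bottom-right
        (margin, half, half, size - margin),          # bottom-left
    ]
    return np.array([[rng.integers(xl, xh), rng.integers(yl, yh)]
                     for xl, xh, yl, yh in quadrants], dtype=np.float32)
```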

I’m using a simple convolutional network and MSE loss. Below are images of the results. Please note that I’ve intentionally included some inaccurate results and some shapes that look like a triangle. Some of my previous networks handled them better.
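The setup is roughly like this (a simplified Keras sketch, not the exact architecture; the input size and layer widths are placeholders):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_vertex_regressor(input_shape=(64, 64, 1), n_vertices=4):
    """Small CNN that regresses 4 vertices (8 values) with MSE."""
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(n_vertices * 2)(x)  # (x1, y1, ..., x4, y4)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model
```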

Hey @ErniW,

If I am not wrong, we are referring to the labels, right? Assuming that, I guess the labels depend on what you want your model to do, and on whether you can manually label your dataset or you want to create the labels automatically.

For instance, if you are creating the images using the technique above, then you automatically get the coordinates, which can serve as the labels.

Now, if you want your model to predict the number of edges, you will have to closely inspect the images that look like triangles, because, as you said, many of them have a very small angle between their edges, and even a minute angle can change the label. A better way of doing this is to check whether any 3 of the 4 coordinates are collinear: if so, the shape has 3 edges, otherwise 4. Note that in this case your dataset will be severely imbalanced, since most of your shapes will have 4 edges (due to the way they are created).
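A rough sketch of that collinearity check (a cross-product test; the tolerance is something you would have to tune for near-degenerate shapes):

```python
import numpy as np

def count_edges(vertices, tol=1e-6):
    """Label helper: return 3 if any vertex lies on the line through its
    neighbours (degenerate quadrilateral), otherwise 4."""
    pts = np.asarray(vertices, dtype=np.float64)
    n = len(pts)
    for i in range(n):
        a, b, c = pts[i - 1], pts[i], pts[(i + 1) % n]
        # cross product of (b - a) and (c - a); ~0 means the three are collinear
        cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        if abs(cross) < tol:
            return 3
    return 4
```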

Now, coming to the last scenario, in which we want the model to predict the shape. This can be treated in 2 different ways: first, predicting between triangle and quadrilateral; second, predicting among isosceles triangle, scalene triangle, equilateral triangle, square, rhombus, rectangle, parallelogram, etc. (basically all the possible forms with 3 or 4 edges).

For the first way, the labels would be created just as they were for predicting the number of edges. For the second, automating the labels will be a bit more tedious, and you will have to make some calls of your own, depending on your needs. For instance, you can easily write code that checks whether a given shape is a square using the 4 coordinates; however, every square is by definition also a rectangle, a rhombus, and a parallelogram. So whether you create a hierarchy in your labels or make the model a multi-label classifier is up to you.
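For illustration, one rough way to automate the square check from the 4 coordinates (equal side lengths plus equal diagonals; the tolerance is a judgment call):

```python
import numpy as np

def is_square(vertices, rel_tol=1e-3):
    """Rough check: a quadrilateral (vertices in traversal order) is a square
    if all four sides have equal length and both diagonals have equal length."""
    p = np.asarray(vertices, dtype=np.float64)
    sides = [np.linalg.norm(p[i] - p[(i + 1) % 4]) for i in range(4)]
    diagonals = [np.linalg.norm(p[0] - p[2]), np.linalg.norm(p[1] - p[3])]
    return (np.ptp(sides) <= rel_tol * np.mean(sides)
            and abs(diagonals[0] - diagonals[1]) <= rel_tol * np.mean(diagonals))
```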

Let me know if this helps.

Cheers,
Elemento

@ErniW, let me add my two cents :smile:
I remember reading in one of The Batch newsletters, from Andrew Ng, that sometimes we may have a more straightforward solution than using a neural network. For instance, you might use classical computer vision with OpenCV, where you can just count the number of shapes for each object, which will define the object itself. For example, 3 shapes is a triangle, 5 shapes is a pentagon, and so on.
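If I understand the idea as counting vertices, a rough OpenCV sketch could look like this (assuming a binary image containing a single shape; the epsilon tolerance would need tuning):

```python
import cv2

def count_vertices_opencv(binary_image):
    """Classical approach: find the shape's contour and approximate it with a
    polygon; the number of approximation points is the vertex count."""
    contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)
    epsilon = 0.02 * cv2.arcLength(contour, True)   # tolerance, tune as needed
    approx = cv2.approxPolyDP(contour, epsilon, True)
    return len(approx)  # 3 -> triangle, 4 -> quadrilateral, 5 -> pentagon, ...
```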

Well, are you writing about the number of shapes or the number of edges? I don’t think I understand.

A few weeks ago I wrote a network to detect what shape is in the picture, but now I’m trying to get the coordinates of each vertex it contains, not necessarily its type, because the dataset won’t consist of basic shape primitives.

I agree that I could probably use OpenCV but I don’t want to do it that way.

Cheers.

A CNN to predict the number of vertices seems straightforward: each training input is just an image containing one shape, and the label is the correct count. Is that what you did? If you’re also trying to localize those vertices, it’s not as easy. I don’t imagine a single pixel contains enough information to differentiate between an edge and a vertex, so image segmentation probably isn’t going to help. But maybe you could do a YOLO-type object detection with small grid cells and a label containing a coordinate plus a categorical value of background, fill, edge, or vertex, with only the vertex cells contributing to the coordinate loss computation (that is, using a weighting factor a la the v2-era YOLO loss function). Really jagged shapes with short edges would give it trouble, also similar to v2-era YOLO. Just the first thing that occurred to me. Maybe it helps?
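To illustrate the weighting idea (a very rough sketch, not the actual YOLO loss; the tensor shapes, class encoding, and the weight of 5.0 are all assumptions):

```python
import tensorflow as tf

# Each grid cell's label holds a class (0 = background, 1 = fill, 2 = edge,
# 3 = vertex) plus an (x, y) offset; only cells whose true class is "vertex"
# contribute to the coordinate term of the loss.
def grid_loss(y_true_cls, y_true_xy, y_pred_cls, y_pred_xy, coord_weight=5.0):
    cls_loss = tf.keras.losses.sparse_categorical_crossentropy(
        y_true_cls, y_pred_cls, from_logits=True)          # (batch, S, S)
    vertex_mask = tf.cast(tf.equal(y_true_cls, 3), tf.float32)
    xy_loss = tf.reduce_sum(tf.square(y_true_xy - y_pred_xy), axis=-1)
    return (tf.reduce_mean(cls_loss)
            + coord_weight * tf.reduce_mean(vertex_mask * xy_loss))
```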

@ai_curious you can see the results of my network in the image attached a few posts above. The blue dots are the network’s predictions. I didn’t predict the number of edges in this case, just the four vertices, knowing that there are always four of them in my dataset.

The labels are the X and Y coordinates of the vertices, so the output is an array of 8 values (2 * 4). I just don’t know how to make my existing code work with an unknown number of vertices.

I have to write code for the dataset generator that prevents angles smaller than 5 degrees. I will check if it improves the accuracy.
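Something like this for the angle filter (a rough sketch: compute the smallest interior angle and reject/resample shapes below 5 degrees; `random_quad` refers to the hypothetical generator sketched earlier):

```python
import numpy as np

def min_interior_angle(vertices):
    """Smallest angle (in degrees) between adjacent edges of an ordered polygon."""
    pts = np.asarray(vertices, dtype=np.float64)
    n = len(pts)
    angles = []
    for i in range(n):
        a, b, c = pts[i - 1], pts[i], pts[(i + 1) % n]
        v1, v2 = a - b, c - b
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        angles.append(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return min(angles)

# reject-and-resample loop: keep drawing shapes until every angle is >= 5 degrees
# shape = random_quad()
# while min_interior_angle(shape) < 5:
#     shape = random_quad()
```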

How do I combine:

  • detecting the number of edges/vertices (or skipping this step), and
  • once I know how many of them exist, predicting their coordinates

into one single working thing?

When I started working on this it sounded trivial, but it isn’t, and I will need this knowledge later, so every insight helps.
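One possible way to combine the two steps (a rough, untested sketch: a shared CNN backbone with a vertex-count head and a padded coordinate head, where padded slots are masked out of the MSE; `MAX_VERTICES` and the layer sizes are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

MAX_VERTICES = 12  # assumed upper bound, as in the padding idea above

def build_two_head_model(input_shape=(64, 64, 1)):
    """Shared backbone, one head for the vertex count, one for all coordinate slots."""
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.GlobalAveragePooling2D()(x)
    count = layers.Dense(MAX_VERTICES + 1, activation="softmax", name="count")(x)
    coords = layers.Dense(MAX_VERTICES * 2, name="coords")(x)
    return tf.keras.Model(inputs, [count, coords])

def masked_coord_loss(y_true, y_pred):
    """MSE over coordinate slots, ignoring padded slots (marked 0 in the label,
    which works because coordinate (0, 0) never occurs)."""
    mask = tf.cast(tf.not_equal(y_true, 0.0), tf.float32)
    return tf.reduce_sum(mask * tf.square(y_true - y_pred)) / (tf.reduce_sum(mask) + 1e-8)

# model = build_two_head_model()
# model.compile(optimizer="adam",
#               loss={"count": "sparse_categorical_crossentropy",
#                     "coords": masked_coord_loss})
```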

By the way, what’s better than the “YOLO v2 method”?