When starting a new NN model for image classification, how does one begin building the model in terms of the number of layers, the number of units in each layer, and the respective activation functions?
Starting a Neural Network model from scratch can feel a bit overwhelming due to the sheer number of tunable parameters. For image classification, the standard approach is to lean on Convolutional Neural Networks (CNNs) rather than standard dense networks, as they are specifically designed to understand spatial relationships.
1. "Standard" Starting Architecture for Image Classification
A typical baseline architecture for image classification is composed of two main stages: feature extraction and classification.
A. Convolutional Layers (The Feature Extractors)
These layers scan the image to identify edges, textures, and shapes.
- Number of Layers: Start with 2 to 4 convolutional blocks. Each block typically consists of a Convolution layer, an Activation layer, and a Pooling layer.
- Filters (Units): Start small and double the number as you go deeper (e.g., 32 → 64 → 128).
- Reasoning: As the network deepens, the image resolution shrinks due to pooling, but the complexity of the features increases. Adding more filters allows the model to capture this higher-level complexity within the input images.
- Kernel Size: 3 × 3 is a good default. It is computationally efficient while still capturing local patterns effectively.
B. Dense Layers (The Classifier)
These layers take the features extracted by the convolutional layers and make the final classification decision.
- Number of Layers: Usually 1 or 2 hidden dense layers, placed after flattening the output of the final convolutional block.
- Units: Common choices are 128, 256, or 512.
- Note: If this number is too high, the model is likely to memorize the training data rather than learning generalizable patterns (overfitting).
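To make the shape arithmetic behind these defaults concrete, here is a minimal pure-Python sketch (not from the thread; it assumes a hypothetical 128 × 128 RGB input, 3 × 3 "valid" convolutions, and 2 × 2 pooling) that tracks how the feature map shrinks while the filter count doubles:

```python
# Track feature-map shape through a small CNN: 3x3 "valid" convs, 2x2 max pools.
def conv_output(size, kernel=3, padding=0, stride=1):
    # Standard convolution output-size formula
    return (size + 2 * padding - kernel) // stride + 1

def pool_output(size, window=2):
    # Non-overlapping pooling halves the spatial size (integer division)
    return size // window

size, channels = 128, 3          # hypothetical 128x128 RGB input
for filters in [32, 64, 128]:    # double the filters each block
    size = conv_output(size)     # 3x3 convolution, no padding
    size = pool_output(size)     # 2x2 max pooling
    channels = filters
    print(f"after block with {filters} filters: {size}x{size}x{channels}")
```

With these assumptions the shapes go 128 → 63 → 30 → 14, so the flattened output of the last block (14 × 14 × 128 = 25,088 values) is what the dense classifier layers would receive.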
2. Choosing Activation Functions
Activation functions introduce non-linearity, enabling the model to learn complex data patterns.
| Layer Type | Recommended Activation | Why? |
|---|---|---|
| Hidden Layers | ReLU (Rectified Linear Unit) | The industry standard for hidden (intermediate) layers. It outputs the input value if the input is positive and zero otherwise. |
| Output (Binary Classification) | Sigmoid | Squashes output strictly between 0 and 1. |
| Output (Multi-class Classification) | Softmax | Turns the outputs for multiple classes into a probability distribution that sums to 1. |
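As a quick illustration of the three activations in the table, here is a small pure-Python sketch (scalar and list versions for readability; real frameworks apply these element-wise to tensors):

```python
import math

def relu(x):
    # Outputs the input if positive, zero otherwise
    return max(0.0, x)

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    # Exponentiate (shifted by the max for numerical stability),
    # then normalise so the outputs sum to 1
    exps = [math.exp(v - max(logits)) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

print(relu(-2.0), relu(3.0))       # 0.0 3.0
print(sigmoid(0.0))                # 0.5
print(sum(softmax([2.0, 1.0, 0.1])))  # 1.0 (a valid probability distribution)
```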
Let me know if you need any further explanation or clarification.
Thanks. Lots of great advice there!
Please see keras_tuner as well.
You might also get ideas from reviewing architectures listed on the web at places like this…
Your choice will also be influenced by non-functional requirements such as where the model needs to run, what throughput you need to achieve, and the cost/benefit of your truth table.
Thank you!
What other things do I need to consider when building a model from scratch, developing it and testing it as a ML Engineer in industry? For example, I have heard of JAX, TF-Serving and XLA/MLIR in one job description.
You don't have to always build a model from scratch; depending on your use-case, sometimes it may make sense to fine-tune an existing off-the-shelf model which has a use case similar to yours. Building a model generally requires a fair bit of experimentation and testing with varied datasets, architectures and hyper-parameters to get it to work. You have to try different combinations and see what works for you.
Talking about JAX at a high level, it is a deep learning library like PyTorch and TensorFlow. TF-Serving is a system for deploying pre-trained models (primarily TensorFlow) in production for inference. XLA, at a high level, optimizes machine learning (ML) models, accelerating training and inference.
OK, thanks.
Are there any other tools or libraries I should be aware of when building and testing a production NN model for an employer as an employee ML Engineer who has never worked for an employer in this field before?
Pick up any one of the deep learning libraries to master, these days PyTorch is the one that is en vogue. Other options are TensorFlow and JAX.
For managing the ML lifecycle, including experiment tracking, model versioning, and deployment, you can consider MLflow, an open-source platform built for exactly that.
For model serving and deployment you can consider one of the cloud-based managed services like AWS SageMaker / Bedrock (LLMs), Google's Vertex AI, or Azure ML / Azure OpenAI Service (LLMs).
There is also a plethora of other options and tools out there for managing the entire ML lifecycle.
You can also consider KubeFlow / Seldon Core, if you have some Kubernetes background.
Thanks. Have you any experience building, developing and testing a NN model in a working environment for an American employer?
I do have experience building, developing and testing a NN model in a working environment but not for an American employer.
Similarly, I would try to find a model which has a similar application to yours; no need to reinvent the wheel!
What were key takeaways for you from building your first model for an employer?
Please explain what these different layer types are:
- a Convolution layer
- an Activation layer
- a Pooling layer.
I will suggest that the most important feature of your first model is not a feature of your first model. It's a feature of the business question you are being asked to address in your first model. If you are being asked to colonize Mars, or end cancer, your project will fail. Even if you are being asked something that seems specific and achievable in a reasonable time, say, reduce hospital readmissions, you cannot succeed. That is not something a machine learning model can accomplish. What you might be able to do is predict likelihood of hospital readmission for a given patient, or the likelihood of sepsis onset, or structural failure of a part. It presumes that sufficient training data exists and you can quantify what success means.
When I was doing machine learning projects for real US employers with US and International customers, the biggest risk to project success was unrealistic expectations. Don't start out trying to solve an extraordinarily hard problem at 100% accuracy, something exceeding expert human performance under the best conditions. Rather, be willing to achieve a provable, modest success with a modest amount of resources, then iterate. There is typically more than one way to attack a well-defined business problem, but there are no technical solutions to one that is poorly defined. So pick a technology or platform, or work with the one your customer is already building on top of. Then, as suggested above, building incrementally off of a proven success is likely a faster path to value than big bang invention of a totally novel approach to a gnarly problem.
Convolutional Neural Networks (CNNs) are the backbone of modern computer vision. By mimicking the way the human visual cortex processes information, they break down complex images into manageable, hierarchical features.
Below is a brief explanation of what these different layer types are:
1. Convolutional Layer
The Convolutional Layer is the engine of the network. Instead of processing an entire image as one flat list of pixels, it focuses on small, local regions to preserve spatial relationships.
- The Mechanism: A small matrix, known as a filter or kernel (e.g., 3 × 3), slides across the input image. At each stop, it performs a mathematical operation (element-wise multiplication and summation) to create a Feature Map.
- Hierarchical Learning:
  - Early Layers: Detect simple patterns like horizontal edges or color gradients.
  - Deeper Layers: Combine simple patterns to recognize complex shapes, such as eyes, wheels, or entire faces.
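A toy illustration of the sliding-kernel mechanism, in pure Python (the vertical-edge kernel and tiny image below are hypothetical examples; frameworks do the same computation, just vectorized):

```python
def convolve2d(image, kernel):
    # "Valid" cross-correlation: slide the kernel over the image,
    # multiplying element-wise and summing at each position.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map

# A tiny "image": bright left half, dark right half
image = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
]
# A classic vertical-edge detector kernel
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]
print(convolve2d(image, kernel))  # [[0, 3, 3]]
```

The feature map responds strongly (value 3) only where the kernel straddles the bright-to-dark boundary, which is exactly the "edge detection" behaviour described for early layers.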
2. Activation Layer (ReLU)
The Activation Layer acts as a gatekeeper, deciding which information is important enough to pass forward.
- The "Why": Real-world data is messy and non-linear. Without an activation layer, the entire network would behave like one giant linear equation, making it unable to learn complex patterns.
- ReLU (Rectified Linear Unit): This is the industry standard. It follows a simple rule:
  - If the input is negative, it becomes 0.
  - If the input is positive, it stays the same.
Other Types: * Sigmoid/Tanh: Often used in specific layers for probability.
- Softmax: Typically the final layer used to output multiclass probabilities.
3. Pooling Layer
The Pooling Layer is responsible for "downsampling." It shrinks the image dimensions to make the data more manageable.
- Max Pooling: The most common method. It looks at a window (e.g., 2 × 2) and retains only the highest value, discarding the rest.
- Key Benefits:
  - Efficiency: Reduces the number of parameters and computation time.
  - Translation Invariance: Helps the network recognize an object even if it is slightly tilted or shifted.
  - Prevents Overfitting: By simplifying the data, the model focuses on the most prominent features rather than noise.
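Max pooling can be sketched in a few lines of pure Python (a 2 × 2 window with stride equal to the window size, the common default):

```python
def max_pool(feature_map, window=2):
    # Keep only the largest value in each non-overlapping 2x2 window
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, h - window + 1, window):
        row = []
        for j in range(0, w - window + 1, window):
            row.append(max(
                feature_map[i + a][j + b]
                for a in range(window) for b in range(window)
            ))
        pooled.append(row)
    return pooled

fmap = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]
print(max_pool(fmap))  # [[6, 5], [7, 9]] -- 4x4 shrinks to 2x2
```

Note how a small shift of the strongest value within a window would leave the output unchanged; that is the translation invariance mentioned above.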
The CNN Workflow
A typical CNN is built by stacking these layers in a repetitive cycle:
| Step | Layer | Purpose |
|---|---|---|
| 1 | Convolution | Feature Extraction (Finding the patterns). |
| 2 | Activation | Introducing Non-linearity (Allowing for complexity). |
| 3 | Pooling | Spatial Reduction (Simplifying the data). |
By repeating this "sandwich" many times, the network evolves from "seeing" pixels to "understanding" objects.
Thanks thatâs really helpful.
I am a complete beginner to this but I have completed the MLS with 100% in all grades exercises and I am part through the DLS.
My aim is to complete the DLS, then demonstrate my knowledge from these courses by building an image classifier NN trained on a small input dataset of 2000 plant leaf pictures for 5 different plant species, see if I can train it to predict one of those plant leaf species, and use this exercise to demonstrate to employers a real-world example of multi-class image classification.
Is 2000 images as an input training dataset large enough? I will actually be using 1000 original different plant leaf images but doubling the size of the total input training dataset by applying data augmentation by flipping each image from left to right.
Should I also perform z-score normalisation on each input pixel feature for every image?
Always hard to apply a simple rule to this question. My thought is it is enough to do a simple model with decent results. However, donât overlook that you need to split your data between train and test and that you have 5 classes. So you are really talking about a few hundred of each class to train on. That is very small for a real world example. It also means you can likely hold the entire training set data structure in memory at runtime, which simplifies your life but avoids mastering another real world challenge of handling large data. Following my suggestion above, maybe start with 2K but then try to scale.
Also, when you get to that point, donât overlook class imbalance. 400 of each class trains differently than 1800 of one class and 50 each of the rest.
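One common remedy for class imbalance is to weight the loss by inverse class frequency. Here is a pure-Python sketch of the idea (the function name is my own; the formula mirrors the widely used "balanced" heuristic, n_samples / (n_classes × class_count)):

```python
def class_weights(labels):
    # Inverse-frequency weights: rare classes get larger weights,
    # so the loss penalises mistakes on them more heavily.
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}

# Balanced classes: every weight is 1.0
print(class_weights(["oak"] * 400 + ["elm"] * 400))

# Imbalanced classes: the rare class is up-weighted
print(class_weights(["oak"] * 1800 + ["elm"] * 200))  # oak ~0.56, elm 5.0
```

Frameworks typically accept such a dictionary directly (e.g., a per-class weight passed to the training loop), so mistakes on the 200-image class count roughly nine times as much as mistakes on the 1800-image class.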
EDIT
Here's a related thread with some thoughts about starting from scratch vs starting with a known baseline…
In the world of DL, 2000 images is considered a small dataset. However, "small" doesn't mean "impossible". With 5 species, you have 400 images per class (200 original + 200 flipped). If you were training a massive architecture from scratch, this wouldn't be enough; the model would likely just memorize your training set (overfit). Flipping is a great start for augmenting your dataset, but don't stop there! Since leaves can be at any angle or lighting, try adding random rotations, brightness adjustments, etc. In short, try to generate more synthetic data.
Regarding whether you should also perform z-score normalization on each input pixel feature for every image: the short answer is not usually for images. For images, we typically use a simpler approach such as min-max scaling or mean subtraction. While you can use z-score, simply scaling pixels to the 0–1 range is computationally faster and usually sufficient for the activation functions (like ReLU) used in CNNs.
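Both suggestions, horizontal flipping and simple 0–1 scaling, can be sketched in a few lines of pure Python (frameworks provide preprocessing layers for this, but the underlying math is just):

```python
def flip_horizontal(image):
    # Left-right flip: reverse each row of pixels
    return [row[::-1] for row in image]

def min_max_scale(image):
    # Map 0-255 pixel values into the 0-1 range
    return [[pixel / 255.0 for pixel in row] for row in image]

image = [
    [0, 128, 255],
    [64, 32, 16],
]
print(flip_horizontal(image))    # [[255, 128, 0], [16, 32, 64]]
print(min_max_scale([[0, 255]])) # [[0.0, 1.0]]
```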
You can also showcase other real-world elements in your project, like a validation split, a confusion matrix, etc.