How to Implement a Neural Network in TensorFlow
Key Insights
- TensorFlow’s Keras API makes building neural networks straightforward with Sequential models and Dense layers, allowing you to create production-ready models in under 50 lines of code
- Proper data preprocessing (normalization and reshaping) is critical—neural networks expect input in specific formats and normalized values train significantly faster than raw pixel data
- Start simple with a basic architecture and systematically add complexity (dropout, additional layers) only after establishing a working baseline that you can measure improvements against
Introduction & Prerequisites
Neural networks are the foundation of modern deep learning, and TensorFlow makes implementing them accessible without sacrificing power or flexibility. In this guide, you’ll build a complete neural network from scratch that classifies clothing images from Fashion-MNIST with close to 90% accuracy.
Before starting, you need TensorFlow 2.x installed. The Keras API is now integrated directly into TensorFlow, so you get a clean, consistent interface. You’ll also need NumPy for array operations and matplotlib for visualization.
Install the required packages:
pip install tensorflow numpy matplotlib
Verify your installation:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
print(f"TensorFlow version: {tf.__version__}")
print(f"GPU available: {tf.config.list_physical_devices('GPU')}")
You should see TensorFlow 2.x. GPU support is optional but recommended for larger projects. For this tutorial, CPU is perfectly adequate.
The basic ML concepts you need: neural networks learn by adjusting weights through backpropagation, training happens in epochs (complete passes through the data), and we split data into training and test sets to evaluate generalization.
Preparing the Dataset
We’ll use the Fashion-MNIST dataset—10 categories of clothing items, each a 28x28 grayscale image. It’s more interesting than handwritten digits while remaining simple enough to train quickly.
from tensorflow.keras.datasets import fashion_mnist
# Load the dataset
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()
print(f"Training samples: {X_train.shape[0]}")
print(f"Test samples: {X_test.shape[0]}")
print(f"Image shape: {X_train.shape[1:]}")
print(f"Pixel value range: {X_train.min()} to {X_train.max()}")
This gives you 60,000 training images and 10,000 test images. Each image is 28x28 pixels with values from 0 to 255.
Neural networks train better with normalized inputs. Divide by 255.0 to scale pixels to the 0-1 range:
# Normalize pixel values to 0-1 range
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
# Flatten images from 28x28 to 784-length vectors
X_train_flat = X_train.reshape(-1, 28 * 28)
X_test_flat = X_test.reshape(-1, 28 * 28)
print(f"Flattened shape: {X_train_flat.shape}")
We flatten the 2D images into 1D vectors because fully-connected networks expect 1D input. Convolutional networks preserve 2D structure, but that’s a topic for another article.
The labels are integers 0-9. For categorical classification, convert them to one-hot encoded vectors:
from tensorflow.keras.utils import to_categorical
y_train_cat = to_categorical(y_train, 10)
y_test_cat = to_categorical(y_test, 10)
print(f"Original label: {y_train[0]}")
print(f"One-hot encoded: {y_train_cat[0]}")
Building the Neural Network Architecture
TensorFlow’s Sequential API lets you stack layers linearly. This is perfect for feedforward networks where data flows straight through without branches.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])
model.summary()
Let’s break this down:
The first Dense layer has 128 neurons and uses ReLU activation. The input_shape=(784,) tells TensorFlow to expect flattened 28x28 images. ReLU (Rectified Linear Unit) outputs max(0, x), introducing non-linearity that lets the network learn complex patterns.
The second hidden layer has 64 neurons, also with ReLU. The neuron count is arbitrary—64 to 128 is typical for this dataset size. More neurons increase capacity but also training time and overfitting risk.
The output layer has 10 neurons (one per class) with softmax activation. Softmax converts raw outputs to probabilities that sum to 1.0, perfect for multi-class classification.
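To make these two activations concrete, here is a minimal NumPy sketch (the `relu` and `softmax` helpers are illustrative, not part of the tutorial's model code): ReLU clamps negatives to zero, and softmax turns arbitrary scores into probabilities that sum to 1.

```python
import numpy as np

def relu(x):
    # Pass positives through unchanged, clamp negatives to zero
    return np.maximum(0, x)

def softmax(z):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(z - z.max())
    return e / e.sum()

print(relu(np.array([-2.0, 0.0, 3.0])))   # [0. 0. 3.]
probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)  # largest score gets the largest probability; values sum to ~1.0
```

Keras applies the same functions element-wise to each layer's outputs; you never call them directly in this tutorial.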
The model.summary() shows you the architecture and parameter count. You should see 109,386 trainable parameters: 784×128 + 128×64 + 64×10 weights, plus 202 biases.
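You can check that arithmetic directly; each Dense layer holds one weight per input-unit pair plus one bias per unit:

```python
# weights (inputs x units) + biases (one per unit) for each Dense layer
layer1 = 784 * 128 + 128   # flattened input -> first hidden layer
layer2 = 128 * 64 + 64     # first -> second hidden layer
layer3 = 64 * 10 + 10      # second hidden -> output layer
total = layer1 + layer2 + layer3
print(total)  # 109386
```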
Compiling the Model
Compilation configures the learning process. You specify three key components:
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
Optimizer: Adam is the default choice for most problems. It adapts learning rates automatically and handles sparse gradients well. Alternatives like SGD require more tuning.
Loss function: Categorical crossentropy measures how far predictions are from true labels. It’s the standard for multi-class classification with one-hot encoded labels. If you kept integer labels, use sparse_categorical_crossentropy instead.
Metrics: Accuracy tells you the percentage of correct predictions. It’s interpretable and sufficient for balanced datasets.
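The two crossentropy losses differ only in the label format they expect. A small NumPy sketch (illustrative, not the Keras internals) shows they compute the same value for a single sample:

```python
import numpy as np

probs = np.array([0.1, 0.7, 0.2])  # softmax output for one sample
label = 1                          # integer class label
onehot = np.eye(3)[label]          # [0., 1., 0.]

# categorical_crossentropy multiplies log-probabilities by the one-hot vector
cce = -np.sum(onehot * np.log(probs))
# sparse_categorical_crossentropy just indexes with the integer label
scce = -np.log(probs[label])
print(np.isclose(cce, scce))  # True
```

So the choice between them is purely about whether you one-hot encode your labels, not about what gets optimized.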
You can customize the optimizer:
from tensorflow.keras.optimizers import Adam
model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
The default learning rate of 0.001 works well. Lower values train slower but more stably; higher values risk overshooting optima.
Training the Model
Training adjusts weights to minimize loss. The fit() method handles the entire process:
history = model.fit(
    X_train_flat,
    y_train_cat,
    epochs=10,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)
Epochs: One epoch processes the entire training set once. 10 epochs is usually enough to see convergence on Fashion-MNIST.
Batch size: Instead of updating weights after every sample, we process 32 samples at once. This speeds up training and provides more stable gradients. Common values are 32, 64, or 128.
Validation split: Hold out 20% of training data to monitor overfitting. If training accuracy increases but validation accuracy plateaus, you’re overfitting.
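These settings also determine the step count you'll see in the training log: with 20% of the 60,000 samples held out for validation and a batch size of 32, each epoch runs 1,500 gradient updates.

```python
# 60,000 training samples, 20% held out by validation_split
train_samples = int(60000 * 0.8)       # 48000 samples actually trained on
steps_per_epoch = train_samples // 32  # batches (weight updates) per epoch
print(steps_per_epoch)  # 1500
```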
The output shows progress:
Epoch 1/10
1500/1500 [==============================] - 3s 2ms/step - loss: 0.5234 - accuracy: 0.8145 - val_loss: 0.4201 - val_accuracy: 0.8523
Epoch 2/10
1500/1500 [==============================] - 2s 1ms/step - loss: 0.3912 - accuracy: 0.8591 - val_loss: 0.3845 - val_accuracy: 0.8632
...
Watch for validation accuracy improving alongside training accuracy. If validation accuracy stops improving while training accuracy keeps rising, stop training to avoid overfitting.
The history object contains metrics from each epoch:
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
Evaluating & Making Predictions
After training, evaluate on the test set—data the model has never seen:
test_loss, test_accuracy = model.evaluate(X_test_flat, y_test_cat)
print(f"Test accuracy: {test_accuracy:.4f}")
You should see around 87-89% accuracy. Not state-of-the-art, but respectable for a simple feedforward network.
Make predictions on individual samples:
# Predict on first 5 test images
predictions = model.predict(X_test_flat[:5])
# Get class with highest probability
predicted_classes = np.argmax(predictions, axis=1)
true_classes = y_test[:5]
class_names = ['T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
for i in range(5):
    print(f"True: {class_names[true_classes[i]]}, "
          f"Predicted: {class_names[predicted_classes[i]]}, "
          f"Confidence: {predictions[i][predicted_classes[i]]:.2f}")
Visualize predictions:
fig, axes = plt.subplots(1, 5, figsize=(15, 3))
for i, ax in enumerate(axes):
    ax.imshow(X_test[i], cmap='gray')
    ax.set_title(f"True: {class_names[true_classes[i]]}\n"
                 f"Pred: {class_names[predicted_classes[i]]}")
    ax.axis('off')
plt.show()
Next Steps & Optimization
This baseline model is functional but not optimized. Here are immediate improvements:
Add dropout to prevent overfitting. Dropout randomly disables neurons during training, forcing the network to learn robust features:
from tensorflow.keras.layers import Dropout
model = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),
    Dropout(0.3),
    Dense(64, activation='relu'),
    Dropout(0.3),
    Dense(10, activation='softmax')
])
Start with 0.2-0.5 dropout rate. Higher values regularize more aggressively.
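To see what dropout does mechanically, here is an illustrative NumPy sketch of inverted dropout, the convention Keras uses: dropped units go to zero during training, and the survivors are scaled by 1/(1 - rate) so the expected activation magnitude is unchanged at inference time.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
activations = np.ones(10)
rate = 0.3

# Keep each unit with probability 1 - rate
mask = rng.random(10) >= rate
# Inverted dropout: scale survivors so the expected sum stays the same
dropped = activations * mask / (1 - rate)
print(dropped)  # zeros where units were dropped, ~1.43 elsewhere
```

At inference time Keras disables dropout entirely, which is why the scaling happens during training rather than at prediction.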
Save your model to reuse it without retraining:
model.save('fashion_classifier.h5')
# Load later
from tensorflow.keras.models import load_model
loaded_model = load_model('fashion_classifier.h5')
Experiment with architecture: Try different layer counts, neuron counts, and activation functions. Use callbacks like EarlyStopping to halt training when validation loss stops improving:
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss', patience=3)
model.fit(X_train_flat, y_train_cat,
          epochs=50,
          validation_split=0.2,
          callbacks=[early_stop])
Consider convolutional layers for image data. CNNs preserve spatial structure and typically outperform fully-connected networks on vision tasks.
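As a rough sketch of where that leads, a minimal convolutional version might look like the following. Note that it takes the unflattened 28x28 images with an explicit channel axis, so you would feed X_train.reshape(-1, 28, 28, 1) rather than X_train_flat; the layer sizes here are illustrative starting points, not tuned values.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

cnn = Sequential([
    # Convolutions slide small filters over the image, preserving 2D structure
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    # Flatten only at the end, just before the classification head
    Flatten(),
    Dense(10, activation='softmax')
])

cnn.compile(optimizer='adam',
            loss='categorical_crossentropy',
            metrics=['accuracy'])
```

Training it follows the same fit/evaluate workflow you used above, only with the reshaped image tensors as input.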
The key to improvement is systematic experimentation. Change one variable at a time, measure the impact, and keep what works. Start with this baseline, establish your metrics, then iterate toward your accuracy targets.