How to Use Keras Sequential API in TensorFlow


Key Insights

  • The Sequential API is ideal for building neural networks with a single linear path from input to output—it covers most standard architectures. Switch to the Functional API only when you need multiple inputs, outputs, or complex branching.
  • Always compile your model with an optimizer, loss function, and metrics that match your problem type: use categorical_crossentropy for multi-class classification, binary_crossentropy for binary problems, and mse for regression.
  • Call model.summary() before training to verify your architecture’s layer shapes and parameter counts—catching dimension mismatches early saves hours of debugging.

Introduction to the Sequential API

The Keras Sequential API is the most straightforward way to build neural networks in TensorFlow. It’s designed for models where data flows linearly through a stack of layers—input goes through layer 1, then layer 2, and so on until reaching the output. This covers the vast majority of neural network architectures you’ll build in practice.

Use the Sequential API when your model has a single input tensor and a single output tensor, with layers connected in sequence. This includes feedforward networks, convolutional neural networks for image classification, and recurrent networks for sequence processing. Switch to the Functional API only when you need multiple inputs (like combining image and text data), multiple outputs (like multi-task learning), or layer sharing and branching (like in ResNet or Inception architectures).

The Sequential model is part of TensorFlow’s high-level Keras API, which means you get clean, readable code without sacrificing performance. Let’s build some models.

Setting Up Your First Sequential Model

Start by importing TensorFlow and the Sequential model class. You’ll also need specific layer types depending on your architecture.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

# Create an empty Sequential model
model = Sequential()

This creates an empty model container. You’ll add layers to it next. The model doesn’t have any defined architecture yet—it’s just an empty shell waiting for layers.

Adding Layers to Sequential Models

There are two ways to add layers to a Sequential model. You can pass a list of layers to the constructor, or you can use the .add() method to append layers one at a time. Both approaches produce identical models, so choose based on readability.

Here’s a simple feedforward network for classifying images into 10 categories:

# Method 1: Pass layers as a list to constructor
model = Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Method 2: Use .add() to build incrementally
model = Sequential()
model.add(layers.Flatten(input_shape=(28, 28)))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

The first layer should specify the input_shape parameter so TensorFlow knows the dimensions of your input data up front. If you omit it, the model is built lazily the first time it sees data, and model.summary() will fail until then. Subsequent layers automatically infer their input shape from the previous layer's output.

In this example, Flatten converts 28×28 images into 784-element vectors. The Dense layers are fully-connected layers with ReLU activation for hidden layers and softmax for the output layer (which gives us probability distributions over 10 classes). Dropout randomly sets 20% of inputs to zero during training, which helps prevent overfitting.
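To see what Flatten and softmax are doing numerically, here is a minimal NumPy sketch (the `softmax` helper below is illustrative, not part of Keras):

```python
import numpy as np

# A fake 28x28 "image": flattening turns it into a 784-element vector
image = np.random.rand(28, 28)
flat = image.reshape(-1)
print(flat.shape)  # (784,)

# Softmax turns arbitrary scores (logits) into a probability distribution
def softmax(logits):
    exps = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs.sum())  # probabilities sum to 1
```

The largest logit gets the largest probability, which is why `.argmax()` on a softmax output recovers the predicted class.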

For convolutional networks, you’d use Conv2D and MaxPooling2D layers. For sequence models, you’d use LSTM or GRU layers. The Sequential API works the same way regardless of layer type.
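As a sketch of that point, here is a small convolutional Sequential model for 28×28 grayscale images (the filter counts and kernel sizes are illustrative choices, not prescribed values):

```python
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

# A small convolutional stack; note the extra channel dimension in input_shape
cnn = Sequential([
    layers.Conv2D(32, kernel_size=3, activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(64, kernel_size=3, activation='relu'),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])

# The API is identical: same compile/fit/evaluate calls as the dense model
cnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
print(cnn.output_shape)  # (None, 10)
```

Swapping Dense layers for Conv2D (or LSTM) changes nothing about how you compile, train, or evaluate the model.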

Compiling the Model

Before training, you must compile the model by specifying three things: an optimizer, a loss function, and metrics to track during training.

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

The optimizer controls how the model updates its weights during training. Adam is a safe default choice that works well across most problems. You can also use 'sgd', 'rmsprop', or create optimizer objects with custom learning rates:

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

The loss function depends on your problem type:

  • Multi-class classification: 'categorical_crossentropy' (with one-hot encoded labels) or 'sparse_categorical_crossentropy' (with integer labels)
  • Binary classification: 'binary_crossentropy'
  • Regression: 'mse' (mean squared error) or 'mae' (mean absolute error)

Metrics are what you monitor during training. Accuracy is standard for classification, but you can track multiple metrics: metrics=['accuracy', 'precision', 'recall'].
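The distinction between the two multi-class losses is only the label format, not the math. A NumPy sketch of the underlying calculation for a single sample (illustrative, not the Keras implementation):

```python
import numpy as np

# Predicted probabilities for one sample over 3 classes
probs = np.array([0.7, 0.2, 0.1])

# categorical_crossentropy expects a one-hot label...
one_hot = np.array([1, 0, 0])
cat_loss = -np.sum(one_hot * np.log(probs))

# ...sparse_categorical_crossentropy expects the integer class index
label = 0
sparse_loss = -np.log(probs[label])

print(np.isclose(cat_loss, sparse_loss))  # same loss, different label format
```

If your labels are already integers, using the sparse variant saves you the to_categorical conversion step.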

Training and Evaluating

The .fit() method trains the model on your data. Here’s a complete example using the MNIST handwritten digit dataset:

# Load and preprocess MNIST data
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Normalize pixel values to [0, 1]
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# Convert labels to one-hot encoding
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Build the model
model = Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Train the model
history = model.fit(
    x_train, y_train,
    batch_size=32,
    epochs=10,
    validation_split=0.2,
    verbose=1
)

# Evaluate on test data
test_loss, test_accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f'Test accuracy: {test_accuracy:.4f}')

The fit() method takes your training data and labels, along with several important parameters:

  • batch_size: Number of samples processed before updating weights (32 is a common default)
  • epochs: Number of complete passes through the training data
  • validation_split: Fraction of training data to use for validation (0.2 means 20%)
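To make these parameters concrete, here is the arithmetic for MNIST's 60,000 training images with the settings above:

```python
n_samples = 60_000
validation_split = 0.2
batch_size = 32

n_train = int(n_samples * (1 - validation_split))  # 48,000 samples used for training
n_val = n_samples - n_train                        # 12,000 held out for validation
steps_per_epoch = -(-n_train // batch_size)        # ceiling division: weight updates per epoch

print(n_train, n_val, steps_per_epoch)  # 48000 12000 1500
```

So each of the 10 epochs performs 1,500 gradient updates, and validation metrics are computed on the held-out 12,000 samples after every epoch.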

The method returns a history object containing training metrics for each epoch, which you can plot to visualize learning curves.

The evaluate() method computes loss and metrics on test data without updating weights. This gives you an unbiased estimate of model performance.

Making Predictions

After training, use .predict() to make predictions on new data:

# Predict on the first 5 test images
predictions = model.predict(x_test[:5])

# predictions is an array of shape (5, 10) containing probabilities
for i, pred in enumerate(predictions):
    predicted_class = pred.argmax()
    actual_class = y_test[i].argmax()
    confidence = pred[predicted_class]
    print(f'Image {i}: Predicted {predicted_class} '
          f'(confidence: {confidence:.2f}), Actual: {actual_class}')

The .predict() method returns raw model outputs. For classification with softmax activation, this means probability distributions over classes. Use .argmax() to get the predicted class index.

For binary classification with sigmoid activation, predictions are single probabilities between 0 and 1. Threshold at 0.5 to get class labels:

predictions = model.predict(x_test)
binary_predictions = (predictions > 0.5).astype(int)

Best Practices and Common Patterns

Always call model.summary() before training to verify your architecture:

model.summary()

This displays each layer’s output shape and parameter count. It’s invaluable for catching dimension mismatches and understanding model complexity.

Use callbacks to add functionality during training. The most useful is ModelCheckpoint, which saves the best model during training:

from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

callbacks = [
    ModelCheckpoint(
        'best_model.h5',
        monitor='val_accuracy',
        save_best_only=True,
        mode='max'
    ),
    EarlyStopping(
        monitor='val_loss',
        patience=5,
        restore_best_weights=True
    )
]

model.fit(
    x_train, y_train,
    epochs=50,
    validation_split=0.2,
    callbacks=callbacks
)

EarlyStopping halts training when validation loss stops improving, preventing overfitting and saving time.

Save and load trained models so you don't have to retrain them (the .h5 extension selects the HDF5 format; recent Keras versions also support a native .keras format):

# Save entire model (architecture + weights)
model.save('my_model.h5')

# Load model
loaded_model = keras.models.load_model('my_model.h5')

Know when to graduate to the Functional API. If you need any of these features, Sequential won’t work:

  • Multiple inputs (e.g., combining image and metadata)
  • Multiple outputs (e.g., multi-task learning)
  • Shared layers (using the same layer instance multiple times)
  • Non-linear topology (skip connections, concatenations)
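For contrast, here is a minimal Functional API sketch combining an image branch with a small metadata vector—the multiple-inputs case Sequential cannot express. Layer sizes and input names are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Two inputs: an image and a metadata vector -- impossible with Sequential
image_in = layers.Input(shape=(28, 28, 1), name='image')
meta_in = layers.Input(shape=(4,), name='metadata')

# Process the image branch on its own
x = layers.Flatten()(image_in)
x = layers.Dense(64, activation='relu')(x)

# Concatenate the two branches before the final classifier
merged = layers.Concatenate()([x, meta_in])
out = layers.Dense(10, activation='softmax')(merged)

model = Model(inputs=[image_in, meta_in], outputs=out)
print(model.output_shape)  # (None, 10)
```

Once built, a Functional model compiles, trains, and evaluates exactly like a Sequential one.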

For everything else, stick with Sequential. It's cleaner, more readable, and perfectly capable of building powerful models. Many production deep learning systems use straightforward Sequential architectures—complexity in model topology rarely beats good data and careful training procedures.
