⏱ 6 min read 📊 Beginner 🗓 Updated Jan 2025

⚡ TensorFlow vs Keras

TF1 → TF2: A Paradigm Shift

TensorFlow 1.x required building a static computation graph, then feeding data into a separate Session to execute it. Debugging was painful — you couldn't inspect intermediate values without explicit print ops. TF2 (released 2019) switched to eager execution: operations run immediately like regular Python code.

  • TF1: define graph → create Session → sess.run(ops) → inspect
  • TF2: operations execute immediately, results are Python values
  • tf.function — optional: trace to graph for speed
  • TF2 is the default and the only version actively developed
  • Migrate TF1 code: tf.compat.v1 compatibility layer
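The tf.function bullet above can be sketched in a few lines (matrix sizes are illustrative): the first call traces the Python function into a graph; later calls with matching signatures reuse the compiled graph.

```python
import tensorflow as tf

# Eager: operations run immediately, results are inspectable Python values
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(x + 1)                      # available right away, no Session needed

# tf.function: trace once to a graph, then reuse it for speed
@tf.function
def dense_step(a, b):
    return tf.reduce_sum(a @ b)

a = tf.random.normal([256, 256])
b = tf.random.normal([256, 256])
print(dense_step(a, b))                               # first call traces
print(dense_step.get_concrete_function(a, b).graph)   # the traced tf.Graph
```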

Keras as the High-Level API

Keras was originally a standalone library. Since TF2, it ships as tf.keras and is the official high-level API. Standalone Keras (keras 3.x) now also supports JAX and PyTorch backends. For TF-only work, tf.keras is what you use.

  • tf.keras — bundled with TensorFlow, recommended
  • Keras 3.x — multi-backend (TF, JAX, PyTorch)
  • The Model/Layer/Callback API is identical between both
  • tf.data for efficient input pipelines
  • tf.GradientTape for custom training loops
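The tf.data bullet can be sketched as a minimal input pipeline (the array names and sizes here are illustrative, not from a real dataset):

```python
import numpy as np
import tensorflow as tf

# Toy arrays standing in for real features/labels
features = np.random.randn(1000, 20).astype(np.float32)
labels   = np.random.randint(0, 5, 1000)

ds = (tf.data.Dataset.from_tensor_slices((features, labels))
        .shuffle(buffer_size=1000)       # reshuffle each epoch
        .batch(32)                       # group into mini-batches
        .prefetch(tf.data.AUTOTUNE))     # overlap data prep with training

for x_batch, y_batch in ds.take(1):
    print(x_batch.shape, y_batch.shape)  # (32, 20) (32,)
```

A Dataset like this can be passed directly to model.fit(ds, epochs=...) in place of NumPy arrays.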

tf.Tensor vs NumPy

TensorFlow tensors live on a specific device (CPU/GPU/TPU) and are immutable. Most NumPy operations work on tensors via the numpy interop layer, and .numpy() converts a tensor to a NumPy array (GPU tensors are copied back to host memory first).

  • tf.constant([1,2,3]) — immutable tensor
  • tf.Variable([1,2,3]) — mutable, holds model weights
  • t.numpy() — convert to NumPy (copies to host if on GPU)
  • tf.cast(t, tf.float32) — change dtype
  • Automatic device placement: tf chooses GPU if available
import tensorflow as tf
import numpy as np

# ── Verify installation ────────────────────────────────────────────────────────
print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version:      {tf.keras.__version__}")
print(f"GPU available:      {len(tf.config.list_physical_devices('GPU')) > 0}")

# ── Hello-world tensor operations ─────────────────────────────────────────────
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])

print(a + b)             # element-wise add
print(a @ b)             # matrix multiply
print(tf.reduce_sum(a))  # sum all elements → 10.0
print(tf.reduce_mean(a, axis=0))  # column means → [2.0, 3.0]

# ── tf.Variable: mutable state for weights ────────────────────────────────────
W = tf.Variable(tf.random.normal([4, 3], stddev=0.1))
b_var = tf.Variable(tf.zeros([3]))
x = tf.random.normal([10, 4])                # batch of 10

logits = x @ W + b_var
print(f"Logits shape: {logits.shape}")        # (10, 3)

# ── tf.GradientTape: manual gradient computation ──────────────────────────────
x_in = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x_in ** 2 + 2 * x_in + 1    # y = (x+1)^2, dy/dx = 2x+2
grad = tape.gradient(y, x_in)
print(f"dy/dx at x=3: {grad.numpy()}")   # 8.0

# ── Interop with NumPy ────────────────────────────────────────────────────────
np_arr = np.array([1.0, 2.0, 3.0], dtype=np.float32)
tf_tensor = tf.constant(np_arr)        # NumPy → TF
back_to_np = tf_tensor.numpy()         # TF → NumPy
print(type(back_to_np), back_to_np)    # numpy.ndarray  [1. 2. 3.]

🏗 Building Models with Keras

Keras provides three model-building APIs that trade simplicity for flexibility. All produce identical tf.keras.Model objects with the same compile/fit/evaluate/predict interface.

Sequential API

Stack layers linearly. Simplest approach — ideal for most feedforward networks. Cannot express branching, shared layers, or multiple inputs/outputs.

  • tf.keras.Sequential([layer1, layer2, ...])
  • model.add(layer) — add layers incrementally
  • Layers are named automatically (dense, dense_1, ...)
  • Limitations: single input, single output, strictly linear

Functional API

Define models as a graph of layers by calling layers as functions on tensors. Enables residual connections, multi-input/output models, and shared layers. Preferred for most production models.

  • inputs = tf.keras.Input(shape=(...))
  • Call layers: x = Dense(64)(inputs)
  • model = tf.keras.Model(inputs=inputs, outputs=outputs)
  • Explicit data flow — model graph is inspectable
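To show what the Functional API expresses that Sequential cannot, here is a sketch of a two-input model with a shared layer (all layer names and sizes are illustrative):

```python
import tensorflow as tf

# Two inputs and one shared Dense layer — impossible with Sequential
num_in = tf.keras.Input(shape=(16,), name='numeric')
cat_in = tf.keras.Input(shape=(8,),  name='categorical')

shared = tf.keras.layers.Dense(32, activation='relu', name='shared_dense')
x1 = shared(num_in)                               # same weights used twice
x2 = shared(tf.keras.layers.Dense(16)(cat_in))    # project to 16, then share
merged = tf.keras.layers.Concatenate()([x1, x2])
out = tf.keras.layers.Dense(1, activation='sigmoid')(merged)

model = tf.keras.Model(inputs=[num_in, cat_in], outputs=out)
print(model([tf.random.normal((4, 16)), tf.random.normal((4, 8))]).shape)  # (4, 1)
```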

Model Subclassing

Subclass tf.keras.Model and define __init__ (layers) and call (forward pass). Maximum flexibility — any Python logic in the forward pass, dynamic architectures, custom loops.

  • Override call(self, inputs, training=False)
  • Use training flag to toggle Dropout/BN behaviour
  • model.summary() / shape inspection only work after a first call builds the model
  • Required for: dynamic graphs, custom attention, research
import tensorflow as tf
import numpy as np

# ── Same MLP (2 hidden layers) in all three APIs ──────────────────────────────
INPUT_DIM = 20
HIDDEN    = 64
CLASSES   = 5

# ── 1. Sequential API ────────────────────────────────────────────────────────
model_seq = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(INPUT_DIM,)),
    tf.keras.layers.Dense(HIDDEN, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(HIDDEN // 2, activation='relu'),
    tf.keras.layers.Dense(CLASSES, activation='softmax'),
], name='sequential_mlp')

# ── 2. Functional API ────────────────────────────────────────────────────────
inputs = tf.keras.Input(shape=(INPUT_DIM,), name='features')
x = tf.keras.layers.Dense(HIDDEN, activation='relu')(inputs)
x = tf.keras.layers.Dropout(0.3)(x)
x = tf.keras.layers.Dense(HIDDEN // 2, activation='relu')(x)
outputs = tf.keras.layers.Dense(CLASSES, activation='softmax', name='predictions')(x)
model_func = tf.keras.Model(inputs=inputs, outputs=outputs, name='functional_mlp')

# ── 3. Model Subclassing ─────────────────────────────────────────────────────
class SubclassedMLP(tf.keras.Model):
    def __init__(self, hidden, classes):
        super().__init__(name='subclassed_mlp')
        self.dense1   = tf.keras.layers.Dense(hidden, activation='relu')
        self.dropout  = tf.keras.layers.Dropout(0.3)
        self.dense2   = tf.keras.layers.Dense(hidden // 2, activation='relu')
        self.out      = tf.keras.layers.Dense(classes, activation='softmax')

    def call(self, inputs, training=False):
        x = self.dense1(inputs)
        x = self.dropout(x, training=training)   # only active during training
        x = self.dense2(x)
        return self.out(x)

model_sub = SubclassedMLP(HIDDEN, CLASSES)

# All three are tf.keras.Model — same interface:
for name, model in [('Sequential', model_seq),
                    ('Functional', model_func),
                    ('Subclassed', model_sub)]:
    dummy_input = tf.random.normal((8, INPUT_DIM))
    out = model(dummy_input, training=False)
    print(f"{name}: output shape = {out.shape}")

model_func.summary()

💧 Layers, Activations & Regularisation

Key Layers

  • Dense(units, activation) — fully connected layer
  • Conv2D(filters, kernel_size, strides, padding) — spatial features
  • MaxPooling2D(pool_size) — spatial downsampling
  • GlobalAveragePooling2D() — collapse spatial dims
  • LSTM(units, return_sequences) — recurrent sequence
  • GRU(units) — gated recurrent; fewer parameters, usually faster than LSTM
  • Embedding(vocab_size, embed_dim) — token embeddings
  • MultiHeadAttention(num_heads, key_dim) — transformer attention
  • Flatten() / Reshape(target_shape)

Activations

  • relu — most common, fast; dead neuron problem
  • leaky_relu / elu — fix dead neurons
  • gelu — Gaussian error linear unit; used in transformers
  • selu — self-normalising; use with lecun_normal init
  • sigmoid — binary output / gates; vanishing gradient
  • softmax — multi-class probability output
  • tanh — RNN hidden states; zero-centred
  • swish — gated activation; often better than relu
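The activations listed above are all available as functions under tf.nn (note that swish is exposed as silu in TF); a quick comparison on sample inputs:

```python
import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])

for name, fn in [('relu',       tf.nn.relu),
                 ('leaky_relu', tf.nn.leaky_relu),
                 ('gelu',       tf.nn.gelu),
                 ('sigmoid',    tf.nn.sigmoid),
                 ('tanh',       tf.nn.tanh),
                 ('swish',      tf.nn.silu)]:   # swish == silu in TF
    print(f"{name:10s} {fn(x).numpy().round(3)}")

# softmax operates on a whole vector (outputs sum to 1), not element-wise
print('softmax   ', tf.nn.softmax(x).numpy().round(3))
```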

Regularisation Techniques

  • Dropout(rate) — randomly zero activations; prevents co-adaptation
  • SpatialDropout2D(rate) — drop entire feature maps
  • BatchNormalization() — normalise per batch; accelerates training
  • LayerNormalization() — normalise per sample; better for seq models
  • kernel_regularizer=tf.keras.regularizers.L2(1e-4) — weight decay
  • kernel_regularizer=tf.keras.regularizers.L1(1e-4) — sparsity
  • Gradient clipping: optimizer=Adam(clipnorm=1.0)
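The clipping kwargs (clipnorm / clipvalue) work on any Keras optimizer, including the Adam example in the bullet; a sketch with SGD, where the effect is easy to see (values are illustrative):

```python
import tensorflow as tf

# clipnorm rescales each gradient so its L2 norm is at most 1.0;
# clipvalue would instead clamp individual components element-wise
opt = tf.keras.optimizers.SGD(learning_rate=0.1, clipnorm=1.0)

w = tf.Variable([10.0, 10.0])
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(w ** 2)        # gradient = 2w = [20, 20], norm ~28.3
grads = tape.gradient(loss, [w])
opt.apply_gradients(zip(grads, [w]))    # gradient clipped to norm 1 before the step
print(w.numpy())                        # small step, instead of jumping to [8, 8]
```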
import tensorflow as tf

# ── CNN for Image Classification (CIFAR-10 style) ─────────────────────────────
def build_cnn(input_shape=(32, 32, 3), num_classes=10):
    inputs = tf.keras.Input(shape=input_shape, name='image')

    # Block 1
    x = tf.keras.layers.Conv2D(32, (3,3), padding='same', activation='relu',
                                kernel_regularizer=tf.keras.regularizers.L2(1e-4))(inputs)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Conv2D(32, (3,3), padding='same', activation='relu')(x)
    x = tf.keras.layers.MaxPooling2D((2,2))(x)
    x = tf.keras.layers.SpatialDropout2D(0.2)(x)

    # Block 2
    x = tf.keras.layers.Conv2D(64, (3,3), padding='same', activation='relu',
                                kernel_regularizer=tf.keras.regularizers.L2(1e-4))(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Conv2D(64, (3,3), padding='same', activation='relu')(x)
    x = tf.keras.layers.MaxPooling2D((2,2))(x)
    x = tf.keras.layers.SpatialDropout2D(0.3)(x)

    # Block 3
    x = tf.keras.layers.Conv2D(128, (3,3), padding='same', activation='relu')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)   # (batch, 128)

    # Classifier head
    x = tf.keras.layers.Dense(256, activation='relu',
                               kernel_regularizer=tf.keras.regularizers.L2(1e-4))(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)

    return tf.keras.Model(inputs, outputs, name='cnn_classifier')

cnn = build_cnn()
cnn.summary()

# Count parameters
total_params = cnn.count_params()
print(f"Total parameters: {total_params:,}")

# Verify forward pass
dummy_batch = tf.random.normal((8, 32, 32, 3))
out = cnn(dummy_batch, training=False)
print(f"Output shape: {out.shape}")   # (8, 10)

🔥 Compiling & Training

model.compile()

Compilation configures the training procedure. Keras supports string shortcuts for common optimisers/losses, or you can pass class instances to customise hyperparameters.

  • Optimizers: adam, sgd, rmsprop, adamw
  • Losses: sparse_categorical_crossentropy (int labels)
  • Losses: categorical_crossentropy (one-hot labels)
  • Losses: binary_crossentropy, mse, mae
  • Metrics: accuracy, AUC, Precision, Recall
  • run_eagerly=True — disable tf.function for debugging

model.fit()

Runs the training loop. Keras handles batching, shuffling, validation splitting, and metric tracking automatically. Returns a History object for plotting learning curves.

  • batch_size — samples per gradient update (32-512 typical)
  • epochs — number of full passes through training data
  • validation_split=0.2 — last 20% of data as validation
  • validation_data=(X_val, y_val) — explicit validation set
  • shuffle=True — shuffle training data each epoch
  • class_weight={0: 1, 1: 10} — handle class imbalance
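For the class_weight bullet, weights are commonly derived from inverse label frequency; a sketch with made-up imbalanced labels:

```python
import numpy as np
import tensorflow as tf

y_train = np.array([0] * 900 + [1] * 100)    # 9:1 imbalanced labels
X_train = np.random.randn(1000, 1).astype(np.float32)

# Inverse-frequency weights, normalised so the average weight is ~1
counts = np.bincount(y_train)
class_weight = {i: len(y_train) / (len(counts) * c) for i, c in enumerate(counts)}
print(class_weight)                          # minority class weighted ~9x higher

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X_train, y_train, epochs=1, class_weight=class_weight, verbose=0)
```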

Callbacks

Callbacks are hooks that run at specific points during training. They enable early stopping, checkpointing, learning rate schedules, and logging without changing the model code.

  • EarlyStopping — stop when val_loss stops improving
  • ModelCheckpoint — save best weights to disk
  • ReduceLROnPlateau — scale LR by factor when val_loss plateaus
  • TensorBoard — log metrics for the TensorBoard UI
  • CSVLogger — write epoch metrics to CSV
  • LambdaCallback — run arbitrary code each epoch
import tensorflow as tf
import numpy as np

# ── Synthetic dataset ─────────────────────────────────────────────────────────
rng = np.random.default_rng(42)
X_train = rng.standard_normal((8000, 20)).astype(np.float32)
y_train = rng.integers(0, 5, 8000)
X_val   = rng.standard_normal((2000, 20)).astype(np.float32)
y_val   = rng.integers(0, 5, 2000)

# ── Build model ───────────────────────────────────────────────────────────────
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(5, activation='softmax'),
])

# ── Compile ───────────────────────────────────────────────────────────────────
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='sparse_categorical_crossentropy',
    # AUC expects binary/one-hot targets; with integer labels use sparse metrics
    metrics=['accuracy',
             tf.keras.metrics.SparseTopKCategoricalAccuracy(k=2, name='top2_acc')],
)

# ── Callbacks ─────────────────────────────────────────────────────────────────
callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss',
        patience=10,              # wait 10 epochs for improvement
        restore_best_weights=True,
        min_delta=1e-4,
        verbose=1,
    ),
    tf.keras.callbacks.ModelCheckpoint(
        filepath='best_model.keras',
        monitor='val_accuracy',
        save_best_only=True,
        verbose=0,
    ),
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,               # halve the learning rate
        patience=5,
        min_lr=1e-6,
        verbose=1,
    ),
    tf.keras.callbacks.TensorBoard(
        log_dir='./logs',
        histogram_freq=1,
        update_freq='epoch',
    ),
]

# ── Train ─────────────────────────────────────────────────────────────────────
history = model.fit(
    X_train, y_train,
    batch_size=128,
    epochs=100,                   # EarlyStopping will stop before this
    validation_data=(X_val, y_val),
    callbacks=callbacks,
    verbose=1,
)

# ── Evaluate ──────────────────────────────────────────────────────────────────
results = model.evaluate(X_val, y_val, verbose=0)
for name, val in zip(model.metrics_names, results):
    print(f"  {name}: {val:.4f}")

# ── Learning curves from History ─────────────────────────────────────────────
best_epoch = np.argmin(history.history['val_loss'])
print(f"Best epoch: {best_epoch+1}")
print(f"Best val_loss: {history.history['val_loss'][best_epoch]:.4f}")
print(f"Best val_accuracy: {history.history['val_accuracy'][best_epoch]:.4f}")

💾 Saving & Deployment

Always Use SavedModel Over H5 for Production

The SavedModel format saves the full TensorFlow graph, custom objects, and signatures — everything needed to reload and serve the model without the original Python code. H5 format only saves weights and architecture in JSON; it does not preserve custom layers, tf.function traces, or serving signatures correctly.

  • SavedModel (directory/) — production serving: TF Serving, TFX. High portability, no Python needed. Default TF2 export format; preserves signatures and custom ops
  • Keras native (.keras) — checkpointing, sharing models with Keras users. Medium portability, needs Keras 3.x. New format from Keras 3; preferred over H5 for Keras work
  • H5 / HDF5 (.h5) — legacy, quick experiments. Low portability, needs Python + Keras. Avoid for production; missing custom object support
  • TFLite (.tflite) — mobile (Android/iOS), microcontrollers, Edge TPU. High portability, tiny runtime. Quantise to int8 for low-latency on-device inference
  • ONNX (.onnx) — cross-framework interop. High portability, runs on ONNX Runtime. Convert with tf2onnx; great for serving in non-TF environments
  • TF.js (tfjs_model/) — browser-side inference. High portability, runs in JS. Convert with tensorflowjs_converter; supports WebGL acceleration
import tensorflow as tf
import numpy as np

# ── Assume 'model' is a fitted tf.keras.Model ─────────────────────────────────
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(5, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# ── 1. Export as SavedModel (recommended for serving) ─────────────────────────
# Keras 3 (TF 2.16+): model.save() requires a .keras/.h5 path, so use export()
# to write a SavedModel directory. In TF <= 2.15, model.save('my_model/') did this.
model.export('my_model/')                  # creates a directory
loaded = tf.saved_model.load('my_model/')  # low-level object with serving signatures

# ── 2. Save in new Keras native format ────────────────────────────────────────
model.save('my_model.keras')               # single file
loaded_k = tf.keras.models.load_model('my_model.keras')

# ── 3. Save weights only (smallest, for resuming training) ────────────────────
model.save_weights('weights.weights.h5')
model.load_weights('weights.weights.h5')

# ── 4. Convert to TFLite for mobile/edge ─────────────────────────────────────
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

# TFLite with int8 quantisation (8x smaller, faster on edge devices)
converter_q = tf.lite.TFLiteConverter.from_keras_model(model)
converter_q.optimizations = [tf.lite.Optimize.DEFAULT]
# Provide representative dataset for full int8 quantisation
def representative_dataset():
    for _ in range(100):
        yield [np.random.randn(1, 20).astype(np.float32)]
converter_q.representative_dataset = representative_dataset
converter_q.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_q = converter_q.convert()
print(f"Original size: {len(tflite_model)/1024:.1f} KB")
print(f"Quantised size: {len(tflite_q)/1024:.1f} KB")

# ── 5. Run inference with TFLite interpreter ──────────────────────────────────
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details  = interpreter.get_input_details()
output_details = interpreter.get_output_details()

sample = np.random.randn(1, 20).astype(np.float32)
interpreter.set_tensor(input_details[0]['index'], sample)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]['index'])
print(f"TFLite prediction shape: {output.shape}")   # (1, 5)