⚡ TensorFlow vs Keras
TF1 → TF2: A Paradigm Shift
TensorFlow 1.x required building a static computation graph, then feeding data into a separate Session to execute it. Debugging was painful — you couldn't inspect intermediate values without explicit print ops. TF2 (released 2019) switched to eager execution: operations run immediately like regular Python code.
- TF1: define graph → create Session → sess.run(ops) → inspect
- TF2: operations execute immediately, results are Python values
- tf.function — optional: trace to a graph for speed
- TF2 is the default and the only version actively developed
- Migrate TF1 code: tf.compat.v1 compatibility layer
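The eager-vs-graph contrast above can be sketched in a few lines; `dense_step` is a hypothetical helper, not part of any API:

```python
import tensorflow as tf

# Eager execution: ops run immediately, results are ordinary values
def dense_step(x, w):
    return tf.nn.relu(x @ w)

# Optional graph tracing for speed: wrap the same function in tf.function
dense_step_graph = tf.function(dense_step)

x = tf.random.normal([4, 8])
w = tf.random.normal([8, 2])

eager_out = dense_step(x, w)        # plain Python call, inspectable anywhere
graph_out = dense_step_graph(x, w)  # first call traces a graph; later calls reuse it

print(eager_out.shape, graph_out.shape)  # (4, 2) (4, 2)
```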
Keras as the High-Level API
Keras was originally a standalone library. Since TF2, it ships as tf.keras and is the official high-level API. Standalone Keras (keras 3.x) now also supports JAX and PyTorch backends. For TF-only work, tf.keras is what you use.
- tf.keras — bundled with TensorFlow, recommended
- Keras 3.x — multi-backend (TF, JAX, PyTorch)
- The Model/Layer/Callback API is identical between both
- tf.data for efficient input pipelines
- tf.GradientTape for custom training loops
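A minimal tf.data pipeline sketch, on synthetic arrays with illustrative sizes:

```python
import numpy as np
import tensorflow as tf

# shuffle → batch → prefetch: the standard input-pipeline pattern
features = np.random.randn(100, 4).astype(np.float32)
labels = np.random.randint(0, 2, size=100)

ds = (tf.data.Dataset.from_tensor_slices((features, labels))
        .shuffle(buffer_size=100)     # reshuffle the whole dataset each epoch
        .batch(32)                    # 32 samples per gradient update
        .prefetch(tf.data.AUTOTUNE))  # overlap input prep with training

xb, yb = next(iter(ds))
print(xb.shape, yb.shape)  # (32, 4) (32,)
```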
tf.Tensor vs NumPy
TensorFlow tensors live on a specific device (CPU/GPU/TPU) and are immutable. Most NumPy operations work on tensors via the numpy interop layer, and tensors can be converted to NumPy arrays when on CPU.
- tf.constant([1,2,3]) — immutable tensor
- tf.Variable([1,2,3]) — mutable, holds model weights
- t.numpy() — convert to NumPy (CPU only)
- tf.cast(t, tf.float32) — change dtype
- Automatic device placement: TF chooses GPU if available
import tensorflow as tf
import numpy as np
# ── Verify installation ────────────────────────────────────────────────────────
print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {tf.keras.__version__}")
print(f"GPU available: {len(tf.config.list_physical_devices('GPU')) > 0}")
# ── Hello-world tensor operations ─────────────────────────────────────────────
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])
print(a + b) # element-wise add
print(a @ b) # matrix multiply
print(tf.reduce_sum(a)) # sum all elements → 10.0
print(tf.reduce_mean(a, axis=0)) # column means → [2.0, 3.0]
# ── tf.Variable: mutable state for weights ────────────────────────────────────
W = tf.Variable(tf.random.normal([4, 3], stddev=0.1))
b_var = tf.Variable(tf.zeros([3]))
x = tf.random.normal([10, 4]) # batch of 10
logits = x @ W + b_var
print(f"Logits shape: {logits.shape}") # (10, 3)
# ── tf.GradientTape: manual gradient computation ──────────────────────────────
x_in = tf.Variable(3.0)
with tf.GradientTape() as tape:
y = x_in ** 2 + 2 * x_in + 1 # y = (x+1)^2, dy/dx = 2x+2
grad = tape.gradient(y, x_in)
print(f"dy/dx at x=3: {grad.numpy()}") # 8.0
# ── Interop with NumPy ────────────────────────────────────────────────────────
np_arr = np.array([1.0, 2.0, 3.0], dtype=np.float32)
tf_tensor = tf.constant(np_arr) # NumPy → TF
back_to_np = tf_tensor.numpy() # TF → NumPy
print(type(back_to_np), back_to_np) # numpy.ndarray [1. 2. 3.]
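Placement is automatic, but ops can also be pinned to a device explicitly; a small sketch:

```python
import tensorflow as tf

# TF places ops automatically, but tf.device pins them explicitly
with tf.device('/CPU:0'):
    c = tf.constant([1.0, 2.0]) * 2.0

print(c.device)   # ends with .../device:CPU:0
print(c.numpy())  # [2. 4.]
```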
🏗 Building Models with Keras
Keras provides three model-building APIs that trade simplicity for flexibility. All produce identical tf.keras.Model objects with the same compile/fit/evaluate/predict interface.
Sequential API
Stack layers linearly. Simplest approach — ideal for most feedforward networks. Cannot express branching, shared layers, or multiple inputs/outputs.
- tf.keras.Sequential([layer1, layer2, ...])
- model.add(layer) — add layers incrementally
- Layers are named automatically (dense, dense_1, ...)
- Limitations: single input, single output, strictly linear
Functional API
Define models as a graph of layers by calling layers as functions on tensors. Enables residual connections, multi-input/output models, and shared layers. Preferred for most production models.
- inputs = tf.keras.Input(shape=(...))
- Call layers as functions: x = Dense(64)(inputs)
- model = tf.keras.Model(inputs=inputs, outputs=outputs)
- Explicit data flow — model graph is inspectable
Model Subclassing
Subclass tf.keras.Model and define __init__ (layers) and call (forward pass). Maximum flexibility — any Python logic in the forward pass, dynamic architectures, custom loops.
- Override call(self, inputs, training=False)
- Use the training flag to toggle Dropout/BN behaviour
- model.summary() shows less — output shapes are unknown until the model is built
- Required for: dynamic graphs, custom attention, research
import tensorflow as tf
import numpy as np
# ── Same MLP (2 hidden layers) in all three APIs ──────────────────────────────
INPUT_DIM = 20
HIDDEN = 64
CLASSES = 5
# ── 1. Sequential API ────────────────────────────────────────────────────────
model_seq = tf.keras.Sequential([
tf.keras.layers.Input(shape=(INPUT_DIM,)),
tf.keras.layers.Dense(HIDDEN, activation='relu'),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Dense(HIDDEN // 2, activation='relu'),
tf.keras.layers.Dense(CLASSES, activation='softmax'),
], name='sequential_mlp')
# ── 2. Functional API ────────────────────────────────────────────────────────
inputs = tf.keras.Input(shape=(INPUT_DIM,), name='features')
x = tf.keras.layers.Dense(HIDDEN, activation='relu')(inputs)
x = tf.keras.layers.Dropout(0.3)(x)
x = tf.keras.layers.Dense(HIDDEN // 2, activation='relu')(x)
outputs = tf.keras.layers.Dense(CLASSES, activation='softmax', name='predictions')(x)
model_func = tf.keras.Model(inputs=inputs, outputs=outputs, name='functional_mlp')
# ── 3. Model Subclassing ─────────────────────────────────────────────────────
class SubclassedMLP(tf.keras.Model):
def __init__(self, hidden, classes):
super().__init__(name='subclassed_mlp')
self.dense1 = tf.keras.layers.Dense(hidden, activation='relu')
self.dropout = tf.keras.layers.Dropout(0.3)
self.dense2 = tf.keras.layers.Dense(hidden // 2, activation='relu')
self.out = tf.keras.layers.Dense(classes, activation='softmax')
def call(self, inputs, training=False):
x = self.dense1(inputs)
x = self.dropout(x, training=training) # only active during training
x = self.dense2(x)
return self.out(x)
model_sub = SubclassedMLP(HIDDEN, CLASSES)
# All three are tf.keras.Model — same interface:
for name, model in [('Sequential', model_seq),
('Functional', model_func),
('Subclassed', model_sub)]:
dummy_input = tf.random.normal((8, INPUT_DIM))
out = model(dummy_input, training=False)
print(f"{name}: output shape = {out.shape}")
model_func.summary()
💧 Layers, Activations & Regularisation
Key Layers
- Dense(units, activation) — fully connected layer
- Conv2D(filters, kernel_size, strides, padding) — spatial features
- MaxPooling2D(pool_size) — spatial downsampling
- GlobalAveragePooling2D() — collapse spatial dims
- LSTM(units, return_sequences) — recurrent sequence layer
- GRU(units) — gated recurrent, faster than LSTM
- Embedding(vocab_size, embed_dim) — token embeddings
- MultiHeadAttention(num_heads, key_dim) — transformer attention
- Flatten() / Reshape(target_shape)
Activations
- relu — most common, fast; dead neuron problem
- leaky_relu / elu — fix dead neurons
- gelu — Gaussian error linear unit; used in transformers
- selu — self-normalising; use with lecun_normal init
- sigmoid — binary output / gates; vanishing gradient
- softmax — multi-class probability output
- tanh — RNN hidden states; zero-centred
- swish — gated activation; often better than relu
Regularisation Techniques
- Dropout(rate) — randomly zero activations; prevents co-adaptation
- SpatialDropout2D(rate) — drop entire feature maps
- BatchNormalization() — normalise per batch; accelerates training
- LayerNormalization() — normalise per sample; better for seq models
- kernel_regularizer=tf.keras.regularizers.L2(1e-4) — weight decay
- kernel_regularizer=tf.keras.regularizers.L1(1e-4) — sparsity
- Gradient clipping: optimizer=Adam(clipnorm=1.0)
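Putting the regularisers together in one compile call: a hedged sketch, with illustrative layer sizes and coefficients:

```python
import tensorflow as tf

# L2 weight decay on a layer plus global-norm gradient clipping in the optimizer
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.L2(1e-4)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0),
    loss='binary_crossentropy')

out = model(tf.random.normal((4, 8)), training=False)
print(out.shape)  # (4, 1)
```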
import tensorflow as tf
# ── CNN for Image Classification (CIFAR-10 style) ─────────────────────────────
def build_cnn(input_shape=(32, 32, 3), num_classes=10):
inputs = tf.keras.Input(shape=input_shape, name='image')
# Block 1
x = tf.keras.layers.Conv2D(32, (3,3), padding='same', activation='relu',
kernel_regularizer=tf.keras.regularizers.L2(1e-4))(inputs)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(32, (3,3), padding='same', activation='relu')(x)
x = tf.keras.layers.MaxPooling2D((2,2))(x)
x = tf.keras.layers.SpatialDropout2D(0.2)(x)
# Block 2
x = tf.keras.layers.Conv2D(64, (3,3), padding='same', activation='relu',
kernel_regularizer=tf.keras.regularizers.L2(1e-4))(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(64, (3,3), padding='same', activation='relu')(x)
x = tf.keras.layers.MaxPooling2D((2,2))(x)
x = tf.keras.layers.SpatialDropout2D(0.3)(x)
# Block 3
x = tf.keras.layers.Conv2D(128, (3,3), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x) # (batch, 128)
# Classifier head
x = tf.keras.layers.Dense(256, activation='relu',
kernel_regularizer=tf.keras.regularizers.L2(1e-4))(x)
x = tf.keras.layers.Dropout(0.5)(x)
outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
return tf.keras.Model(inputs, outputs, name='cnn_classifier')
cnn = build_cnn()
cnn.summary()
# Count parameters
total_params = cnn.count_params()
print(f"Total parameters: {total_params:,}")
# Verify forward pass
dummy_batch = tf.random.normal((8, 32, 32, 3))
out = cnn(dummy_batch, training=False)
print(f"Output shape: {out.shape}") # (8, 10)
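The recurrent layers from the list aren't exercised by the CNN; a minimal Embedding + LSTM text classifier, with purely illustrative vocab/length sizes:

```python
import tensorflow as tf

# Hypothetical sizes — illustrative only
VOCAB, MAXLEN, EMBED = 1000, 50, 32

text_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(MAXLEN,), dtype='int32'),
    tf.keras.layers.Embedding(VOCAB, EMBED),           # token ids → dense vectors
    tf.keras.layers.LSTM(64, return_sequences=False),  # keep only the final state
    tf.keras.layers.Dense(1, activation='sigmoid'),    # binary classification head
])

tokens = tf.random.uniform((8, MAXLEN), maxval=VOCAB, dtype=tf.int32)
print(text_model(tokens).shape)  # (8, 1)
```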
🔥 Compiling & Training
model.compile()
Compilation configures the training procedure. Keras supports string shortcuts for common optimisers/losses, or you can pass class instances to customise hyperparameters.
- Optimizers: adam, sgd, rmsprop, adamw
- Losses: sparse_categorical_crossentropy (int labels), categorical_crossentropy (one-hot labels)
- Losses: binary_crossentropy, mse, mae
- Metrics: accuracy, AUC, Precision, Recall
- run_eagerly=True — disable tf.function for debugging
model.fit()
Runs the training loop. Keras handles batching, shuffling, validation splitting, and metric tracking automatically. Returns a History object for plotting learning curves.
- batch_size — samples per gradient update (32-512 typical)
- epochs — number of full passes through training data
- validation_split=0.2 — last 20% of data as validation
- validation_data=(X_val, y_val) — explicit validation set
- shuffle=True — shuffle training data each epoch
- class_weight={0: 1, 1: 10} — handle class imbalance
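A hedged sketch of class_weight and validation_split in fit, on synthetic imbalanced data (the 1:10 weighting is illustrative):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

X = np.random.randn(200, 4).astype(np.float32)
y = (np.random.rand(200) < 0.1).astype(np.int32)  # ~10% positive class

history = model.fit(
    X, y, epochs=2, batch_size=32, verbose=0,
    validation_split=0.2,            # last 20% of X/y held out
    class_weight={0: 1.0, 1: 10.0},  # up-weight the rare class
)
print(sorted(history.history.keys()))  # includes 'loss' and 'val_loss'
```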
Callbacks
Callbacks are hooks that run at specific points during training. They enable early stopping, checkpointing, learning rate schedules, and logging without changing the model code.
- EarlyStopping — stop when val_loss stops improving
- ModelCheckpoint — save best weights to disk
- ReduceLROnPlateau — halve LR when a plateau is detected
- TensorBoard — log metrics for the TensorBoard UI
- CSVLogger — write epoch metrics to CSV
- LambdaCallback — run arbitrary code each epoch
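LambdaCallback in miniature, recording per-epoch losses into a plain list (model and data are throwaway placeholders):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

epoch_log = []  # (epoch, loss) pairs collected by the callback
log_cb = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: epoch_log.append((epoch, logs['loss'])))

X = np.random.randn(64, 4).astype(np.float32)
y = np.random.randn(64, 1).astype(np.float32)
model.fit(X, y, epochs=2, verbose=0, callbacks=[log_cb])
print(epoch_log)  # two (epoch, loss) tuples
```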
import tensorflow as tf
import numpy as np
# ── Synthetic dataset ─────────────────────────────────────────────────────────
rng = np.random.default_rng(42)
X_train = rng.standard_normal((8000, 20)).astype(np.float32)
y_train = rng.integers(0, 5, 8000)
X_val = rng.standard_normal((2000, 20)).astype(np.float32)
y_val = rng.integers(0, 5, 2000)
# ── Build model ───────────────────────────────────────────────────────────────
model = tf.keras.Sequential([
tf.keras.layers.Input(shape=(20,)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(0.3),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(5, activation='softmax'),
])
# ── Compile ───────────────────────────────────────────────────────────────────
model.compile(
optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
loss='sparse_categorical_crossentropy',
metrics=['accuracy', tf.keras.metrics.AUC(name='auc', multi_label=False)],
)
# ── Callbacks ─────────────────────────────────────────────────────────────────
callbacks = [
tf.keras.callbacks.EarlyStopping(
monitor='val_loss',
patience=10, # wait 10 epochs for improvement
restore_best_weights=True,
min_delta=1e-4,
verbose=1,
),
tf.keras.callbacks.ModelCheckpoint(
filepath='best_model.keras',
monitor='val_accuracy',
save_best_only=True,
verbose=0,
),
tf.keras.callbacks.ReduceLROnPlateau(
monitor='val_loss',
factor=0.5, # halve the learning rate
patience=5,
min_lr=1e-6,
verbose=1,
),
tf.keras.callbacks.TensorBoard(
log_dir='./logs',
histogram_freq=1,
update_freq='epoch',
),
]
# ── Train ─────────────────────────────────────────────────────────────────────
history = model.fit(
X_train, y_train,
batch_size=128,
epochs=100, # EarlyStopping will stop before this
validation_data=(X_val, y_val),
callbacks=callbacks,
verbose=1,
)
# ── Evaluate ──────────────────────────────────────────────────────────────────
results = model.evaluate(X_val, y_val, verbose=0)
for name, val in zip(model.metrics_names, results):
print(f" {name}: {val:.4f}")
# ── Learning curves from History ─────────────────────────────────────────────
best_epoch = np.argmin(history.history['val_loss'])
print(f"Best epoch: {best_epoch+1}")
print(f"Best val_loss: {history.history['val_loss'][best_epoch]:.4f}")
print(f"Best val_accuracy: {history.history['val_accuracy'][best_epoch]:.4f}")
💾 Saving & Deployment
Always Use SavedModel Over H5 for Production
The SavedModel format saves the full TensorFlow graph, custom objects, and signatures — everything needed to reload and serve the model without the original Python code. The H5 format stores only the weights and an architecture config; it does not preserve serving signatures or tf.function traces, and custom layers must be re-registered via custom_objects at load time.
| Format | Extension | Use Case | Portability | Notes |
|---|---|---|---|---|
| SavedModel | directory / | Production serving, TF Serving, TFX | High — no Python needed | Default TF2 format; preserves signatures and custom ops |
| Keras native | .keras | Checkpointing, sharing models with Keras users | Medium — needs Keras 3.x | New format from Keras 3; preferred over H5 for Keras work |
| H5 / HDF5 | .h5 | Legacy; quick experiments | Low — needs Python+Keras | Avoid for production; missing custom object support |
| TFLite | .tflite | Mobile (Android/iOS), microcontrollers, Edge TPU | High — tiny runtime | Quantise to int8 for smaller models and faster on-device inference |
| ONNX | .onnx | Cross-framework interop | High — runs on ONNXRuntime | Use tf2onnx; great for serving in non-TF environments |
| TF.js | tfjs_model/ | Browser-side inference | High — runs in JS | Convert with tensorflowjs_converter; supports WebGL acceleration |
import tensorflow as tf
import numpy as np
# ── Assume 'model' is a fitted tf.keras.Model ─────────────────────────────────
model = tf.keras.Sequential([
tf.keras.layers.Input(shape=(20,)),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(5, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# ── 1. Save in SavedModel format (recommended) ────────────────────────────────
model.save('my_model/') # creates a directory (Keras 2 / TF <= 2.15)
# In Keras 3 use model.export('my_model/') instead; there, load_model cannot
# read SavedModel directories (use tf.saved_model.load or TFSMLayer)
loaded = tf.saved_model.load('my_model/')
# OR via Keras (Keras 2 only):
loaded_keras = tf.keras.models.load_model('my_model/')
# ── 2. Save in new Keras native format ────────────────────────────────────────
model.save('my_model.keras') # single file
loaded_k = tf.keras.models.load_model('my_model.keras')
# ── 3. Save weights only (smallest, for resuming training) ────────────────────
model.save_weights('weights.weights.h5')
model.load_weights('weights.weights.h5')
# ── 4. Convert to TFLite for mobile/edge ─────────────────────────────────────
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
f.write(tflite_model)
# TFLite with int8 quantisation (8x smaller, faster on edge devices)
converter_q = tf.lite.TFLiteConverter.from_keras_model(model)
converter_q.optimizations = [tf.lite.Optimize.DEFAULT]
# Provide representative dataset for full int8 quantisation
def representative_dataset():
for _ in range(100):
yield [np.random.randn(1, 20).astype(np.float32)]
converter_q.representative_dataset = representative_dataset
converter_q.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_q = converter_q.convert()
print(f"Original size: {len(tflite_model)/1024:.1f} KB")
print(f"Quantised size: {len(tflite_q)/1024:.1f} KB")
# ── 5. Run inference with TFLite interpreter ──────────────────────────────────
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
sample = np.random.randn(1, 20).astype(np.float32)
interpreter.set_tensor(input_details[0]['index'], sample)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]['index'])
print(f"TFLite prediction shape: {output.shape}") # (1, 5)