Three companion tutorials that build the math of neural-network training from the ground up — no prerequisites past high-school algebra.
Most explanations of backpropagation jump straight to the algorithm and leave the calculus implicit. This series goes the other way: it builds the small pieces of math you need first — slopes, derivatives, the chain rule — and then shows that backpropagation is nothing more than those pieces, applied to the layers of a network.
Each tutorial is self-contained, with interactive D3 figures and worked examples, and takes 30–45 minutes to read. By the end of the third one, you will have computed a full backward pass by hand and understood why every formula has the shape it does.
Slope of a straight line, tangent to a curve, the limit definition. From there to the power rule, the exponential, and the derivative of the sigmoid, a classic neural-network activation. Closes with partial derivatives and the gradient vector.
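As a small taste of the limit definition, the sketch below shrinks the step h in a difference quotient and watches the estimate approach the power-rule answer. The function `square`, the point x = 3 and the step sizes are illustrative choices, not taken from the tutorial itself.

```python
def difference_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def square(x):
    # Illustrative function; the power rule gives d/dx x^2 = 2x.
    return x * x


# Shrinking h moves the secant slope toward the tangent slope at x = 3,
# which the power rule says is 2 * 3 = 6.
steps = (1e-1, 1e-2, 1e-3)
estimates = [difference_quotient(square, 3.0, h) for h in steps]
errors = [abs(e - 6.0) for e in estimates]
```

Each estimate is off by roughly h, which is the behaviour the limit definition formalises.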
Why rates of change multiply across a composition of functions. Four worked examples, including the full derivation of σ′(x) as a three-stage chain, plus the multivariate version that backpropagation actually uses: multiply along paths, sum across paths.
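One way to see σ′(x) as a three-stage chain is to compute each local derivative and multiply them. The split used here (u = −x, then v = 1 + eᵘ, then s = 1/v) is an assumption; the tutorial may stage the derivation differently.

```python
import math


def sigmoid_chain(x):
    """Return sigma(x) and its derivative as a product of three local slopes.

    The three stages are an assumed decomposition, chosen for illustration.
    """
    # Stage 1: u = -x
    u = -x
    du_dx = -1.0
    # Stage 2: v = 1 + e^u
    v = 1.0 + math.exp(u)
    dv_du = math.exp(u)
    # Stage 3: s = 1 / v
    s = 1.0 / v
    ds_dv = -1.0 / v ** 2
    # Chain rule: rates of change multiply across the composition.
    return s, ds_dv * dv_du * du_dx


x = 0.7  # arbitrary test point
s, ds_dx = sigmoid_chain(x)
closed_form = s * (1.0 - s)  # the familiar sigma(x) * (1 - sigma(x))
```

The product of the three local slopes agrees with the closed form σ(x)(1 − σ(x)) up to floating-point rounding.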
A six-weight network traced end-to-end: forward pass, loss, output δ, hidden δ, six weight updates, every number shown. Why δ exists, what vanishing gradients are, how bias terms slot in, and the vectorized form that ships in PyTorch.
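The sketch below walks through the same sequence of steps on one possible six-weight network. The layout (2 inputs, 2 sigmoid hidden units, 1 sigmoid output, no biases), the squared-error loss, and every numeric value are assumptions for illustration; the tutorial's own network and numbers may differ.

```python
import copy
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, W2):
    # Hidden layer: two sigmoid units, each fed by both inputs (4 weights).
    h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1]) for j in range(2)]
    # Output layer: one sigmoid unit fed by both hidden units (2 weights).
    y = sigmoid(W2[0] * h[0] + W2[1] * h[1])
    return h, y


def loss(y, t):
    # Assumed loss: half squared error.
    return 0.5 * (y - t) ** 2


def backward(x, t, W1, W2):
    h, y = forward(x, W1, W2)
    # Output delta: dL/dy times the sigmoid slope y * (1 - y).
    delta_out = (y - t) * y * (1.0 - y)
    grad_W2 = [delta_out * h[j] for j in range(2)]
    # Hidden deltas: output delta sent back through W2, times the local slope.
    delta_hid = [delta_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(2)]
    grad_W1 = [[delta_hid[j] * x[i] for i in range(2)] for j in range(2)]
    return grad_W1, grad_W2


def numeric_grads(x, t, W1, W2, eps=1e-6):
    # Central differences, used only to check the hand-derived gradients.
    def loss_at(A, B):
        return loss(forward(x, A, B)[1], t)

    num_W1 = [[0.0, 0.0], [0.0, 0.0]]
    num_W2 = [0.0, 0.0]
    for j in range(2):
        for i in range(2):
            plus, minus = copy.deepcopy(W1), copy.deepcopy(W1)
            plus[j][i] += eps
            minus[j][i] -= eps
            num_W1[j][i] = (loss_at(plus, W2) - loss_at(minus, W2)) / (2 * eps)
        plus, minus = list(W2), list(W2)
        plus[j] += eps
        minus[j] -= eps
        num_W2[j] = (loss_at(W1, plus) - loss_at(W1, minus)) / (2 * eps)
    return num_W1, num_W2


# Hypothetical input, target, and starting weights.
x, t = [0.5, -1.0], 1.0
W1 = [[0.1, 0.2], [-0.3, 0.4]]
W2 = [0.5, -0.6]

grad_W1, grad_W2 = backward(x, t, W1, W2)
num_W1, num_W2 = numeric_grads(x, t, W1, W2)

# Six weight updates: one plain gradient-descent step (learning rate assumed).
lr = 0.5
new_W1 = [[W1[j][i] - lr * grad_W1[j][i] for i in range(2)] for j in range(2)]
new_W2 = [W2[j] - lr * grad_W2[j] for j in range(2)]
loss_before = loss(forward(x, W1, W2)[1], t)
loss_after = loss(forward(x, new_W1, new_W2)[1], t)
```

Comparing `grad_W1` and `grad_W2` against the finite-difference values is a cheap sanity check on any backward pass derived by hand, and `loss_after` coming out below `loss_before` confirms the step moves downhill.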
If you already know what a derivative is and what the chain rule does, you can jump straight to Part 3 (Backpropagation). If either of those is rusty, start at the beginning: each part takes the previous part as its only prerequisite, and the cross-references are tight.
total reading time about two hours · all three pages use the same vocabulary and notation