Blog

Thoughts on AI, deep learning, and engineering automation.

Unaligned Multi-Modal Transformers

Most vision-language models assume tight alignment between image patches and text tokens. What happens when we relax that constraint? A look at unaligned multi-modal architectures and why they matter.

Transformers, Vision, Research

Einstein Summation (einsum) for Multi-Dimensional Arrays

einsum notation lets you write complex tensor contractions — matrix multiplications, batch operations, traces — in one readable line. Here is how to master it.

Python, NumPy, Linear Algebra

Self-Attention in Transformers Explained

A visual walkthrough of the self-attention mechanism — the core building block of Transformer models — with step-by-step implementation in PyTorch.

Deep Learning, Transformers, PyTorch