7.4 Orthogonal projections
Linear transformations are, in essence, manipulations of data, revealing other (hopefully more useful) representations. Intuitively, we tend to think of them as one-to-one mappings, faithfully preserving all the “information” in the input.
This is often not the case; in fact, sometimes a lossy compression of the data is highly beneficial. To give you a concrete example, consider a dataset with a million features, out of which only a couple hundred are useful. What we can do is identify the important features and discard the rest, obtaining a representation that is more compact and thus easier to work with.
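To make this concrete, here is a minimal NumPy sketch; the data, shapes, and the choice of which features count as "useful" are all invented for illustration. It shows that keeping a subset of coordinates amounts to applying a projection matrix to the data.

```python
import numpy as np

# Toy data: 5 samples, 6 features. (Both the data and the choice of
# "useful" features below are made up for illustration.)
rng = np.random.default_rng(42)
X = rng.normal(size=(5, 6))

useful = [0, 3]  # indices of the features we decide to keep

# Projection onto the coordinate subspace spanned by the useful features:
# P is a 6x6 matrix that zeroes out every other coordinate.
P = np.zeros((6, 6))
P[useful, useful] = 1.0

X_proj = X @ P            # same shape, but only two directions survive
X_compact = X[:, useful]  # the compact representation: 5 x 2

# P behaves like an orthogonal projection should: applying it twice
# changes nothing, and it equals its own transpose.
assert np.allclose(P @ P, P)
assert np.allclose(P, P.T)
```

Note that `X_proj` still lives in the original six-dimensional space; `X_compact` simply drops the coordinates that the projection zeroed out.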
This notion is formalized by the concept of orthogonal projections. We already met them upon our first encounter with inner products (see (2.7)).
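As a quick refresher, the simplest case is the projection onto the line spanned by a nonzero vector $v$ (whether or not (2.7) stated it in exactly this form, this is the standard one-dimensional formula):

\[
\operatorname{proj}_v(x) = \frac{\langle x, v \rangle}{\langle v, v \rangle} \, v,
\]

whose defining property is that the residual $x - \operatorname{proj}_v(x)$ is orthogonal to $v$.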
Projections also play a fundamental role in the Gram-Schmidt process (Theorem 13), used to orthogonalize an arbitrary basis. Because we are already somewhat familiar with orthogonal...