In mathematics, the Johnson–Lindenstrauss lemma is a result named after William B. Johnson and Joram Lindenstrauss concerning low-distortion embeddings of points from high-dimensional into low-dimensional Euclidean space. The lemma states that a set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved. In the classical proof of the lemma, the embedding is a random orthogonal projection.
The lemma has applications in compressed sensing, manifold learning, dimensionality reduction, graph embedding, and natural language processing. Much of the data stored and manipulated on computers, including text and images, can be represented as points in a high-dimensional space (see vector space model for the case of text). However, the essential algorithms for working with such data tend to become bogged down very quickly as dimension increases.[1] It is therefore desirable to reduce the dimensionality of the data in a way that preserves its relevant structure.
The lemma likely explains how large language models (LLMs) like transformers are able to represent highly nuanced word meanings and context in relatively low-dimension embeddings.[2]
Given , a set of points in , and an integer ,[3] there is a linear map such that
for all .
The formula can be rearranged:
Alternatively, for any and any integer [Note 1] there exists a linear function such that the restriction is -bi-Lipschitz.[Note 2]
Also, the lemma is tight up to a constant factor, i.e. there exists a set of points of size m that needs dimension
in order to preserve the distances between all pairs of points within a factor of .[4][5]
The classical proof of the lemma takes to be a scalar multiple of an orthogonal projection onto a random subspace of dimension in . An orthogonal projection collapses some dimensions of the space it is applied to, which reduces the length of all vectors, as well as distance between vectors in the space. Under the conditions of the lemma, concentration of measure ensures there is a nonzero chance that a random orthogonal projection reduces pairwise distances between all points in by roughly a constant factor . Since the chance is nonzero, such projections must exist, so we can choose one and set .
To obtain the projection algorithmically, it suffices with high probability to repeatedly sample orthogonal projection matrices at random. If you keep rolling the dice, you will eventually obtain one in polynomial random time.
It is possible to combine two JL matrices by taking the so-called face-splitting product, which is defined as the tensor products of the rows (was proposed by V. Slyusar[9] in 1996[10][11][12][13][14] for radar and digital antenna array applications).
More directly, let and be two matrices.
Then the face-splitting product is[10][11][12][13][14]
This idea of tensorization was used by Kasiviswanathan et al. for differential privacy.[15]
JL matrices defined like this use fewer random bits, and can be applied quickly to vectors that have tensor structure, due to the following identity:[12]
- ,
where is the element-wise (Hadamard) product.
Such computations have been used to efficiently compute polynomial kernels and many other linear-algebra algorithms[clarification needed].[16]
In 2020[17] it was shown that if the matrices are independent or Gaussian matrices, the combined matrix satisfies the distributional JL lemma if the number of rows is at least
- .
For large this is as good as the completely random Johnson-Lindenstrauss, but
a matching lower bound in the same paper shows that this exponential dependency on is necessary.
Alternative JL constructions are suggested to circumvent this.
Or any integer
This result follows from the above result. Sketch of proof: Note and for all . Do casework for 1=m and 1<m, applying the above result to in the latter case, noting
For instance, writing about nearest neighbor search in high-dimensional data sets, Jon Kleinberg writes: "The more sophisticated algorithms typically achieve a query time that is logarithmic in n at the expense of an exponential dependence on the dimension d; indeed, even the average case analysis of heuristics such as k-d trees reveal an exponential dependence on d in the query time. Kleinberg, Jon M. (1997), "Two Algorithms for Nearest-neighbor Search in High Dimensions", Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, STOC '97, New York, NY, USA: ACM, pp. 599–608, doi:10.1145/258533.258653, ISBN 0-89791-888-6.
Larsen, Kasper Green; Nelson, Jelani (2017), "Optimality of the Johnson-Lindenstrauss Lemma", Proceedings of the 58th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 633–638, arXiv:1609.02094, doi:10.1109/FOCS.2017.64, S2CID 16745
Johnson, William B.; Lindenstrauss, Joram (1984), "Extensions of Lipschitz mappings into a Hilbert space", in Beals, Richard; Beck, Anatole; Bellow, Alexandra; et al. (eds.), Conference in modern analysis and probability (New Haven, Conn., 1982), Contemporary Mathematics, vol. 26, Providence, RI: American Mathematical Society, pp. 189–206, doi:10.1090/conm/026/737400, ISBN 0-8218-5030-X, MR 0737400, S2CID 117819162
Ailon, Nir; Chazelle, Bernard (2006), "Approximate nearest neighbors and the fast Johnson–Lindenstrauss transform", Proceedings of the 38th Annual ACM Symposium on Theory of Computing, New York: ACM Press, pp. 557–563, doi:10.1145/1132516.1132597, ISBN 1-59593-134-1, MR 2277181, S2CID 490517
Kane, Daniel M.; Nelson, Jelani (2014), "Sparser Johnson-Lindenstrauss Transforms", Journal of the ACM, 61 (1): 1, arXiv:1012.1577, doi:10.1145/2559902, MR 3167920, S2CID 7821848. A preliminary version of this paper was published in the Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, 2012.
Kasiviswanathan, Shiva Prasad; Rudelson, Mark; Smith, Adam D.; Ullman, Jonathan R. (2010), "The price of privately releasing contingency tables and the spectra of random matrices with correlated rows", in Schulman, Leonard J. (ed.), Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, 5–8 June 2010, Association for Computing Machinery, pp. 775–784, doi:10.1145/1806689.1806795, S2CID 5714334
Ahle, Thomas; Kapralov, Michael; Knudsen, Jakob; Pagh, Rasmus; Velingker, Ameya; Woodruff, David; Zandieh, Amir (2020), "Oblivious Sketching of High-Degree Polynomial Kernels", ACM-SIAM Symposium on Discrete Algorithms, Association for Computing Machinery, arXiv:1909.01410, doi:10.1137/1.9781611975994.9
- Achlioptas, Dimitris (2003), "Database-friendly random projections: Johnson–Lindenstrauss with binary coins", Journal of Computer and System Sciences, 66 (4): 671–687, doi:10.1016/S0022-0000(03)00025-4, MR 2005771. Journal version of a paper previously appearing at PODC 2001.
- Baraniuk, Richard; Davenport, Mark; DeVore, Ronald; Wakin, Michael (2008), "A simple proof of the restricted isometry property for random matrices", Constructive Approximation, 28 (3): 253–263, doi:10.1007/s00365-007-9003-x, hdl:1911/21683, MR 2453366, S2CID 15911073.
- Dasgupta, Sanjoy; Gupta, Anupam (2003), "An elementary proof of a theorem of Johnson and Lindenstrauss" (PDF), Random Structures & Algorithms, 22 (1): 60–65, doi:10.1002/rsa.10073, MR 1943859, S2CID 10327785.
- Landweber, Peter; Lazar, Emanuel A.; Patel, Neel (2016), "On fiber diameters of continuous maps", American Mathematical Monthly, 123 (4): 392–397, arXiv:1503.07597, doi:10.4169/amer.math.monthly.123.4.392, S2CID 51751732
- Slyusar, V. I. (1997-05-20), "Analytical model of the digital antenna array on a basis of face-splitting matrix products." (PDF), Proc. ICATT-97, Kyiv: 108–109
- Slyusar, V. I. (March 13, 1998), "A Family of Face Products of Matrices and its Properties" (PDF), Cybernetics and Systems Analysis C/C of Kibernetika I Sistemnyi Analiz.- 1999., 35 (3): 379–384, doi:10.1007/BF02733426, S2CID 119661450.
- The Modern Algorithmic Toolbox Lecture #4: Dimensionality Reduction (PDF), 2023