Mindware Research Institute

Concentration on the sphere – The curse of dimensionality

October 7, 2023
By Kunihiro TADA, in Data Science

Before attempting anything, it is important to know its limits in advance. Machine learning methods may not work well on very high-dimensional data, a problem known as the curse of dimensionality. Ultimately, this is the same situation as the ugly duckling theorem described in an earlier post. Simply put, such data is ‘meaningless’ data.

From a philosophical perspective, this is a limitation of human existence rather than of machine learning. The knowledge available to humans is always knowledge from some ‘point of view’. Therefore, when people want to understand something in depth, they need to change their point of view and analyse it again, which involves discarding information. If one tries to deal with all information at once, it becomes meaningless input. (The all-encompassing universe is a huge meaningless thing, and we create meaning by using only a small part of it.) It is therefore impossible to develop technology that overcomes this, because it is a cosmic principle.

The curse of dimensionality is described mathematically as concentration on the sphere: taking any given data point as the centre, the distances from that centre to the other data points become approximately equal as the dimension increases. As a result, the differences in distance between pairs of data points are no longer significant, whichever pair is chosen. Importantly, the centre here is each data point, not the origin or the centre of gravity of the data space.
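The effect can be observed numerically. The following is only a minimal sketch under assumptions that are not in the original text: the data points are drawn uniformly at random from the unit hypercube, distances are Euclidean, and the function name `relative_spread` is made up for this illustration. It takes one data point as the centre and measures how much the distances to the other points vary relative to their mean.

```python
import math
import random


def relative_spread(n_points: int, dim: int, seed: int = 0) -> float:
    """Return (max - min) / mean of the distances from one data point
    to all the others, for uniform random data (an assumption)."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    centre = points[0]  # the centre is a data point, not the origin
    dists = [math.dist(centre, p) for p in points[1:]]
    mean = sum(dists) / len(dists)
    return (max(dists) - min(dists)) / mean


if __name__ == "__main__":
    # The spread should shrink as the dimension grows.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_spread(200, dim), 3))
```

With this kind of random data, the relative spread should fall steadily as the dimension increases, which is what "approximately equal distances" means in practice. Real data with strong structure may behave less extremely.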

If the distances between all data points were exactly equal, clustering, for example, would become impossible. Real data does not reach this extreme, but very high-dimensional data comes close to it.

If we think about it in terms of figures, the figure in two-dimensional space whose points are all equally distant from one another is an equilateral triangle, and in three-dimensional space it is a regular tetrahedron. I am not a mathematician, but such an equidistant figure is known as a regular simplex: in d dimensions it has d+1 vertices, which is the largest number of points that can all be equidistant from each other. Although we use a spherical surface to explain how distances approach equidistance, we should not forget that, in reality, space expands explosively as the dimension increases. When the number of dimensions approaches the number of data points, the data points are sparsely scattered over a huge space.
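The equidistant figure described above (a regular simplex, in mathematical terms) can be constructed and checked directly. The sketch below uses one possible construction, chosen here for illustration and not taken from the original text: the d unit vectors along the axes, which are all at distance √2 from each other, plus one extra point on the diagonal placed so that it is also at distance √2 from each of them. The function names are hypothetical.

```python
import itertools
import math


def regular_simplex(d: int) -> list[list[float]]:
    """Return d + 1 mutually equidistant points in d dimensions."""
    # The d standard basis vectors are pairwise sqrt(2) apart.
    vertices = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    # Extra vertex (c, c, ..., c); c solves d*c^2 - 2*c - 1 = 0,
    # which makes its distance to every basis vector sqrt(2) as well.
    c = (1.0 + math.sqrt(d + 1.0)) / d
    vertices.append([c] * d)
    return vertices


def pairwise_distances(points: list[list[float]]) -> list[float]:
    """Return the Euclidean distance for every pair of points."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]


if __name__ == "__main__":
    for d in (2, 3, 10):
        dists = pairwise_distances(regular_simplex(d))
        print(d, len(dists), min(dists), max(dists))
```

For d = 2 this gives an equilateral triangle and for d = 3 a regular tetrahedron; in every case the smallest and largest pairwise distances should agree up to floating-point rounding.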

A typical misconception about concentration on the sphere is the claim that very high-dimensional data has a spherical topology. You may come across papers claiming that a spherical SOM eliminates the curse of dimensionality, but I think this is completely wrong and a kind of pseudo-science. The ‘sphere’ in concentration on the sphere refers to the data points being approximately equidistant from one another, not to the familiar sphere in three-dimensional space.

We can see constellations in the night sky. However, they are only the arrangement of the stars as seen from Earth; seen from another star, the constellations would look different.

Written by:

Kunihiro TADA

He has been an observer of industrial booms from the early 1980s to the present day. In 1982 he was a planner of high-tech seminars at the Japan Technology and Economy Centre, and of seminars and research projects at JMA Consulting. In 1986 he organised AI chip seminars on fuzzy inference and other topics, triggering the fuzzy boom. After freelance writing on CG and multimedia, he founded the Mindware Research Institute, which has sold the Japanese version of Viscovery SOMine since 2000, and Hugin and XLSTAT in Japan since 2003.
