Mindware Research Institute

Data Science and Buddhism: From the “Ugly Duckling Theorem” to Emptiness, Provisionality, and the Middle Way

November 30, 2025
By Kunihiro TADA In Data Science

The pioneering research that shaped today’s “data science” can be traced back to pattern recognition studies of the 1960s. In Japan, that era saw the development of machines capable of reading handwritten postal codes to sort mail at high speed. One of the intellectual leaders of that formative period, Satoshi Watanabe, proposed the remarkably evocative concept known as the “Ugly Duckling Theorem.”

Despite its importance, this theorem is seldom taught in modern data science programs, and many data scientists—especially those trained within frameworks dominated by machine learning engineering—have never seriously engaged with its philosophical implications. Yet, once examined carefully, the theorem reveals profound connections to Buddhist philosophy, Kantian epistemology, Husserl’s phenomenology, and Jakob von Uexküll’s concept of Umwelt (environmental world).

This essay explores those connections and argues that a proper understanding of clustering requires the same cognitive attitude Buddhism describes as the Middle Way between emptiness and conventional appearance.


■ What Is the Ugly Duckling Theorem?

Watanabe famously stated:

If similarity is measured by the number of predicates two objects share, then any two distinct objects are equally similar.

This sounds counterintuitive at first, but the logic is straightforward:

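  • The world contains an unlimited number of possible attributes, and even a finite set of attributes generates many more predicates through logical combination ("yellow and small," "yellow or not small," and so on).
  • Humans select only a tiny subset when comparing objects.
  • Any two objects share some predicates and differ on others.
  • If every predicate counts equally, with no importance (weights) assigned, each pair of distinct objects shares exactly the same number of predicates, so all pairs are equally similar.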

Therefore, similarity is not an inherent property of the data.
It is a consequence of which attributes we choose to value.
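The counting argument can be checked directly in a small case. The sketch below is an illustration under a simplifying assumption, not Watanabe's own formulation: each predicate is modelled by the set of objects it is true of, so n objects admit 2**n predicates, all counted equally. The function name shared_predicate_counts is hypothetical.

```python
from itertools import combinations

def shared_predicate_counts(n_objects):
    """For every pair of objects, count the predicates that hold for both.

    A predicate is represented as a bitmask: bit k is set when the
    predicate is true of object k. All 2**n_objects bitmasks are included,
    which is the "no attribute is privileged" assumption of the theorem.
    """
    predicates = range(2 ** n_objects)
    counts = {}
    for i, j in combinations(range(n_objects), 2):
        counts[(i, j)] = sum(
            1 for p in predicates if (p >> i) & 1 and (p >> j) & 1
        )
    return counts

counts = shared_predicate_counts(4)
for pair, shared in counts.items():
    print(pair, shared)
```

With four objects, every pair shares the same number of predicates (2**(4-2) = 4), so counting alone cannot single out any pair as more alike than another.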

Watanabe concluded:

“To escape the Ugly Duckling Theorem, one must admit that some attributes are more important than others.”
“The goal of clustering is not to discover ‘objective’ classes but to create useful new classifications.”
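A small sketch may make the role of weights concrete. It assumes a toy model in which three objects are indexed 0, 1, and 2, and each predicate is a bitmask recording which objects it is true of. The predicate YELLOW and the weighting functions uniform and favour_yellow are hypothetical names introduced only for this illustration.

```python
from itertools import combinations

N_OBJECTS = 3  # hypothetical: objects 0 and 1 are ducklings, object 2 is a cygnet

def weighted_similarity(i, j, weight):
    """Sum the weights of all predicates (bitmasks) true of both i and j."""
    return sum(
        weight(p)
        for p in range(2 ** N_OBJECTS)
        if (p >> i) & 1 and (p >> j) & 1
    )

def uniform(p):
    """Every predicate matters equally (the setting of the theorem)."""
    return 1

YELLOW = 0b011  # hypothetical predicate, true of objects 0 and 1 only

def favour_yellow(p):
    """The analyst decides that being yellow matters three times as much."""
    return 3 if p == YELLOW else 1

pairs = list(combinations(range(N_OBJECTS), 2))
flat = {pair: weighted_similarity(*pair, uniform) for pair in pairs}
tilted = {pair: weighted_similarity(*pair, favour_yellow) for pair in pairs}
print("uniform weights: ", flat)
print("favouring yellow:", tilted)
```

Under uniform weights all three pairs tie. Once one predicate is declared more important, the pair (0, 1) becomes the most similar. The grouping comes from the weighting decision, not from the objects alone.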

However, many practitioners overlook this foundational point. They assume, often unconsciously, that:

  • A “true” structure exists inside the data.
  • A clustering algorithm can reveal it.
  • The result should coincide with everyday categories.
  • The clustering with the highest quality metric (e.g., silhouette score) must be the “correct” one.

Clustering metrics are indeed helpful, but treating them as arbiters of objective reality is naïve and even dangerous. They are evaluative tools—not detectors of ontological truth.
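The dependence of a quality metric on the analyst's choices can be sketched in a few lines. The example below uses a plain-Python silhouette computation with Euclidean distance, written for illustration rather than taken from any library. The four data points, the two candidate partitions, and the helper names rescale and silhouette are all assumptions made for this sketch.

```python
import math

def rescale(points, weights):
    """Multiply each coordinate by its attribute weight (the analyst's choice)."""
    return [tuple(w * x for w, x in zip(weights, p)) for p in points]

def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance."""
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of the point's own cluster
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = math.inf
        for other in set(labels) - {labels[i]}:
            dists = [math.dist(p, q)
                     for q, lab in zip(points, labels) if lab == other]
            b = min(b, sum(dists) / len(dists))
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical data: four points on the corners of a 4 x 1 rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
split_by_x = [0, 0, 1, 1]  # groups points by the first attribute
split_by_y = [0, 1, 0, 1]  # groups points by the second attribute

for weights in [(1, 1), (0.1, 10)]:
    scaled = rescale(points, weights)
    print(weights,
          "by x:", round(silhouette(scaled, split_by_x), 3),
          "by y:", round(silhouette(scaled, split_by_y), 3))
```

With equal weights the split along the first attribute scores higher. After the second attribute is weighted more heavily, the ranking reverses. The same metric endorses different "best" clusterings depending on how the attributes are valued, which is why a high score cannot by itself certify that a structure is objectively there.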


■ Philosophical Depth: Emptiness, the Thing-in-Itself, Phenomenology, and Umwelt

The Ugly Duckling Theorem resonates with several major philosophical traditions.

● Buddhism’s Doctrine of Emptiness (Śūnyatā)

  • No object possesses inherent essence or fixed identity.
  • Categories, meanings, and distinctions arise only through human conceptual activity.
  • Classification is therefore empty: it does not mirror an objective essence.

● Kant’s “Thing-in-Itself”

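  • Humans can only experience the world through the forms of human sensibility (space and time) and the categories of the understanding.
  • The world "as it is in itself" cannot be known directly; color, smell, value, and meaning belong to the world as it appears to us, not to a reality independent of perception.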

● Husserl’s Phenomenology and “Epoché”

  • Suspend judgments and preconceived categories.
  • Attend to phenomena as they appear, without the distortion of habitual interpretation.

● Uexküll’s Umwelt

  • Every organism inhabits a world structured by its sensory and bodily constitution.
  • Each species lives in its own “meaningful world.”

Humans, too, live within a human-shaped Umwelt.
Classification cannot be objective because the world we see is already filtered through human embodiment and cognition.


■ The Buddhist Framework of “Three Truths”: Emptiness, Provisionality, and the Middle

The Tiantai (Tendai) philosopher Zhiyi articulated a threefold truth that maps surprisingly well onto the epistemology of data analysis:

1. Emptiness (kū-tai):

All phenomena lack inherent essence; classifications do not exist “in themselves.”

In data terms:
The raw data contains no labels, no clusters, no “natural kinds.”

2. Provisional Existence (ke-tai):

Phenomena appear as meaningful, labeled, and structured from a human standpoint.

In data terms:
Human goals, tasks, and interpretations create useful classifications for reasoning, communication, and decision-making.

Clustering belongs entirely to this realm of provisionality.

3. The Middle (chū-tai):

A non-dual perspective that embraces both emptiness and provisionality without clinging to either.

This corresponds to the phenomenological epoché: neither assuming that structure is inherent in the data (the lesson of emptiness) nor denying that structure is useful (the lesson of provisionality).

Applied to data science, this is the attitude we must cultivate when working with clustering.


■ The Practical Analogy: Drawing, Pattern Perception, and the Artist’s Eye

Epoché is notoriously difficult to practice.
A helpful analogy comes from the art of drawing.

  • Novice artists draw “a nose” or “an apple,” guided by conceptual preconceptions.
  • Skilled artists temporarily set aside the object’s identity and observe pure shapes, values, proportions, and edges.

This is an act of suspending conceptual thought—an epoché.
In drawing, auxiliary lines help reveal structure:

  • These lines do not exist in reality.
  • Yet drawing them aids understanding and skillful depiction.

Clustering serves the same function:

A cluster is not a “real boundary” in the data.
It is an auxiliary line that helps us understand patterns.

Mistaking auxiliary lines for objective boundaries is the central error the Ugly Duckling Theorem warns against.


■ Conclusion: The Middle Way Is the Secret to Using Clustering Wisely

Clustering is powerful precisely because:

  • It does not discover immutable categories.
  • It creates classifications that are useful for a purpose.
  • It reflects the analyst’s decisions, values, and chosen attributes.

Thus the true mastery of clustering lies in:

  • Emptiness: Recognizing that classifications are not inherently real.
  • Provisionality: Using classifications as tools for understanding and action.
  • The Middle: Maintaining flexibility, detachment, and methodological humility.

This Middle Way is not only a Buddhist teaching but also the cognitive stance required for sophisticated data analysis.

The Ugly Duckling Theorem invites us to see the limits of our models.
The Buddhist three truths teach us how to work productively within those limits.
Together, they form a philosophical foundation for a wiser, more reflective data science.

Written by:

Kunihiro TADA

He has followed industrial booms from the early 1980s to the present day. In 1982 he began planning high-tech seminars at the Japan Technology and Economy Centre, and he also planned seminars and research projects at JMA Consulting. In 1986 he organised AI chip seminars on fuzzy inference and other topics, triggering the fuzzy boom. After working as a freelance writer on CG and multimedia, he founded the Mindware Research Institute, which has sold the Japanese version of Viscovery SOMine since 2000, and Hugin and XLSTAT in Japan since 2003.
