Mindware Research Institute

Classification and Clustering

October 10, 2023
By Kunihiro TADA In Data Science

Classification and clustering are often confused because they are similar. In machine learning, classification is generally described as supervised learning and clustering as unsupervised learning. In statistical terms, a supervised method has an objective variable, and an unsupervised method has none.

The purpose of a classification model is relatively easy to understand. A classification model uses the other qualitative and quantitative variables to explain and reproduce the value of one particular qualitative variable. Multivariate data can contain several quantitative and qualitative variables, and formally any qualitative variable can be chosen as the objective variable for classification. There are therefore as many ways to classify as there are qualitative variables.
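As a minimal sketch of this point, the following fits two classifiers on the same data, each with a different qualitative variable as the objective variable. The data and the variable names (`size`, `weight`, `grade`, `origin`) are hypothetical, and scikit-learn's `DecisionTreeClassifier` is only one possible choice of model.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 200

# Hypothetical multivariate data: two quantitative and two qualitative variables.
size = rng.normal(10.0, 2.0, n)        # quantitative
weight = rng.normal(5.0, 1.0, n)       # quantitative
grade = (size > 10.0).astype(int)      # qualitative, driven by size
origin = (weight > 5.0).astype(int)    # qualitative, driven by weight


def fit_classifier(target, *explanatory):
    """Fit a tree that explains `target` from the other variables."""
    X = np.column_stack(explanatory)
    model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, target)
    return model.score(X, target)  # training accuracy, for illustration only


# Same data, two different classification problems.
acc_grade = fit_classifier(grade, size, weight, origin)
acc_origin = fit_classifier(origin, size, weight, grade)
print(f"objective = grade : {acc_grade:.2f}")
print(f"objective = origin: {acc_origin:.2f}")
```

Which qualitative variable plays the role of the objective variable is a decision made by the analyst, not something the data dictates.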

The purpose of clustering, on the other hand, is somewhat more abstract, so misunderstandings sometimes occur. The results of clustering can be treated as a new classification: each cluster generated becomes a class in the new classification. In other words, clustering can be used to give new labels to data that have no classification labels. Doing the opposite, expecting the clustering results to match a particular existing classification, does not seem very productive.
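The idea of giving new labels to unlabeled data can be sketched as follows. The data are synthetic, and both the choice of k-means and the number of clusters (3) are assumptions made for this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical unlabeled data: 150 points scattered around three centres.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(0.0, 0.5, size=(50, 2)) for c in centers])

# No objective variable is supplied; the cluster index becomes a new label.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
new_labels = kmeans.fit_predict(X)

print(np.bincount(new_labels))  # number of points per new class
```

The array `new_labels` can then be stored as a new qualitative variable and, if desired, used later as the objective variable of a classification model.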

Of course, if you cluster a dataset that was prepared for a specific classification and has already been validated, it is quite possible that the clustering results will match that classification. For example, if you cluster the explanatory variables of Fisher’s iris data, the results closely match the objective variable (the species). The few cases that do not match are due to inconsistencies in the original data. As a use of clustering, however, this does not seem very practical.
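The iris comparison can be reproduced roughly as below. This is a sketch that assumes k-means with three clusters on the unscaled measurements; other clustering methods or preprocessing will give somewhat different agreement.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.cluster import contingency_matrix

iris = load_iris()
X, species = iris.data, iris.target  # explanatory variables, objective variable

# Cluster the explanatory variables only; the species labels are not used.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Rows are species, columns are clusters (cluster numbering is arbitrary).
table = contingency_matrix(species, clusters)
ari = adjusted_rand_score(species, clusters)
print(table)
print(f"adjusted Rand index: {ari:.2f}")
```

An adjusted Rand index of 1 would mean that the clusters reproduce the species exactly. In the contingency table, the mismatches typically appear between versicolor and virginica, whose measurements overlap.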

Because cases like this are often presented as examples of clustering, the purpose of clustering may be becoming harder to understand. An easy-to-understand example where the results of classification and clustering do not match is the classification of good and defective products in quality control. In this case, the clustering results are not limited to two classes, and the defective products may fall into several clusters. In other words, we can see that there are multiple patterns of defects. The purpose of clustering is to discover deeper, new knowledge in this way.
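The quality control case might look like the following sketch. The measurements, the two defect patterns, and the number of clusters are all hypothetical and were chosen so that the effect is easy to see.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical measurements per product: (dimension error, surface roughness).
good = rng.normal([0.0, 0.0], 0.3, size=(200, 2))
defect_a = rng.normal([3.0, 0.0], 0.3, size=(30, 2))  # e.g. oversized parts
defect_b = rng.normal([0.0, 3.0], 0.3, size=(30, 2))  # e.g. rough surface
X = np.vstack([good, defect_a, defect_b])

# The existing classification knows only two classes: good (0), defective (1).
is_defective = np.array([0] * 200 + [1] * 60)

# Clustering does not use that label and is free to find more than two groups.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

good_clusters = np.unique(clusters[is_defective == 0])
defect_clusters = np.unique(clusters[is_defective == 1])
print("clusters containing good products:     ", good_clusters)
print("clusters containing defective products:", defect_clusters)
```

Here the single class "defective" is spread over two clusters, which is the kind of finding a two-class classifier alone would not reveal.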

Clustering can be interpreted as a method of creating a new classification synthesized from multiple variables. It is generally said that clustering is used when there is no prior knowledge about the object, but personally I only half agree with this idea, because some insight must be at work in deciding which variables to use for clustering. Clustering is always clustering from some perspective, and absolute clustering cannot exist. In practice, the clustering process should be exploratory, with the variable selection adjusted along the way.
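A small sketch of how variable selection sets the perspective: the same hypothetical objects are clustered twice, once on the first two variables and once on the third. The data are constructed so that the two selections carry unrelated group structure; k-means with two clusters is an assumption of the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
n = 100

# Two unrelated hidden groupings of the same 100 objects.
group_a = np.repeat([0, 1], n // 2)  # visible in variables 0 and 1
group_b = np.tile([0, 1], n // 2)    # visible in variable 2

X = np.column_stack([
    group_a * 4.0 + rng.normal(0.0, 0.5, n),
    group_a * 4.0 + rng.normal(0.0, 0.5, n),
    group_b * 4.0 + rng.normal(0.0, 0.5, n),
])


def cluster(columns):
    """Cluster the objects using only the selected variables."""
    model = KMeans(n_clusters=2, n_init=10, random_state=0)
    return model.fit_predict(X[:, columns])


by_first_two = cluster([0, 1])
by_third = cluster([2])

# Agreement between the two clusterings of the same objects.
ari = adjusted_rand_score(by_first_two, by_third)
print(f"agreement between the two perspectives (ARI): {ari:.2f}")
```

Both clusterings are valid for the variables they were given, yet they barely agree with each other. Neither is the "true" clustering; each reflects the perspective set by the variable selection.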

Written by:

Kunihiro TADA

He has been a watcher of industrial booms from the early 1980s to the present day. In 1982 he was a planner of high-tech seminars at the Japan Technology and Economy Centre, and of seminars and research projects at JMA Consulting. In 1986 he organised AI chip seminars on fuzzy inference and other topics, triggering the fuzzy boom. After freelance writing on CG and multimedia, he founded the Mindware Research Institute, which has sold the Japanese version of Viscovery SOMine since 2000, and Hugin and XLSTAT in Japan since 2003.
