Give it a simpler name: Nominative Obscurantism in Machine Learning
Terminology in machine learning is too often complicated-sounding and not sufficiently descriptive
A rose by any other name confuses the issue. - Patrick Raume
What do Logistic Regression, Support Vector Machine, and the Perceptron algorithm have in common? They are all horribly inscrutable terms to a newbie. Worse, they are all horribly inscrutable terms referring to essentially the same problem, the classification problem: find a rule that classifies objects into two (or more) categories.
I teach an applied machine learning class, and every other week, when I introduce a new topic, I talk a bit about how the complicated-sounding name of the topic belies the really simple ideas behind it.
If you have blue dots and red dots in n-dimensional space, what is a surface that nicely separates the blue dots from the red dots? Each of the three methods (logistic regression, support vector machine, and the perceptron algorithm) finds such a surface by solving an optimization problem. The objective functions minimized by the three methods share some properties but differ in one small but important respect.
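To make that commonality concrete, here is a minimal sketch (the toy data, helper names, and plain gradient descent are my own illustration, not the solvers used in practice): one linear-classifier training routine, where the only thing that changes between the three methods is the gradient of the loss as a function of the margin.

```python
import numpy as np

def train_linear_classifier(X, y, loss_grad, lr=0.1, steps=500):
    """Gradient descent on (1/n) * sum_i loss(y_i * w.x_i).

    loss_grad is the derivative of the loss with respect to the margin."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ w)
        # Chain rule: d/dw loss(y_i * w.x_i) = loss_grad(margin_i) * y_i * x_i
        w -= lr * ((loss_grad(margins) * y) @ X) / len(y)
    return w

# Margin-space gradients of the three standard losses:
sigmoid_grad = lambda m: -1.0 / (1.0 + np.exp(m))   # logistic loss log(1 + e^-m)
hinge_grad = lambda m: np.where(m < 1, -1.0, 0.0)   # hinge loss max(0, 1 - m)
relu_grad = lambda m: np.where(m <= 0, -1.0, 0.0)   # perceptron loss max(0, -m)

# Toy data: "blue" dots near (1, 1), "red" dots near (-1, -1), plus a bias feature.
X = np.array([[1.0, 1.2, 1.0], [0.8, 1.0, 1.0],
              [-1.0, -0.9, 1.0], [-1.2, -1.1, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

preds = {}
for name, grad in [("sigmoidal", sigmoid_grad),
                   ("separation", hinge_grad),
                   ("rectified", relu_grad)]:
    w = train_linear_classifier(X, y, grad)
    preds[name] = np.sign(X @ w)  # all three recover the same labeling here
```

On separable data like this, all three gradients drive the same update rule toward a separating surface; only the stopping behavior and margin geometry differ.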
I suggest that the names for these three things should (1) tell you what they are and (2) tell you that they are all related, differing only slightly. I do not object to giving unique, idiosyncratic names to new ideas in general, only when simpler names are possible.
So here is a candidate revision of their names. Most of these suggestions are not original, but they do not seem to be consolidated in one good place.
Logistic regression is “classification using sigmoidal loss”. Don’t use the word “logistic”; that is another obscure historical reference, to population dynamics, and not relevant here. “Sigmoidal” at least says something about the shape of the loss. Alternatively, call it cross-entropy loss.
Support vector machine is “classification using separation loss” (or margin loss). Don’t use the term “hinge loss”, as the plot of the perceptron loss also looks like a hinge.
Perceptron algorithm is “classification using rectified linear loss”. Don’t call it perceptron loss, as that name conveys nothing about the loss itself.
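For reference, the three losses themselves, as functions of the margin m = y(w·x), can be written in a few lines. These are the standard functional forms; the function names follow the proposed terminology:

```python
import math

def sigmoidal_loss(m):         # logistic regression: log(1 + e^-m), smooth everywhere
    return math.log1p(math.exp(-m))

def separation_loss(m):        # support vector machine: max(0, 1 - m)
    return max(0.0, 1.0 - m)

def rectified_linear_loss(m):  # perceptron: max(0, -m)
    return max(0.0, -m)

# All three are large for badly misclassified points (very negative margins) and
# small or zero for confidently correct ones; they differ in how they flatten out.
for m in (-2.0, 0.0, 0.5, 2.0):
    print(f"{m:+.1f}  {sigmoidal_loss(m):.3f}  "
          f"{separation_loss(m):.3f}  {rectified_linear_loss(m):.3f}")
```

Note the one small but important difference: the rectified linear loss is zero for any correctly classified point, the separation loss is zero only beyond margin 1, and the sigmoidal loss is never exactly zero.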
I agree that these three methods have different properties and interpretations, but it is good to emphasize the commonality before emphasizing the differences. These alternative terms are still a bit obscure to an outsider, but at least they all begin with “classification” and they all have analogous names that highlight the loss function being minimized.
Of course, the problem is not particular to just these three terms. So I encourage people to compile a long laundry list of machine learning terms (including the three above) and then attempt to make them easier to learn and more rational.
Machine learning teaches us that good representations and abstractions help learning, so it would be fitting to make the terminology of machine learning equally useful for learning the subject.