# Template talk:Machine learning bar

## "Models"

This section title and contents seem pretty much *random* to me. How are the contents chosen? One regression, one random clustering algorithm, 4 standard classifiers; but no decision tree, which is probably the grandfather of all classifiers. --Chire (talk) 12:41, 22 October 2013 (UTC)

In general, one may argue that k-means is NOT machine learning, but plain old statistics. And clustering is at most a stepchild of the machine learning world; it's a data mining / knowledge discovery domain, just like outlier detection and frequent itemset mining. If you look at the communities, I would not call data mining part of machine learning either; it's living in parallel (unfortunately). Machine learners don't *get* or *like* unsupervised methods, actually. The "theory" section in this template is also pretty random, isn't it? --Chire (talk) 12:45, 22 October 2013 (UTC)

- This template is brand new and very incomplete. You're welcome to add to it.
- *k*-means clustering is a very widely employed method in the machine learning community, e.g. by computer vision folks who use it as a feature learning method, by neural nets folks for bootstrapping their RBF networks and by text mining people. New papers employing or improving k-means appear regularly in the ML literature. I can dig up some references if you like.
- AFAIC, the sidebar can be renamed something like "Data mining/machine learning/pattern recognition" -- the three overlap to such a degree that they're impossible to demarcate. QVVERTYVS (hm?) 14:59, 22 October 2013 (UTC)

- Re: "one regression algorithm": wrong. Logistic regression is in fact a classification algorithm. It is very popular, especially in the natural language processing community, and forms the basis for much recent neural nets and structured prediction work. Neural nets, k-NN and SVMs are all used for regression, though, even if this is not reflected in their Wikipedia articles. QVVERTYVS (hm?) 15:04, 22 October 2013 (UTC)

- I agree that they are hard to separate and it thus may be a good idea to merge them into one template. I know that k-means is used a lot in machine learning, as it is a statistical optimization problem; not so much actually a structure discovery thing. Maybe instead of the "Models" block, make one for each "Problem" above then? I.e. regression, classification, clustering, anomaly detection, etc.? --Chire (talk) 09:05, 23 October 2013 (UTC)

## Maybe we need to add Markovian models?

Hidden Markov models (HMMs) have been used successfully for NLP, among other machine learning tasks (there are dozens of articles; just do a Google Scholar search). I believe they should be added as one of the models. -- Preceding unsigned comment added by 150.135.223.128 (talk) 19:12, 28 January 2014 (UTC)

- I've added CRFs, HMMs and a link to the more general article graphical model. QVVERTYVS (hm?) 22:00, 28 January 2014 (UTC)

## Two problems are the same: "classification" and "clustering"

There are simply two general approaches to solving the same problem, supervised and unsupervised -- but the problem is one and the same. Fgnievinski (talk) 23:55, 3 May 2014 (UTC)

- Applications are also overlapping if not coincident. Fgnievinski (talk) 12:32, 5 May 2014 (UTC)
- As discussed in Talk:Statistical classification#Terminology: "classification" is supervised, "clustering" is unsupervised -- Really?, I *disagree* that they are the same thing.
- The objectives are different in the sense that classification tries to *minimize the prediction error*. Clustering however tries to discover some *meaningful structure*, without knowing *what* to look out for (which is also why clustering more often than not returns crap results - too little guidance on what you are looking for). They are related, but clearly not the same thing. IMHO, the applications as well as the methods differ fundamentally, too. You can't easily take one method and transfer it to the other problem; not even naive Bayes, or kNN classification. There are some cases where you have similar ideas - k-means also minimizes squared errors - but these occur in many other areas, too. And there are many clustering approaches not based on minimizing some statistical quantity. The big problem with clustering is evaluation: usually you evaluate by some statistical quantity (internal), or by class labels (external); both of which look a lot like classification.
- Either way; we are not truth finders. There is plenty of literature that distinguishes these approaches, so we should not merge them. The rule of thumb in literature is that classification and regression are supervised, and this is well reflected in the ML bar template. --Chire (talk) 13:36, 5 May 2014 (UTC)
- I'm sorry, this is not a restatement of the previous talk. The methods are outside the scope of the present discussion. What *is* inside the scope is that both methodological approaches aim to cluster, group, segment, partition, and classify input variates. All the problems addressed by unsupervised methods could be tackled by supervised ones if additional information is given. Fgnievinski (talk) 13:48, 5 May 2014 (UTC)
  - I also disagree on that. If you added labels to a data set, it would become a different problem: how to predict the labels of new instances, given the training data set, i.e. it becomes *class prediction*, whereas it was structure discovery before. That is IMHO a quite different task. --Chire (talk) 13:57, 5 May 2014 (UTC)

## Reinforcement learning & Terminology

It seems confusing for an outsider to call it 'supervised learning' but then not talk about unsupervised learning or reinforcement learning. Not sure what the best approach here would be, since clustering is (in some way) unsupervised learning -- would a rename be warranted? Perhaps worth renaming to "Unsupervised Learning / Clustering"? Dm1911 (talk) 17:50, 27 May 2015 (UTC)

- Reinforcement learning is currently missing, we should add it. Unsupervised learning is much broader than clustering: it also encompasses dimensionality reduction and feature learning. QVVERTYVS (hm?) 18:46, 27 May 2015 (UTC)

- Added Reinforcement learning as per this discussion. Situphobos (talk) 07:34, 4 July 2016 (UTC)

## Add List of datasets for machine learning research

How about adding List of datasets for machine learning research to the bar? Any thoughts on this? --Datakeeper (talk) 19:04, 25 February 2016 (UTC)

## Collapsible version

Is there a way to make this template have collapsible sections? It's pretty large and a little unwieldy to put on pages. --Datakeeper (talk) 21:15, 22 February 2016 (UTC)

- @Qwertyus: Excellent - looks great! Thank you!--Datakeeper (talk) 21:17, 25 February 2016 (UTC)

## spam

I see this all over the place as an intro diagram and it is not helpful. Diagrams about the topic, instead of the machine learning template on every subtopic, would be better. Daniel.Cardenas (talk) 23:03, 1 May 2016 (UTC)

- Suggest putting it after the intro, or in other words in the first topic. This will encourage others to create a more specific and more helpful diagram for the intro. Daniel.Cardenas (talk) 00:08, 2 May 2016 (UTC)

Per WP:NAVBOX: "The collection of articles in a sidebar template should be fairly tightly related... If the articles are not tightly related, a footer template (a navbox, located at the bottom of the article) may be more appropriate." This definitely applies here. I cannot see this template being of much use to anybody for navigation, and it is certainly not one of the first things a reader will look for on arriving at an article. I think conversion to a footer would be ideal. Bigbluefish (talk) 10:18, 13 February 2018 (UTC)