Talk:Perplexity

WikiProject Mathematics (Rated Start-class, Low-priority; Field: Probability and statistics)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.

WikiProject Systems (not yet rated for quality or importance; not associated with a particular field)
This article is within the scope of WikiProject Systems, which collaborates on articles related to systems and systems science.

Perplexity as confusion

I'd humbly suggest that a small amount of space be found somewhere to formally recognize that perplexity, to most English speakers' understanding, is a state of confusion or bewilderment. I think it's great to learn details about this math theory, but perhaps someone can find a spot to mention the origin of the word or any kind of clue/reference to the actual non-math meaning of perplexity.

Action taken: I've added a link to the Wiktionary definition. --84.9.73.211 (talk) 10:01, 6 February 2008 (UTC)

Plagiarism

Phrase "Perplexity Per Word"[edit]

"Perplexity Per Word" is taken verbatim from the following uncited source: [1] --133.11.168.148 (talk) 05:32, 27 May 2010 (UTC)

It's just perplexity scaled by the number of words. It's a really common term and would be even more common if most papers didn't just say "perplexity" and assume the reader knows it's per word. --130.108.16.157 (talk) 17:44, 20 April 2011 (UTC)
Deriving a "per word" unit from a unit applied to text in an obvious thing to do for anybody on the field. Citing a source for it would be most unusual. There is no plagiarism here. Jojo (talk) 16:42, 15 April 2017 (UTC)

Better wording

Why not just write this page as follows:

In information theory, perplexity is a unit of information such that a perplexity of $p$ equals $\log_2 p$ bits, or $\log_{10} p$ hartleys, etc.
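As a quick check of that conversion (the 247 figure is borrowed from the Brown Corpus discussion further down this page):

    $\text{perplexity } p \iff \log_2 p \text{ bits} \iff \log_{10} p \text{ hartleys}$
    $p = 247 \;\Rightarrow\; \log_2 247 \approx 7.95 \text{ bits} \approx 2.39 \text{ hartleys}$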

The rest of the page is redundant with most information theory articles. MisterSheik (talk) 02:39, 21 April 2011 (UTC)

Since when does the fact that similar information can be found in academic articles mean that a Wikipedia page should be removed?

I agree, this could even be merged with Entropy. Full Decent (talk) 22:17, 18 April 2016 (UTC)

By the same token, any other page about a derived concept could be merged. The main purpose of an encyclopedia is to help people to quickly understand a concept. This page explains the perplexity measure. It also links to entropy for those who want to know more about the theory behind it. Making it a subsection of "Entropy (information theory)" would make it much more difficult to find. Jojo (talk) 16:42, 15 April 2017 (UTC)

Seriously out-of-date figures...

Article currently states:

The lowest perplexity that has been published on the Brown Corpus (1 million words of American English of varying topics and genres) as of 1992 is indeed about 247 per word

I'm really not sure why the best we could do 25 years ago is relevant here. I see perplexities reported for English language models (although not on the Brown corpus) that are substantially lower than this (e.g. ~100 for recurrent neural net models). If the Brown corpus is the benchmark, what are more recent figures on that corpus? JulesH (talk) 09:17, 3 June 2017 (UTC)

I haven't seen any recent experiments on the Brown corpus, but there's no reason not to replace that with a more common benchmark. The One Billion Word Benchmark [1] is common now, and Exploring the Limits of Language Modeling (2016) [2] gets a perplexity of 23.7 using an ensemble of massive LSTM and skipgram models. I'm not aware of any better results on that set. That whole section needs to be rewritten if we switch the example, though; it refers to the Brown corpus and the 247 ppl result all over the place. 130.108.103.115 (talk) 23:01, 26 September 2017 (UTC)
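Purely to put the two figures quoted in this thread on the same scale (the corpora differ, so this gives only a rough sense of magnitude, not a direct comparison):

    $\log_2 247 \approx 7.95 \text{ bits per word (Brown Corpus figure, 1992)}$
    $\log_2 23.7 \approx 4.57 \text{ bits per word (One Billion Word Benchmark, 2016)}$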

References

  1. ^ Chelba, Ciprian; Mikolov, Tomas; Schuster, Mike; Ge, Qi; Brants, Thorsten; Koehn, Phillipp; Robinson, Tony (December 2013). "One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling". arXiv:1312.3005.
  2. ^ Jozefowicz, Rafal; Vinyals, Oriol; Schuster, Mike; Shazeer, Noam; Wu, Yonghui (February 2016). "Exploring the Limits of Language Modeling". arXiv:1602.02410.

Inverse perplexity

The inverse of the perplexity (which, in the case of the fair k-sided die, represents the probability of guessing correctly), is 1/1.38 = 0.72, not 0.9.

This seems barmy to me. Inverse perplexity seems like a crazy thing. Perplexity is kind of like average list length in an underspecified input system. Inverse list length WTF? I can't make any useful mental picture of this. — MaxEnt 00:22, 30 March 2018 (UTC)
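For what it's worth, the "probability of guessing correctly" reading above does check out in the uniform case (a minimal worked example; the choice of $k$ is arbitrary):

    $\text{fair } k\text{-sided die: } H = \log_2 k \text{ bits}, \quad \text{perplexity} = 2^H = k, \quad 1/\text{perplexity} = 1/k$
    $k = 6 \;\Rightarrow\; \text{perplexity} = 6, \quad \text{guess probability} = 1/6 \approx 0.17$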