Perplexity in NLP Applications
By K Saravanakumar, VIT - April 04, 2020

Perplexity of a probability distribution

The perplexity PP of a discrete probability distribution p is defined as

$$PP(p) := 2^{H(p)} = 2^{-\sum_{x} p(x) \log_2 p(x)}$$

where H(p) is the entropy (in bits) of the distribution and x ranges over events. (The base need not be 2: the perplexity is independent of the base, provided that the entropy and the exponentiation use the same base.)

Intuitively, perplexity can be understood as a measure of uncertainty. In short, perplexity measures how well a probability distribution or probability model predicts a sample: the lower the perplexity, the better the prediction.

A language model, in turn, estimates how likely a sentence is in a given language, so its perplexity on test sentences measures how well it predicts them. When computing this, we need to include the end-of-sentence marker (commonly written </s>), if any, when counting the total number of word tokens N. The beginning-of-sentence marker is not included in the count as a token. For example, if the sentence was …

Now, I am tasked with finding the perplexity of the test data (the sentences whose language I am predicting) against each language model.
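A minimal sketch of both ideas above, assuming a toy bigram language model per language. For a sentence W of N counted tokens, the usual sentence-level form is PP(W) = P(w_1 ... w_N)^(-1/N). Here <s> is prepended as context only (not counted) and </s> is appended and counted, so N is the number of words plus one. The function names (distribution_perplexity, sentence_perplexity, make_bigram_model, best_language), the bigram tables, and the fixed probability of 1e-4 for unseen bigrams are all hypothetical illustrations. The fixed floor is a crude stand-in for proper smoothing and does not yield a normalized model.

```python
import math


def distribution_perplexity(probs, base=2.0):
    """Perplexity of a discrete distribution: base ** H(p), with H in the same base."""
    # Terms with p == 0 contribute nothing (0 * log 0 is taken as 0).
    entropy = -sum(p * math.log(p, base) for p in probs if p > 0)
    return base ** entropy


def make_bigram_model(table, unseen=1e-4):
    """Hypothetical toy model: look up P(cur | prev), falling back to a fixed floor."""
    return lambda prev, cur: table.get((prev, cur), unseen)


def sentence_perplexity(words, bigram_prob):
    """Perplexity of one sentence under a bigram model.

    <s> is only conditioning context and is NOT counted as a token;
    </s> is predicted and IS counted, so N = len(words) + 1.
    """
    tokens = ["<s>"] + list(words) + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        log_prob += math.log(bigram_prob(prev, cur))  # natural log
    n = len(tokens) - 1  # predicted tokens: the words plus </s>
    # exp with natural logs gives the same value as 2 ** (...) with log2.
    return math.exp(-log_prob / n)


def best_language(words, models):
    """Score a sentence against each language model; lowest perplexity wins."""
    scores = {lang: sentence_perplexity(words, m) for lang, m in models.items()}
    return min(scores, key=scores.get), scores


# Hypothetical toy bigram tables, for illustration only.
english = make_bigram_model({("<s>", "the"): 0.5, ("the", "cat"): 0.25, ("cat", "</s>"): 0.5})
french = make_bigram_model({("<s>", "le"): 0.5, ("le", "chat"): 0.25, ("chat", "</s>"): 0.5})

if __name__ == "__main__":
    print(distribution_perplexity([0.25] * 4))  # uniform over 4 outcomes
    lang, scores = best_language(["the", "cat"], {"english": english, "french": french})
    print(lang, scores)
```

With a uniform distribution over four outcomes the entropy is 2 bits, so the perplexity is 4 in any base. For "the cat", the English model assigns P = 0.5 × 0.25 × 0.5 = 1/16 over N = 3 tokens, giving a perplexity of 16^(1/3) ≈ 2.52, far below the French model's, so English is chosen.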