Statistically improbable phrase  

From The Art and Popular Culture Encyclopedia


Statistically Improbable Phrases (SIPs), also called Statimprophrases, are produced by a system developed by Amazon.com that compares all of the books indexed in its Search Inside! program and finds the phrases in each book that are least likely to appear in any other indexed book.

In Amazon's own words:

"Amazon.com's Statistically Improbable Phrases, or "SIPs", show you the interesting, distinctive, or unlikely phrases that occur in the text of books in Search Inside the Book. Our computers scan the text of all books in the Search Inside program. If they find a phrase that occurs a large number of times in a particular book relative to how many times it occurs across all Search Inside books, that phrase is a SIP in that book."


The system identifies the most nearly unique phrases in each book, which can then serve as summary terms or keywords for that book.
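
The description above gives no formula, so the following is only a minimal sketch of one way to read "a large number of times in a particular book relative to how many times it occurs across all books". The function name sip_score and its four parameters are hypothetical, chosen for illustration; nothing here is taken from Amazon's implementation.

```python
def sip_score(count_in_book, book_size, count_in_corpus, corpus_size):
    """Ratio of a phrase's rate in one book to its rate across all books.

    Hypothetical scoring rule for illustration only.
    count_in_corpus includes the occurrences inside the book itself,
    so it is never zero for a phrase that appears in the book.
    """
    rate_in_book = count_in_book / book_size
    rate_in_corpus = count_in_corpus / corpus_size
    return rate_in_book / rate_in_corpus


# A phrase used 12 times in a 1,000-phrase book but only 15 times in a
# 100,000-phrase corpus is heavily concentrated in that book.
distinctive = sip_score(12, 1_000, 15, 100_000)

# A phrase used 12 times in the book and 6,000 times in the corpus is
# actually rarer in the book than in the corpus as a whole.
common = sip_score(12, 1_000, 6_000, 100_000)

print(round(distinctive, 2), round(common, 2))
```

Under this reading, a score well above 1 marks a phrase as a SIP candidate for the book, while a score near or below 1 marks it as ordinary vocabulary.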

Example

  • Book 1
The big brown fox jumps over the lazy dog. The lazy dog did not like the fact that the big brown fox jumped over him, so the lazy dog ran after him.
  • Book 2
You never have to log in to read Wikipedia. You do not have to log in even to edit articles on Wikipedia—anyone can edit almost any article, even without logging in. Nevertheless, creating an account is quick, free and non-intrusive, and it's generally considered a good idea to do so, for a variety of reasons.
  • Book 3
If you create an account, you can pick a username. Edits you make while logged in will be assigned to that name. That means you get full credit for your contributions in the page history (when not logged in, the edits are just assigned to your (potentially random) IP address). You can also view all your contributions by clicking the "My contributions" link, which is only visible when you are logged in.

SIPs

For Book 1, the SIPs would most likely be "big brown fox" and "lazy dog".

For Book 2, the SIP would most likely be "Wikipedia", but not "account", because that word also appears in Book 3.

For Book 3, the SIPs would most likely be "contributions" and "logged in".
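
The claims above can be checked by counting phrases directly. The sketch below is an illustration, not Amazon's method: it treats the three passages as the whole corpus, counts every one-, two- and three-word phrase, and reports what share of a phrase's occurrences fall in a given book. The names BOOKS, ngrams, phrase_counts and share_in_book are hypothetical.

```python
import re
from collections import Counter

# The three example "books" from the article (\u2014 is the dash in Book 2).
BOOKS = {
    "Book 1": "The big brown fox jumps over the lazy dog. The lazy dog did "
              "not like the fact that the big brown fox jumped over him, so "
              "the lazy dog ran after him.",
    "Book 2": "You never have to log in to read Wikipedia. You do not have "
              "to log in even to edit articles on Wikipedia\u2014anyone can "
              "edit almost any article, even without logging in. "
              "Nevertheless, creating an account is quick, free and "
              "non-intrusive, and it's generally considered a good idea to "
              "do so, for a variety of reasons.",
    "Book 3": 'If you create an account, you can pick a username. Edits you '
              'make while logged in will be assigned to that name. That '
              'means you get full credit for your contributions in the page '
              'history (when not logged in, the edits are just assigned to '
              'your (potentially random) IP address). You can also view all '
              'your contributions by clicking the "My contributions" link, '
              'which is only visible when you are logged in.',
}


def ngrams(text, n):
    """Return all n-word phrases in text, lowercased, punctuation ignored."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def phrase_counts(text, max_len=3):
    """Count every phrase of 1 to max_len words in text."""
    counts = Counter()
    for n in range(1, max_len + 1):
        counts.update(ngrams(text, n))
    return counts


COUNTS = {title: phrase_counts(text) for title, text in BOOKS.items()}


def share_in_book(phrase, title):
    """Fraction of all corpus occurrences of phrase that fall in title."""
    total = sum(counts[phrase] for counts in COUNTS.values())
    return COUNTS[title][phrase] / total if total else 0.0


for phrase, title in [("big brown fox", "Book 1"), ("lazy dog", "Book 1"),
                      ("wikipedia", "Book 2"), ("account", "Book 2"),
                      ("contributions", "Book 3"), ("logged in", "Book 3")]:
    print(f"{phrase!r} in {title}: {COUNTS[title][phrase]} occurrence(s), "
          f"share {share_in_book(phrase, title):.2f}")
```

Every phrase named as a SIP above occurs only in its own book (share 1.00), while "account" occurs once in Book 2 and once in Book 3, so only half of its occurrences fall in Book 2. A corpus this small cannot separate function-word phrases such as "the lazy" from real SIPs; that filtering depends on having many books to compare against.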

Example

The Statistically Improbable Phrases of Darwin's On the Origin of Species are: temperate productions, genera descended, transitional gradations, unknown progenitor, fossiliferous formations, our domestic breeds, modified offspring, doubtful forms, closely allied forms, profitable variations, enormously remote, transitional grades, very distinct species, and mongrel offspring.[1]

See also

  • Googlewhack — a pair of words occurring on a single webpage, as indexed by Google
  • tf*idf — a similar weight often used in information retrieval and text mining.
  • Dataclysm




Unless indicated otherwise, the text in this article is either based on Wikipedia article "Statistically improbable phrase" or another language Wikipedia page thereof used under the terms of the GNU Free Documentation License; or on research by Jahsonic and friends. See Art and Popular Culture's copyright notice.
