The Algorithm Behind SentiMapp
The SentiMapp index shows whether the ratings given by past users convey, on average, a positive or a negative message to potential new users. The algorithm is more complex than a simple average; if you are eager to know why and how it was created, read on.
Briefly on Star Rating Systems
The most famous star rating system is arguably the Michelin one. The Michelin Guide helps you find good restaurants, but it never labels a restaurant as bad. Restaurants are rated “very good” (★), “worth a detour” (★★), and (★★★) when it is worth making a special journey just for the pleasure of the meal.
Similarly, hotel rating systems do not label a service as “bad”; they normally define the category the hotel belongs to. A hotel still shows its single star on the door, telling potential customers that they will not spend much but will still stay in a comfortable place.
In recent years, however, crowdsourced “star rating systems” have significantly changed how potential customers perceive stars. In the app ecosystem, applications with few stars are probably not worth the money, and apps with just one star are not even worth the effort of a free download.
Perception of Quality in Application Stores
Average Number of Stars
The first message a potential buyer receives, on both the iOS 6 App Store and Google Play, is the average number of stars.
How does this parameter influence potential buyers?
We have studied how the number of sales varies as a function of the rating and, as expected, a low average (one or two stars) has a negative impact on sales, while a high one (four or five stars) has a positive impact. An average of three stars, by contrast, has little influence on potential buyers and is equivalent to having no stars at all. What we tried to do is go beyond the average.
Beyond the Average
Let us follow the discovery process on the app store. After searching for an application that meets their needs, users tap on the ones at the top of the list, preferably those with an average rating above three. Note that iOS 6 shows the average rating for the current version only, while Android shows the average accumulated over all versions of the application. From the developer’s point of view, Android’s approach may not be ideal, since it can include low ratings given to old versions of the application, but it bases the average on the largest possible number of ratings.
Once users have decided, based on a few factors such as name, description and, of course, average rating, that an application is worth a click, they land on its description tab (“Details” on iOS, “Overview” on Android). If the application still seems to be the right one, the next step is finding out what other people think of it, so users visit the “Reviews” tab to read crowdsourced feedback.
In the “Reviews” tab, Apple followed Google’s lead: both iOS 6 and Android show the frequency distribution, or histogram, of the ratings. How many buyers gave one star? How many gave two stars? And so on.
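To make the trade-off concrete, here is a minimal Python sketch comparing a current-version average with a cumulative one. The per-version histograms are invented for illustration, and `average_rating` and `merge_histograms` are hypothetical helpers, not part of either store's API.

```python
# A histogram is a list of five counts: index 0 = one star, ..., index 4 = five stars.
STARS = range(1, 6)


def average_rating(hist):
    """Mean star rating of a five-bin histogram."""
    total = sum(hist)
    if total == 0:
        raise ValueError("no ratings")
    return sum(stars * count for stars, count in zip(STARS, hist)) / total


def merge_histograms(histograms):
    """Add per-version histograms bin by bin into one cumulative histogram."""
    return [sum(counts) for counts in zip(*histograms)]


# Hypothetical data: a poorly received old version and a well received current one.
old_version = [40, 30, 20, 6, 4]
current_version = [2, 3, 10, 35, 50]
all_versions = merge_histograms([old_version, current_version])

# Current version only (fewer, more recent ratings).
print(f"current version: {average_rating(current_version):.2f} stars from {sum(current_version)} ratings")
# Every version (more ratings, but old complaints included).
print(f"all versions:    {average_rating(all_versions):.2f} stars from {sum(all_versions)} ratings")
```

With these made-up numbers the current version alone averages 4.28 stars, while the cumulative figure drops to 3.16: it rests on twice as many ratings but still carries the old version's complaints.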
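Building the histogram from individual ratings is a simple counting exercise. A minimal Python sketch, using invented ratings; `rating_histogram` is a hypothetical helper name.

```python
from collections import Counter


def rating_histogram(ratings):
    """Return [count of 1-star, ..., count of 5-star] for a list of ratings."""
    counts = Counter(ratings)
    # Counter returns 0 for star values nobody picked.
    return [counts[stars] for stars in range(1, 6)]


ratings = [5, 4, 5, 1, 3, 5, 4, 2, 5, 4]  # hypothetical individual ratings
hist = rating_histogram(ratings)
mean = sum(ratings) / len(ratings)

for stars, count in zip(range(1, 6), hist):
    print(f"{stars}-star: {'#' * count} ({count})")
print(f"average: {mean:.1f} stars")
```

The five bars carry everything the average does (the average can be recomputed from them) plus the shape of the distribution, which the average throws away.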
The histogram provides much more information than the average. Consider two histograms with a similar average: in the first, only a handful of buyers gave one star; in the second, one-star ratings make up a large share of the votes.
The message delivered to a potential buyer is not the same. In the first distribution, the number of one-star ratings would not worry a potential buyer, and the very few negative ratings would receive little attention. In the second distribution, by contrast, the number of one-star ratings is suspiciously high and points to a hard core of unsatisfied buyers.
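A small Python sketch of the comparison, with two made-up histograms (not taken from the original figures) that share exactly the same average; `summarize` is a hypothetical helper.

```python
STARS = range(1, 6)


def summarize(hist):
    """Return (mean rating, share of one-star ratings) for a five-bin histogram."""
    total = sum(hist)
    mean = sum(stars * count for stars, count in zip(STARS, hist)) / total
    return mean, hist[0] / total


# Hypothetical histograms (counts for 1..5 stars), 100 ratings each.
mostly_happy = [2, 3, 15, 45, 35]   # a few scattered complaints
hard_core = [16, 0, 0, 28, 56]      # a solid block of one-star ratings

for name, hist in [("mostly happy", mostly_happy), ("hard core", hard_core)]:
    mean, one_star_share = summarize(hist)
    print(f"{name}: average {mean:.2f}, one-star share {one_star_share:.0%}")
```

Both print an average of 4.08, yet one-star ratings account for 2% of the votes in the first case and 16% in the second: the average alone cannot reveal the hard core.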
At the very least, potential buyers will then read the reviews on the app store carefully, to understand whether they themselves risk ending up in that hard core of unsatisfied customers.
To use a simple analogy: if you had to put your hand in the Mouth of Truth, would you rather be sure that nothing happens (three stars), or have a 50% chance of losing your hand (one star) and a 50% chance of winning ten grand (five stars)? It would be a fun experiment to run in Rome, but we think most people would choose the first option: no risk.
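In rating terms, the two options have the same expected outcome; what differs is the spread. Treating the sure thing as three stars with probability 1, and the gamble as one or five stars with probability 1/2 each (the probabilities come from the analogy; the moments are ours):

```latex
\begin{aligned}
\text{Sure thing:}\quad & \mathbb{E}[X] = 3, && \operatorname{Var}(X) = 0,\\
\text{Gamble:}\quad & \mathbb{E}[X] = \tfrac12\cdot 1 + \tfrac12\cdot 5 = 3, && \operatorname{Var}(X) = \tfrac12(1-3)^2 + \tfrac12(5-3)^2 = 4.
\end{aligned}
```

The average cannot tell the two options apart; the variance can.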
The Sentiment Algorithm
To account for the difference between two rating histograms that have the same average but different distributions, we developed an algorithm that measures the “polarization” of the distribution in addition to its average. By combining average and polarization, we then derive an index that helps developers understand whether the ratings collected so far help or hurt sales.
Using the variance (the second central moment) and the kurtosis (the fourth central moment divided by the square of the variance), we measure how much the people who rated an application disagree. Note that the kurtosis alone is not enough: it measures the “peakedness” of a distribution rather than its spread, so two very different (extreme) distributions can yield the same kurtosis.
Our algorithm, by contrast, measures how much buyers agree on their rating. For instance, half of the buyers rating two stars and the other half rating four stars gives a completely different result from half rating one star and half rating five stars.
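A quick numerical check of why kurtosis alone is not enough. This Python sketch uses the standard (non-excess) kurtosis, the fourth central moment divided by the squared variance, and applies it to the two split distributions mentioned above (two/four stars versus one/five stars). These are illustrative examples; we cannot tell which distributions the original figure showed.

```python
STARS = range(1, 6)


def central_moment(hist, k):
    """k-th central moment of a five-bin rating histogram."""
    total = sum(hist)
    mean = sum(stars * count for stars, count in zip(STARS, hist)) / total
    return sum(count * (stars - mean) ** k for stars, count in zip(STARS, hist)) / total


def kurtosis(hist):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    variance = central_moment(hist, 2)
    if variance == 0:
        raise ValueError("kurtosis is undefined when every buyer gives the same rating")
    return central_moment(hist, 4) / variance ** 2


split_2_4 = [0, 50, 0, 50, 0]   # half two stars, half four stars
split_1_5 = [50, 0, 0, 0, 50]   # half one star, half five stars

for name, hist in [("2/4 split", split_2_4), ("1/5 split", split_1_5)]:
    print(f"{name}: variance {central_moment(hist, 2):.1f}, kurtosis {kurtosis(hist):.1f}")
```

Kurtosis is scale-invariant, so stretching the split from two/four to one/five stars leaves it at 1.0 while the variance quadruples from 1.0 to 4.0. A measure of disagreement therefore needs the variance, not kurtosis alone.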
We then combine the average with the level of agreement. Consider three possible rating distributions, all with an average of three stars.
The sentiment algorithm “bends” the average of the first and second distributions above and below three stars, respectively, while leaving the third unchanged. The third distribution carries the highest possible uncertainty about buyers’ opinion, so its sentiment is 0: the rating is virtually uninformative and does not influence a potential buyer. We therefore obtain a slightly positive sentiment in the first case, a negative one in the second (even though the average is three stars), and a null one in the third.
Conversely, take two distributions with very low polarization, one in which nearly all buyers gave high ratings and one in which nearly all gave low ratings: here the sentiment is highly positive for the first and highly negative for the second.
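The text does not disclose SentiMapp's actual formula, so the following Python sketch is purely a toy index, built only to reproduce the qualitative behaviour described above. Its assumptions are all hypothetical: agreement is 1 minus the variance divided by 4 (the largest variance possible on a one-to-five scale); the "center" is the median rather than the mean, so that skewed distributions averaging three stars get bent up or down; and the index is agreement times the median's distance from three stars, scaled to the range -1 to +1.

```python
STARS = range(1, 6)
MAX_VARIANCE = 4.0  # reached by a 50/50 split between one and five stars


def mean_and_variance(hist):
    total = sum(hist)
    mean = sum(stars * count for stars, count in zip(STARS, hist)) / total
    variance = sum(count * (stars - mean) ** 2 for stars, count in zip(STARS, hist)) / total
    return mean, variance


def value_at(hist, index):
    """Star value at a 0-based position in the sorted list of ratings."""
    seen = 0
    for stars, count in zip(STARS, hist):
        seen += count
        if index < seen:
            return stars
    raise IndexError(index)


def median(hist):
    n = sum(hist)
    return (value_at(hist, (n - 1) // 2) + value_at(hist, n // 2)) / 2


def toy_sentiment(hist):
    """Hypothetical index in [-1, 1]; NOT the SentiMapp formula."""
    _, variance = mean_and_variance(hist)
    agreement = 1 - variance / MAX_VARIANCE  # 1 = unanimous, 0 = maximally polarized
    lean = (median(hist) - 3) / 2            # -1 (one star) ... +1 (five stars)
    return agreement * lean


examples = {
    "mean 3, mostly four stars": [20, 0, 0, 40, 0],
    "mean 3, mostly two stars": [0, 40, 0, 0, 20],
    "mean 3, 50/50 one and five": [50, 0, 0, 0, 50],
    "nearly all five stars": [0, 0, 0, 10, 90],
    "nearly all one star": [90, 10, 0, 0, 0],
}
for name, hist in examples.items():
    print(f"{name}: {toy_sentiment(hist):+.2f}")
```

The median is what lets this toy index separate the first two distributions, which share a three-star average (+0.25 and -0.25). A 50/50 split between one and five stars gets zero agreement and therefore zero sentiment, while near-unanimous distributions land close to the ends of the scale (about +0.98 and -0.98).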