... why do I need it?

Comparing individual candidates directly is no easy task: by what criteria do you decide whether candidate A or candidate B is the better fit for the vacancy? Do you have a reliable benchmark, or are you comparing apples with oranges? Often the impressions gained in the interview decide whom you select, although you cannot be sure that your personal favourite is really the right choice. With standardised comparison groups, psychological tests make sure that you choose the right candidates.



The overall outcome of a standardised psychological test procedure is always relative to the results of other people. Graphically, a result is best illustrated by the bell curve based on Carl Friedrich Gauss's concept of the normal distribution: in any test procedure, most people achieve average results that cluster around the mean value. The further you move outwards from the mean, the fewer results you find, which shows in the flattening slope of the curve. Individuals whose results lie in these outer areas are therefore classified as either below or above average.

Anyone who remembers their own school days, or those of their children, will recall grades that were difficult to interpret on their own. Whether a "satisfactory" is a reason to celebrate or a sign that hours of revision are now called for can only be decided once two things are known: the class mean shows whether the performance was below average, average or above average, and the statistical dispersion of the results across grades 1 to 6 shows how far above or below average it was.
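The two quantities behind this – the class mean and the dispersion of the grades – can be sketched in a few lines of Python. The grades below are invented purely for illustration:

```python
# Hypothetical class grades on the German scale (1 = best, 6 = worst)
grades = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]

mean = sum(grades) / len(grades)                              # class mean
variance = sum((g - mean) ** 2 for g in grades) / len(grades)
std_dev = variance ** 0.5                                     # dispersion (standard deviation)

# A grade of 3 ("satisfactory") only becomes meaningful relative to both values
own_grade = 3
print(f"mean {mean:.1f}, std dev {std_dev:.2f}, "
      f"grade {own_grade} is {(own_grade - mean) / std_dev:+.2f} SDs from the mean")
```

In this invented class the mean is exactly 3.0, so the "satisfactory" turns out to be a perfectly average performance – the same grade in a class with a mean of 2.0 would read quite differently.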

A normal distribution, too, is characterised by its mean value and standard deviation – because an individual result can rarely stand on its own.

While it is perfectly clear for school grades that the result will lie somewhere between 1 and 6, the result achievable in a test depends, among other things, on the length of the test. Since both the length and the achievable score can vary from test to test, a specific form of the normal distribution with a uniform scale is used: the "Z scale". Its mean value is always 100 and its standard deviation 10 points. We therefore speak of considerably above or below average results when a value exceeds 110 or falls below 90. In order to differentiate the average range further, we have subdivided it again, so that it only runs from 97 to 103. This subdivision reduces the share of average candidates from 68 per cent to 24 per cent, which benefits decision-making practice. Results between these bands are called slightly below or slightly above average – and only below 90 or above 110 do we refer to the results as below or above average.
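This mapping onto the Z scale, and the bands just described, can be sketched as follows. The function names and the example group values (a mean of 15 and a standard deviation of 2.5 raw points) are my own illustration, not part of any HR Diagnostics product:

```python
import math

def to_z_scale(raw_score, group_mean, group_sd):
    """Map a raw test score onto the Z scale (mean 100, standard deviation 10)."""
    return 100 + 10 * (raw_score - group_mean) / group_sd

def classify(z):
    """Bands as described in the text."""
    if z < 90:
        return "below average"
    if z < 97:
        return "slightly below average"
    if z <= 103:
        return "average"
    if z <= 110:
        return "slightly above average"
    return "above average"

# Example: a raw score of 18 in a group with mean 15 and SD 2.5
z = to_z_scale(18, group_mean=15.0, group_sd=2.5)
print(z, classify(z))  # -> 112.0 above average

# Under an ideal normal distribution, the narrow band 97-103 holds about 24%
share_avg = math.erf(3 / (10 * math.sqrt(2)))
print(f"{share_avg:.0%}")  # -> 24%
```

The last two lines confirm the figure from the text: narrowing the "average" band from one standard deviation (90–110, about 68 per cent) to 0.3 standard deviations (97–103) leaves roughly 24 per cent of candidates in it.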

Using the Z scale, you as an HR decision-maker can see where the candidate you are considering stands within the comparison group. The two variables used for this are the Z value and the percentage rank.

The Z value indicates the candidate's position on the scale: candidate A, for example, achieves an overall Z value of 112 within the overall result of all candidates. This candidate's percentage rank of 88 indicates the proportion of candidates who received a lower score. The Z value and the percentage rank are thus directly related: the better candidate A scores in the test (the higher the Z value), the more participants received a lower score (the higher the percentage rank). With a Z value of 112, candidate A ranks among the strongest candidates in the normal distribution, and only 12 per cent of the comparison group achieved an even better result. If, however, the candidate's Z value were 90, then 84 per cent of all candidates received a better score, and you should consider whether you want to invite this candidate to the next round of your recruiting process. If the majority of results were too far from the expected normal distribution – i.e. most applicants scored not around the targeted mean of 100 but around 90 – this would indicate that the scale may need to be adjusted; but more on that later on.
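The relationship between the Z value and the percentage rank follows directly from the normal distribution. A minimal sketch, assuming an ideally normal comparison group (the function name is mine):

```python
import math

def percentage_rank(z_value):
    """Share of the comparison group scoring below a given Z value
    (Z scale: mean 100, SD 10), assuming an ideal normal distribution."""
    standard_z = (z_value - 100) / 10
    return 100 * 0.5 * (1 + math.erf(standard_z / math.sqrt(2)))

print(round(percentage_rank(112)))  # -> 88, i.e. only 12% scored better
print(round(percentage_rank(90)))   # -> 16, i.e. 84% scored better
```

This reproduces both figures from the text: a Z value of 112 corresponds to a percentage rank of about 88, and a Z value of 90 to a rank of about 16.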


Imagine a mathematics test in which participants can score up to 25 points. Applicant A gets 18 points. At first sight this result looks promising: 72% of the possible points. However, only by comparing the result with those of all other candidates do you get a grasp of how good it actually is. Then you can see whether applicant A's score is average, whether they performed better than the others, or whether most applicants achieved a higher score – meaning that applicant A scored below average.
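The same point in a few lines, using an invented comparison group (the scores below are hypothetical, not real applicant data):

```python
# Hypothetical scores of all applicants in the 25-point mathematics test
scores = [12, 14, 15, 16, 17, 17, 18, 19, 20, 22]
applicant_a = 18

share_below = sum(s < applicant_a for s in scores) / len(scores)
print(f"{applicant_a}/25 points ({applicant_a / 25:.0%} of the maximum) "
      f"beats {share_below:.0%} of this group")
```

With this group, 72 per cent of the possible points only beats 60 per cent of the other applicants – a solid but not outstanding result. In a weaker group the identical 18 points could be top of the field.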

A single result does not reveal the successful performance of an applicant – it must be viewed and interpreted in relation to the comparison group.

(Matthias Kämper, Executive Vice Director at HR Diagnostics) 


Imagine a results overview that shows you all the candidates in question at a glance. You could now simply sort them by total score and select the 10 best candidates for the next round. However, two key questions remain open:

  1.   Are the best candidates also the right ones for the job you are looking for?
    If the test has been put together on the basis of a requirements analysis – i.e. it is state of the art – the answer is yes. In that case, the test measures precisely those characteristics that are relevant to the specific position. However, a detailed look at the results can still reveal an applicant to be unsuitable: for example, because the job would not challenge them enough, because they are not interested in the work, or because their individual motivation cannot be matched with the company's career paths. An applicant with a strong career orientation may not be happy in a company with flat hierarchies despite being suited to the job. Even with a matching comparison group, you still have to deal with complex contexts.
  2.  On what basis are the best the best? Have they been compared with the right group?
    If, for example, a single psychological test is used to compare several candidates who are applying for different positions, the comparison base is the wrong choice, because the candidates are not competing for the same place. Candidates for an apprenticeship in a technical profession and experienced skilled workers, for instance, are not competing for the same jobs. It therefore makes no sense to apply the same comparison standard to the technical and mechanical understanding of all these applicants: the apprenticeship applicants would tend to score below average and the skilled workers above average. At the same time, the differentiation within each of the two groups would be less clear than with separate norms. A heterogeneous comparison group of this kind therefore does not do justice to the individual applicants and denies them the opportunity to highlight their talents within their own comparison group. Ensuring the right benchmark for each of our clients is therefore an absolute must –

because only the right comparison standard – i.e. the right benchmark – ensures a fair selection process.


At HR Diagnostics we are often asked to use external benchmarks as a comparison group – for example, to compare a company's own applicants with all other applicants in the respective sector. But why? First of all, every company, even within one industry, is highly individual. Sometimes commercial employees require technical understanding, sometimes sales staff have to be fluent in English, and on occasion industrial workers have to deal with regular customer contact. If the specific requirements for applicants differ so substantially from company to company, it is obvious that the benchmark should also be of an individual nature. At the same time, the benchmark also depends on your reach – whether you are recruiting on a regional or national basis. In a regional selection procedure it makes little sense to compare your applicants with those from all over Germany and to ignore local characteristics. A comparison standard that does not fit is always detrimental to the comparability of your own candidates – and those candidates are themselves the ideal benchmark for defining your standards.


At the start of your recruiting process, HR Diagnostics can draw on a range of individual comparison standards tailored to the requirements of applicants for specific positions. Whenever possible, we standardise the test procedures at the beginning of the selection process – even before the first decisions are made. This makes sense because the basis for comparison should not be changed during a selection process. After a selection round, new standards can be applied to differentiate the remaining applicant group further if necessary. With psychological testing, scientifically grounded analysis and individual norms, you can differentiate your candidates optimally – and make the right decision in the end.


We can provide our clients with training on the use of standard groups and the correct interpretation of the test results.