Final Rating & Comparative Ratings
Imagine you are starting from ground zero, where both responses are the same. Then you consider
the ratings across individual dimensions to move the needle to slightly better, better or much better.
NOTE: The information here is for guidance and directional purposes only. We acknowledge and
understand that there can be edge cases and we trust you to keep an open mind while attempting these
tasks.
Starting Point
Overall Quality
Possible Outcomes:
If Overall Quality of a response is 1 point higher than the other, then one response is slightly better
If Overall Quality of a response is 2 points higher than the other, then one response is better -
If Overall Quality of a response is 3 points higher than the other, then one response is much better
If Overall Quality is the same, then both the responses are likely the same or we need to look at
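The starting-point rule above can be sketched as a small lookup on the Overall Quality difference. This is only an illustration, not part of the guideline: the function name, the "A"/"B" labels, and the integer scores are hypothetical, and treating any gap of 3 or more as "much better" is an assumption, since the guideline only names gaps of 1, 2 and 3 points. The result is a starting point that the individual dimensions may still shift.

```python
def starting_comparison(quality_a: int, quality_b: int) -> str:
    """Return a starting comparative label from two Overall Quality scores.

    Hypothetical helper: names and score type are assumptions for illustration.
    """
    diff = abs(quality_a - quality_b)
    if diff == 0:
        # Equal scores: likely the same, pending a look at other factors.
        return "same"
    # Gaps of 1 and 2 points come from the guideline; mapping every gap of
    # 3 or more to "much better" is an assumption.
    labels = {1: "slightly better", 2: "better"}
    label = labels.get(diff, "much better")
    winner = "A" if quality_a > quality_b else "B"
    return f"{winner} is {label}"
```

For example, `starting_comparison(4, 3)` gives "A is slightly better", and equal scores give "same".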
If major issues are marked in any of these dimensions for one response
If minor issues are marked in any of these dimensions for one response
Overall quality should be Pretty Bad, Okay or Pretty Good, dependent on the frequency of errors
The other response can be the same/better/slightly better than this, dependent on the frequency
of errors
One Response cannot be “much better” or “better” than the other if both responses have the
These dimensions do not make one response “much better” than the other if they are the ONLY
differentiating factor; they contribute to the responses being the same or one being slightly better.
Better is also possible depending on the severity/frequency of the error in these dimensions but
The criteria of Instructions Following - Completeness and Depth take a higher priority in impacting
Formatting
This should not be the main differentiating factor between one response being better than the
other, unless - a) the formatting is a part of the Instructions Following i.e. the Prompt Constraint,