{"id":5598,"date":"2025-09-09T14:44:30","date_gmt":"2025-09-09T18:44:30","guid":{"rendered":"https:\/\/labrigger.com\/blog\/?p=5598"},"modified":"2025-09-09T14:44:30","modified_gmt":"2025-09-09T18:44:30","slug":"precision-of-evaluation","status":"publish","type":"post","link":"https:\/\/labrigger.com\/blog\/2025\/09\/09\/precision-of-evaluation\/","title":{"rendered":"Precision of evaluation"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/labrigger.com\/blog\/wp-content\/uploads\/2025\/09\/fig4.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"554\" src=\"https:\/\/labrigger.com\/blog\/wp-content\/uploads\/2025\/09\/fig4-1024x554.jpg\" alt=\"\" class=\"wp-image-5599\" srcset=\"https:\/\/labrigger.com\/blog\/wp-content\/uploads\/2025\/09\/fig4-1024x554.jpg 1024w, https:\/\/labrigger.com\/blog\/wp-content\/uploads\/2025\/09\/fig4-300x162.jpg 300w, https:\/\/labrigger.com\/blog\/wp-content\/uploads\/2025\/09\/fig4-768x416.jpg 768w, https:\/\/labrigger.com\/blog\/wp-content\/uploads\/2025\/09\/fig4.jpg 1485w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>To what degree of <strong>precision<\/strong> can your evaluations of grants, papers, or applicants be quantified\u2014if forced into a single numerical score? <\/p>\n\n\n\n<p><strong>How many bits of resolution<\/strong> can we reasonably expect in such a score, and how reproducible would the resulting rankings be across repeated, anonymized evaluations of the same set, by the same reviewer?<\/p>\n\n\n\n<p>I pose these questions because I suspect that we are routinely asked to provide numerical evaluations to a degree of precision that is not supported by real world experience and available evidence. 
<\/p>\n\n\n\n<p>One example is when I&#8217;m submitting letters of recommendation to graduate schools for students and the school has an online system that requires me to rank the student in the &#8220;top 1%&#8221;, &#8220;top 2%&#8221;, &#8220;top 5%&#8221;, &#8230; Professors interact with many students, and to varying degrees. How reliably can we distinguish among these categories?<\/p>\n\n\n\n<p>When we assign a score from 1 to 5, we\u2019re expressing about 2.3 bits of information (since log\u2082(5) \u2248 2.32). And that\u2019s assuming we use the scale evenly and consistently, which we often don&#8217;t. In NIH review, the scale is technically 1 to 9, which would be about 3.17 bits (log\u2082(9) \u2248 3.17), but since most scores cluster around 3 \u00b1 2, we&#8217;re back down to the 2-bits-and-change range.<\/p>\n\n\n\n<p>Now how reproducible are these numbers? Vary the time of day, focus, how alert and happy the reviewer is, familiarity with the field, unconscious biases, etc. Those two-plus bits of information are <strong>maybe reliable to less than 2 bits<\/strong>.<\/p>\n\n\n\n<p>That said, I have noticed that in NIH reviews, disparate reviewers often independently converge on similar scores. So I don&#8217;t intend to dismiss the value of the process, but we should scale the time, energy, and deference given to such scoring accordingly.<\/p>\n\n\n\n<p>In the end, a score provided by an evaluator is a low-bit metric. It\u2019s <strong>a compressed signal of human judgment<\/strong>, and like any signal, it has noise, resolution limits, and a story behind it.<br><br><br><\/p>\n","protected":false},"excerpt":{"rendered":"<p><a href=\"https:\/\/labrigger.com\/blog\/wp-content\/uploads\/2025\/09\/fig4.jpg\"><\/a><\/p>\n<p>To what degree of <strong>precision<\/strong> can your evaluations of grants, papers, or applicants be quantified\u2014if forced into a single numerical score? 
<\/p>\n<p><strong>How many bits of resolution<\/strong> can we reasonably expect in&#8230;<\/p>\n<div class=\"read-more\"><a href=\"https:\/\/labrigger.com\/blog\/2025\/09\/09\/precision-of-evaluation\/\">Read More<\/a><\/div><\/p>\n","protected":false},"author":1,"featured_media":5599,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-5598","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/labrigger.com\/blog\/wp-json\/wp\/v2\/posts\/5598","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/labrigger.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/labrigger.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/labrigger.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/labrigger.com\/blog\/wp-json\/wp\/v2\/comments?post=5598"}],"version-history":[{"count":1,"href":"https:\/\/labrigger.com\/blog\/wp-json\/wp\/v2\/posts\/5598\/revisions"}],"predecessor-version":[{"id":5600,"href":"https:\/\/labrigger.com\/blog\/wp-json\/wp\/v2\/posts\/5598\/revisions\/5600"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/labrigger.com\/blog\/wp-json\/wp\/v2\/media\/5599"}],"wp:attachment":[{"href":"https:\/\/labrigger.com\/blog\/wp-json\/wp\/v2\/media?parent=5598"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/labrigger.com\/blog\/wp-json\/wp\/v2\/categories?post=5598"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/labrigger.com\/blog\/wp-json\/wp\/v2\/tags?post=5598"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}