The N-Best Rule: Hiring and promotion committees should solicit a small number (N) of research products and read them carefully as their primary metric for evaluating research outputs. I'm far from the first person to propose this rule, but I want to consider some implementation details and benefits that I haven't heard discussed previously. (And just to be clear, this is me describing an idea I think has promise – I'm not speaking on behalf of anyone or any institution.)
Why do we need a new policy for hiring and promotion? How do two conference papers on neural networks for language understanding compare with five experimental papers exploring bias in school settings or three infant studies on object categorization? Hiring and promotion in academic settings is an incredibly tricky business. (I'm focusing here on evaluation of research, rather than teaching, service, or other aspects of candidates' profiles.) How do we identify successful or potentially successful academics, given the vast differences in research focus and research production between individuals and areas? Two different records of scholarship simply aren't comparable in any sort of direct, objective manner. The value of any individual piece of work is inherently subjective, and the problem of subjective evaluation is only compounded when an entire record is being compared.
To address this issue, hiring and promotion committees typically turn to heuristics like publication counts, citation counts, or journal prestige. These heuristics are widely recognized to create perverse incentives. The most common, counting publications, rewards low-risk research and "salami slicing" of data (publishing as many small papers on a dataset as possible, rather than combining work into a more definitive contribution). Counting citations or h-indices is not much better: these numbers are incomparable across fields, and they incentivize self-citation and predatory citation practices (e.g., requesting citation in reviews). Assessing impact via journal rank is at best a noisy heuristic and rewards repeated submissions to "glam" outlets. Because none of these heuristics encourages quality science, they have been implicated as a major factor in the ongoing replicability/reproducibility issues facing psychology and other fields.
Details of the proposed N-best policy. Evaluation processes for individual candidates should foreground – weight most heavily – the evaluation of N discrete products, with N a number varying by career stage. I'm referring to these as "products" rather than "papers," because it's absolutely critical to the proposal that they can be unpublished (e.g., preprints), thesis chapters, datasets, or other artifacts, when these are appropriate contributions for the position. All other aspects of an applicant's file, including institutional background, lab, other publication record, and letters of recommendation, should serve as context for those pieces of scholarship. And these products must be read by all members of the evaluating committee.
- Currently papers are solicited, but there is no expectation that they will be read. Under this policy, it would be required that they be read by all committee members.
- The CV is currently the primary assessment tool, rewarding those who publish a lot and in high-profile outlets. Under this policy, the CV would be explicitly de-emphasized, and references to "productivity" would not be allowed in statements about hiring.*
- Under current standards, letters of recommendation are solicited for hiring and promotion, but there is little guidance on what these letters should do (other than sing the applicant's praises or compare the applicant in general terms to other scholars). Under N-Best, the explicit goal of letters would be to contextualize the submitted scholarship and its contribution to the broader research enterprise, mitigating the problem of scholarship outside the committee's expertise. Other general statements (e.g., about productivity or brilliance) would be discounted.
- We currently weigh job talks very heavily in hiring decisions. This practice creates strong biases in favor of good presenters. Under N-best, the goal would be to use interviews and job talks to assess the quality of the submitted research products. If the evaluation of a small number of distinct research findings is the nexus of the assessment, then what someone wore or whether they were "charismatic" (e.g., good looking and friendly) becomes a bit harder to confuse with the task at hand.
What should N be? In a single-parameter model, setting that parameter correctly is critical. Here are some initial thoughts on reasonable parameter values at an R1 university (summarized in a small sketch after the list).
- Postdoc hiring: 1–2 products. Having done two good projects in a PhD is enough to show that you are able to initiate and complete projects. Some PhDs only yield a single product, but typically this product will be comprehensive or impressive enough that its contribution should be clear.
- Applicants for tenure-track, research-intensive positions: 3 products. Three good products seems to me enough to give some intimation of a coherent set of methods and interests. A research statement will typically be necessary for contextualizing this work and describing future plans.
- Tenure files: 4–5 products. If you have done five really good things, I think you deserve tenure at a major research university. Some of these products could even review other work, giving the opportunity to foreground a synthetic contribution or a broader program of research; this synthesis could also be the function of a research statement.
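For concreteness, here is a minimal sketch of these parameter settings as a lookup table. This is just a toy illustration in Python; the stage labels and the helper function are my own, not part of any standard:

```python
# Proposed values of N (products to solicit) by career stage.
# The ranges encode the suggestions above; labels are illustrative.
N_BEST = {
    "postdoc": (1, 2),       # one or two good PhD projects
    "tenure_track": (3, 3),  # enough to suggest a coherent program
    "tenure": (4, 5),        # "five really good things"
}

def products_to_solicit(stage: str) -> range:
    """Inclusive range of research products to request from a candidate."""
    lo, hi = N_BEST[stage]
    return range(lo, hi + 1)

print(list(products_to_solicit("tenure")))  # -> [4, 5]
```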
How can we apply this policy fairly to large applicant groups? Many academic jobs receive hundreds of applicants, but some heuristics could make the reading load bearable, if still heavy (a back-of-envelope estimate follows the list):
- The whole committee need only read products from a short list; for the full applicant set, the reading can be divided amongst committee members.
- The committee can ask for ranked products, so that only the top-ranked product is assessed in a first pass.
- The committee can rank applicants on explicit non-research criteria (e.g., area of interest, teaching, service, etc.) prior to evaluating research to narrow the set of candidates for whom papers must be read.
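To make the load concrete, here's a back-of-envelope sketch combining the heuristics above. All of the numbers (pool size, committee size, short-list size, N) are hypothetical, chosen only to illustrate the arithmetic:

```python
# Back-of-envelope reading load under the triage heuristics above.
# Every number here is a made-up illustration, not a recommendation.
applicants = 200   # size of the full applicant pool
committee = 5      # number of committee members
shortlist = 20     # candidates who advance past the first pass
n_products = 3     # N for a tenure-track search

# First pass: only each applicant's top-ranked product is read,
# and the reading is divided across committee members.
first_pass_per_member = applicants / committee

# Second pass: every member reads all N products from the short list.
second_pass_per_member = shortlist * n_products

print(f"first pass:  {first_pass_per_member:.0f} products per member")
print(f"second pass: {second_pass_per_member} products per member")
# -> first pass:  40 products per member
# -> second pass: 60 products per member
```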
There's still no question that this policy will result in more work. But I'd argue that this work is balanced by the major benefits that accrue.
Benefits of the N-best policy. The primary benefit is that evaluation will be tied directly to the quantity we want to prioritize: good science. If we want more quality science getting done, we need to hire and promote people who do that science, as opposed to hiring and promoting people who do things that are sometimes noisily correlated with doing good science (e.g., publishing in top-tier journals). Unfortunately, that means that we need to read their papers and use our best judgment. Our judgment is unlikely to be perfect, but I don't think there's any reason to believe it's worse than the judgment of a journal editor!**
A second major benefit of N-best is that – if we're actually reading the research – it need not be published in any particular journal. It can easily be a preprint. Hence, N-best incentivizes preprint posting, with the concomitant acceleration of scientific communication. Of course, publication outlet will still likely play a role in biasing evaluation. But good research will shine when read carefully, even if it's not nicely typeset, and candidates can weigh the prospect of being evaluated on strong new work against the risks of submitting work that hasn't yet been vetted by peer review.
Conclusions. If we want to hire and promote good scientists, we need to read their science and decide that it's good.***
* I've certainly said that candidates are "productive" before! It's an easy thing to say. Productivity is probably correlated at some level with quality. It's just not the same thing. If you can actually assess quality, that's what you should be assessing.
** E.g., work they've gotten past the slow judgment of an editor who should be writing decision letters instead of blog posts!
*** Any other evaluation will fall prey to Goodhart's law: when a measure becomes a target, it ceases to be a good measure.