How improvements in bullet engineering have affected the terminal performance of the other calibers the FBI would consider is not relevant to the FBI. As I explained in my post above, the FBI only tests terminal performance on a pass/fail basis. They don't rank the calibers by terminal performance; they only assess whether each one passes or fails.
So the calibers that ALREADY passed before the bullet engineering improvements were made STILL pass after the bullet engineering improvements were made. They can't "pass better" than they already did, so it doesn't change anything. What changed is that now 9mm ALSO passes and so it's eligible for selection whereas it was not eligible before.
This is one of the two big things to understand about the current conventional wisdom on handgun terminal ballistics. The FBI test is based on a binary pass/fail model, and people reasoned backward from that to the belief that handgun terminal ballistics are binary in reality (as opposed to just in the testing model). Binary assessments can make good sense for procurement requirements (they're common in procurement for all sorts of things having nothing to do with guns or even government contracting). That doesn't mean there are no actual gradations of efficacy on either side of the model's binary dividing line.
The other big thing to understand is that Fackler and Roberts are coming at the problem from the perspective of trauma surgeons. They aren't studying gunfights; they are studying the bodies (alive and dead) that show up in trauma centers or field hospitals ~30 minutes later. So the only thing they give weight to is permanent wounding. That is pretty on-point for most military analysis (because the objective there is often to kill or permanently disable an enemy soldier), but it's not directly on point for a lawful civilian (including LEO) using a handgun in self-defense. There, the handgun is used to alter behavior - specifically, to cause the threatening person to cease their aggression as rapidly as possible. Permanent wounding is surely somewhat correlated with this goal, but it is an indirect measurement of the thing we care about.
But there are things a trauma surgeon won't detect 30 minutes after a fight that were dispositive in the fight itself. Take guns out of it and imagine a fistfight. An aggressor who is swinging away might take a hard punch to the gut, stagger, trip, and fall, and stop attacking from the moment of the gut punch. If the other person then kicks the fallen fighter in the head, or if the fall includes the fighter's head hitting a concrete curb, he may die. The trauma surgeon evaluating things 30 minutes later would diagnose the head kick/impact as the key to the fight, because they'll find fractured skull bones and brain swelling or bleeding. But they'll likely find little or no evidence of the gut punch, even though that was the thing that caused the aggression to cease.
To summarize, you have to understand that much of the current terminal ballistics thinking is 1) based on an indirect measure of efficacy (wounding); 2) filtered through an approximation of what is expected to produce that measure (gel tests as a proxy for wounding); and 3) reduced to a binary adequate/inadequate result because of a procurement test. These may all be fine ways of making a decision about what ammo to use, but nobody should be fooled into thinking that this is a fully-developed model that closely tracks the real world. There are all sorts of shades of probability and onset of effects that this simplified model simply discards.
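If it helps to see that last step concretely, here is a toy sketch of what a binary cutoff does to graded inputs. Everything in it is made up for illustration: the load names, the 0-100 scores, and the threshold are hypothetical placeholders, not real test data or the FBI's actual scoring.

```python
# Toy illustration only: scores and threshold are hypothetical,
# on an arbitrary 0-100 scale. None of this is real test data.
PASS_THRESHOLD = 60  # assumed cutoff, chosen just for the example

loads = {
    "load_a": 58,  # barely below the line
    "load_b": 62,  # barely above the line
    "load_c": 75,
    "load_d": 95,  # far above the line
}

def passes(score, threshold=PASS_THRESHOLD):
    """Reduce a graded score to a binary pass/fail result."""
    return score >= threshold

results = {name: passes(score) for name, score in loads.items()}

for name, score in loads.items():
    verdict = "PASS" if results[name] else "FAIL"
    print(f"{name}: score={score} -> {verdict}")
```

Once the scores are reduced this way, load_b and load_d look identical even though they are 33 points apart, while load_a and load_b land on opposite sides despite a 4-point difference. The pass/fail column is fine for a yes/no purchasing decision, but the gradations are gone from it.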