I stumbled across a nice test for the evaluation method.
Take a position and evaluate it. Then mirror it (flip the board left to right, so the a-file becomes the h-file) and evaluate again. Then reflect the original position front to back, swapping the colors of all pieces and the side to move, and evaluate it. Finally, reflect and mirror it and evaluate once more.
This gives four different positions that should get exactly the same evaluation, assuming the evaluation is symmetric and scores the position from the side to move's point of view.
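A minimal sketch of the test in Python. It assumes a toy board representation (eight strings, rank 1 first, uppercase for white, "." for empty) and an evaluation that scores from the side to move's point of view. `mirror`, `flip`, `symmetric_scores`, `material_eval` and `buggy_eval` are hypothetical names for illustration, not taken from any engine, and castling rights and en passant are ignored.

```python
from typing import Callable, List, Tuple

# Hypothetical board layout: 8 strings, board[0] is rank 1 and board[r][0] is
# the a-file. Uppercase = white, lowercase = black, "." = empty square.
# Castling rights and en passant are ignored in this sketch.
Board = List[str]
Evaluator = Callable[[Board, str], int]

def mirror(board: Board) -> Board:
    """Flip left to right: the a-file becomes the h-file and vice versa."""
    return [rank[::-1] for rank in board]

def flip(board: Board, side: str) -> Tuple[Board, str]:
    """Flip front to back, swap the colors of all pieces and the side to move."""
    flipped = [rank.swapcase() for rank in reversed(board)]
    return flipped, ("b" if side == "w" else "w")

def symmetric_scores(evaluate: Evaluator, board: Board, side: str) -> List[int]:
    """Scores of the original, mirrored, flipped and flipped+mirrored positions."""
    fb, fs = flip(board, side)
    variants = [(board, side), (mirror(board), side), (fb, fs), (mirror(fb), fs)]
    return [evaluate(b, s) for b, s in variants]

def is_symmetric(evaluate: Evaluator, board: Board, side: str) -> bool:
    # Assumes `evaluate` scores from the side to move's point of view,
    # so all four scores must be identical.
    return len(set(symmetric_scores(evaluate, board, side))) == 1

VALUES = {"p": 100, "n": 300, "b": 300, "r": 500, "q": 900, "k": 0}

def material_eval(board: Board, side: str) -> int:
    """Toy evaluation: material balance from the side to move's point of view."""
    score = 0
    for rank in board:
        for sq in rank:
            if sq != ".":
                value = VALUES[sq.lower()]
                score += value if sq.isupper() else -value
    return score if side == "w" else -score

def buggy_eval(board: Board, side: str) -> int:
    """Toy evaluation with a left-right asymmetry: a king bonus on the c-file only."""
    score = material_eval(board, "w")  # material from white's point of view
    for rank in board:
        for f, sq in enumerate(rank):
            if f == 2 and sq == "K":
                score += 10
            elif f == 2 and sq == "k":
                score -= 10
    return score if side == "w" else -score

# White: king c1, pawns a2-c2, knight e4. Black: king g8, pawns f7-h7.
POS = [
    "..K.....",
    "PPP.....",
    "........",
    "....N...",
    "........",
    "........",
    ".....ppp",
    "......k.",
]

print(symmetric_scores(material_eval, POS, "w"))  # [300, 300, 300, 300]
print(symmetric_scores(buggy_eval, POS, "w"))     # [310, 300, 310, 300]
```

The c-file bonus in `buggy_eval` shows up as a mismatch between the original and the mirrored score, while the color flip alone gives the same 310 and would not catch it. That is why all four variants are worth checking.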
So far I've found around 10 bugs, and there are probably more to come. Some of them were just asymmetries (like giving the king a better score on c1 than on f1, which makes sense on some level but isn't really a good idea).
Others were real bugs, ranging from minor to pretty severe. For example, I noticed that my weak pawn evaluation checked whether the pawn could move backwards(!) to become supported. I also found that, for some infernal reason, I used the absolute value of the piece when checking for blocked pawns, which in some positions made white not care about a blocked pawn (unless it was doubled) while black did.
Another one was in the fianchetto evaluation for black, which checked whether the pawn in front of the king was on the third rank instead of the fifth.
I'm not sure how much these bugs actually matter for playing strength. Sure, rewarding a fianchetto when the pawn is on the third rank might be wrong, but a bishop in front of the king in those positions might not be so bad after all.
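One way to make per-color bugs like the fianchetto one less likely is to write color-specific checks once, in terms of the rank relative to the side being evaluated, instead of hardcoding separate ranks for white and black. A minimal sketch, assuming a hypothetical board of eight strings (index 0 is rank 1, file index 0 is the a-file, 0-based indices throughout). `relative_rank` and `has_fianchetto_pawn` are illustrative names, and the fianchetto check is a simplified stand-in that only looks for the g-pawn on its own third rank.

```python
from typing import List

# Hypothetical board layout: 8 strings, board[0] is rank 1 and board[r][0] is
# the a-file. Uppercase = white, lowercase = black, "." = empty square.
Board = List[str]
WHITE, BLACK = 0, 1
G_FILE = 6

def relative_rank(rank: int, color: int) -> int:
    """Map a 0-based rank to/from the given color's point of view.

    For black this is 7 - rank, which is its own inverse, so the same function
    converts relative ranks to absolute ones and back.
    """
    return rank if color == WHITE else 7 - rank

def has_fianchetto_pawn(board: Board, color: int) -> bool:
    """Hypothetical kingside check: is the g-pawn on its own third rank
    (g3 for white, g6 for black)?"""
    pawn = "P" if color == WHITE else "p"
    third_rank = relative_rank(2, color)  # relative index 2 = own third rank
    return board[third_rank][G_FILE] == pawn

# Both sides have played g3/g6, kings on g1 and g8.
FIANCHETTO = [
    "......K.",
    "PPPPPP.P",
    "......P.",
    "........",
    "........",
    "......p.",
    "pppppp.p",
    "......k.",
]

print(has_fianchetto_pawn(FIANCHETTO, WHITE), has_fianchetto_pawn(FIANCHETTO, BLACK))  # True True
```

With the rank expressed relative to the side being evaluated, the white and black versions of the check cannot drift apart; for black, the own third rank comes out as absolute index 5 (g6) without being written down separately.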
I'll write something up to explain this excellent test a bit more thoroughly.