This implementation doesn't work; I need to debug it and find where I've made a mistake. It seems to do something sensible, but its performance is nowhere near what they reported in the original paper.