I am currently trying to understand the econometrics of the paper by Dippel et al. (2015), in which the authors estimate the causal effect of imports on voting behavior in Germany. To do so, they estimate the following first-difference specification, where Yit refers to electoral outcomes, NetExposure refers to import exposure minus export exposure, τtr are time-varying fixed effects and Xit is a set of control variables:

What I do not really get is why the authors include "undifferenced" controls (Xit) in the first-difference model. Differencing the variables removes unobserved time-constant effects, so why would one not difference the control variables as well?
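
To make the question concrete, here is a rough sketch of the two versions I have in mind, written in my own notation. The coefficient names β and γ, the error term ε, and the Δ in front of NetExposure are my assumptions and may not match the paper exactly. As I read it, the paper's specification has the controls in levels:

$$\Delta Y_{it} = \beta \, \Delta \text{NetExposure}_{it} + X_{it}'\gamma + \tau_{tr} + \varepsilon_{it}$$

whereas I would have expected the controls to be differenced as well:

$$\Delta Y_{it} = \beta \, \Delta \text{NetExposure}_{it} + \Delta X_{it}'\gamma + \tau_{tr} + \varepsilon_{it}$$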
Can anyone explain the intuition behind it?
Thank you in advance for any answers!
