I think at least some of the structure is down to the update equation, roughly 'new = sign(gate * old + ...)'. Since 'gate' is always positive, the 'gate * old' term always pushes the new state toward the sign of the old one. So even when the network is randomly initialized, the state is more likely to stay the same between two time steps than to change.
I tried "deep transitions", which repeats this equation a couple of times per time step, and the results were indeed noisier.
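Here is a rough toy simulation of that argument, not the actual network. Everything in it is an assumption made for illustration: the elided "..." term is replaced by zero-mean Gaussian noise, the gate is a sigmoid of a random value (so it is always positive), and a "deep transition" is modelled as applying the same update `depth` times within one time step. The names `step` and `flip_rate` are made up for this sketch.

```python
import math
import random


def sign(x):
    """Return +1 for non-negative input, -1 otherwise."""
    return 1 if x >= 0 else -1


def step(state, rng, depth=1):
    """Advance every unit by one time step.

    Each unit applies new = sign(gate * old + noise) `depth` times.
    depth > 1 is a crude stand-in for a "deep transition".
    """
    new_state = []
    for s in state:
        for _ in range(depth):
            # Sigmoid output lies in (0, 1), so the gate is always positive.
            gate = 1.0 / (1.0 + math.exp(-rng.gauss(0.0, 1.0)))
            # Assumption: the elided "..." term behaves like zero-mean noise.
            noise = rng.gauss(0.0, 1.0)
            s = sign(gate * s + noise)
        new_state.append(s)
    return new_state


def flip_rate(depth, n_units=1000, n_steps=50, seed=0):
    """Fraction of units whose sign changes between consecutive time steps."""
    rng = random.Random(seed)
    state = [rng.choice((-1, 1)) for _ in range(n_units)]
    flips = 0
    for _ in range(n_steps):
        nxt = step(state, rng, depth)
        flips += sum(a != b for a, b in zip(state, nxt))
        state = nxt
    return flips / (n_units * n_steps)
```

Comparing `flip_rate(1)` with `flip_rate(3)` should show the single update flipping the state clearly less than half the time, and the repeated update landing closer to a coin flip, which is at least consistent with the noisier results from deep transitions.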