The slopes of these red lines, g, are the sub-gradients of relu(z) at z=0, where g is uniformly distributed! We know that for a random variable g with probability density p(g) on the interval [a,b], the expectation is equal to:
$$E(g) = \int_{a}^{b} {g \times p(g) \, dg}$$
And when g is uniformly distributed in the interval [a, b], p(g) must be:
$$p(g) = \frac{1}{b-a}$$
That way, the area under the curve is exactly equal to 1. More specifically, this is a rectangular area, whose length is equal to $b-a$ and whose width is $\frac{1}{b-a}$. And clearly, the area of this rectangle is 1:
$$\int_{a}^{b} {p(g) \, dg} = (b-a) \times \frac{1}{(b-a)} = 1$$
Otherwise, p(g) cannot be a probability distribution! In our case, the range [a,b] is equal to [0,1], as this is the range of possible values for the slopes of these sub-gradients.
Now, back to computing the expectation of these bloody sub-gradients (i.e., the slopes of the infinite number of those red lines at z=0) represented by g:
$$E(g) = \int_{a}^{b} {g \times p(g) \, dg} = \int_{a}^{b} {g \times \frac{1}{(b-a)} \, dg}$$
From there let’s replace [a,b] with [0,1]:
$$p(g) = \frac{1}{1-0} = 1$$
So:
$$E(g) = \int_{0}^{1} {g \, dg} = \left. \frac{g^2}{2} \right|^{1}_{0} = \frac{1}{2} - \frac{0}{2} = 0.5$$
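If you don't trust the calculus, the integral can be sanity-checked numerically. Here is a minimal sketch in plain Python. The function names, the number of steps, and the seed are my own illustrative choices, not part of the derivation. It approximates E(g) once with a midpoint-rule sum of g × p(g) and once by averaging random draws of g from [a,b]:

```python
import random


def uniform_expectation_midpoint(a, b, n=100_000):
    """Approximate E(g) = integral of g * p(g) dg on [a, b] with the midpoint rule,
    where p(g) = 1 / (b - a) is the uniform density."""
    width = (b - a) / n          # width of each small slice of [a, b]
    density = 1.0 / (b - a)      # p(g), constant for a uniform distribution
    total = 0.0
    for i in range(n):
        g = a + (i + 0.5) * width    # midpoint of the i-th slice
        total += g * density * width
    return total


def uniform_expectation_monte_carlo(a, b, n=200_000, seed=0):
    """Approximate E(g) by averaging n random slopes drawn uniformly from [a, b]."""
    rng = random.Random(seed)    # fixed seed (an arbitrary choice) for repeatability
    return sum(rng.uniform(a, b) for _ in range(n)) / n


# Both estimates should land very close to 0.5 for the range [0, 1].
print(uniform_expectation_midpoint(0.0, 1.0))
print(uniform_expectation_monte_carlo(0.0, 1.0))
```

The midpoint estimate is essentially exact here because the integrand is linear in g, while the Monte Carlo estimate only gets close to 0.5 and wobbles a little with the seed and the sample size.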
So, the expected value of the sub-gradient over an infinite number of sub-gradients is 0.5! I know! We ended up with the mid-point of the range [0,1]. However, I am more convinced by the “Expectation” argument than I am by the “mid-point” argument. So, this means that when z=0 (as rare as it is), we define the candidate sub-gradient to be:
$$\frac{\partial \, \mathrm{relu}(z)}{\partial z} = E(g) = 0.5$$
Moral of the story: You can choose any value in the range [0,1] and your ANN will still train. However, I like my expectation argument, as it gives a consistent justification rather than just picking a random value!
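To make this concrete, here is a rough sketch of how the choice could look in a hand-rolled backward pass, assuming NumPy. The names `relu_grad` and `subgrad_at_zero` are hypothetical ones I made up for illustration, and this is not how any particular deep learning framework implements it:

```python
import numpy as np


def relu(z):
    """Element-wise relu(z) = max(z, 0)."""
    return np.maximum(z, 0.0)


def relu_grad(z, subgrad_at_zero=0.5):
    """Element-wise derivative of relu: 0 for z < 0, 1 for z > 0.
    At exactly z == 0 it returns the chosen sub-gradient; the default 0.5
    is the expectation E(g) over [0, 1] (hypothetical parameter name)."""
    z = np.asarray(z, dtype=float)
    grad = (z > 0).astype(float)                  # 1.0 where z > 0, else 0.0
    return np.where(z == 0, subgrad_at_zero, grad)


z = np.array([-2.0, 0.0, 3.0])
print(relu_grad(z).tolist())                       # [0.0, 0.5, 1.0]
print(relu_grad(z, subgrad_at_zero=0.0).tolist())  # [0.0, 0.0, 1.0]
```

Swapping `subgrad_at_zero` for any other value in [0,1] only changes the gradient at the rare points where z is exactly 0, which is why training is not sensitive to this choice.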