
Sigmoid
As you have already seen, the sigmoid function is a particular case of the logistic function, and it gives us something similar to a step function; this makes it useful for binary classification, because its output can be interpreted as a probability. The function is differentiable, so we can run gradient descent at every point. It is also monotonic, meaning it only ever increases (a monotonic function always increases or always decreases); its derivative, however, is not monotonic, rising to a single maximum at x = 0 and then decreasing again. The sigmoid forces all output values to be between 0 and 1, so even very high inputs asymptotically tend to 1 and very low inputs tend to 0. The problem this creates is that the derivative at those extremes is approximately 0; as a result, gradient descent makes almost no progress for very high or very low inputs, which is the so-called vanishing gradient problem.
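To see this saturation numerically, the following is a minimal sketch, assuming NumPy is available; the helper names sigmoid and sigmoid_derivative are illustrative rather than taken from any particular library:

import numpy as np

def sigmoid(x):
    # Logistic sigmoid: squashes any real input into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)); it peaks at x = 0
    # with value 0.25 and vanishes for large positive or negative inputs
    s = sigmoid(x)
    return s * (1.0 - s)

# Evaluate at a few points to show saturation at the extremes
for x in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"x = {x:6.1f}   sigmoid = {sigmoid(x):.5f}   derivative = {sigmoid_derivative(x):.5f}")

At x = -10 and x = 10 the derivative is already around 0.00005, so weight updates driven by this gradient effectively stall; the following diagram illustrates the same saturation: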
