Implementing neural networks and harnessing the power of your CPU or GPU now takes only a few lines of Python, thanks to libraries such as Keras and TensorFlow. Complex applications are remarkable, but how do they work? To understand the complex, I like to start with simple concepts and build up from there.
Single neuron and linear regression
A single neuron doesn't do very much: given a parameter W (the weight), a bias b and an activation function f, it calculates for an input X the value $\hat Y = f(WX + b)$.
In this article, f will be either the sigmoid function or a linear function (the identity, f(x) = x).
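To make the formula concrete, here is a minimal NumPy sketch of a single neuron with both activations. The function names (`sigmoid`, `linear`, `neuron`) and the values of `w` and `b` are illustrative choices, not part of Keras:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def linear(z):
    """Identity activation: returns its input unchanged."""
    return z

def neuron(x, w, b, f):
    """Output of a single neuron: f(w * x + b)."""
    return f(w * x + b)

x = np.array([-1.0, 0.0, 1.0])
y_linear = neuron(x, w=0.5, b=1.0, f=linear)    # 0.5 * x + 1 -> [0.5, 1.0, 1.5]
y_sigmoid = neuron(x, w=0.5, b=1.0, f=sigmoid)  # same values squashed into (0, 1)
```

With the linear activation the neuron is just a straight line; the sigmoid bends that line into an S-shaped curve bounded by 0 and 1.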
If we use a linear function, then $\hat Y = WX + b$: a straight line. Training it on a data set (X, y) with a quadratic error function is equivalent to doing a linear regression: in both cases, we look for the solution of $$ \min_{W,b} \sum_{i} | Wx_i+b - y_i |^2 $$
The neural network is just slower and less accurate, because linear regression uses an analytic result instead of running an iterative optimization. Here's the code, assuming that you have installed Keras. I use it with the TensorFlow backend:
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers
from keras import losses
# build the model with Keras: 1 layer, 1 neuron
model = Sequential()
model.add(Dense(1, input_dim=1))
# prepare the training set
np.random.seed(42)
X_train = (np.random.rand(50)*10).reshape(-1,1)
Y_train = 0.1*X_train+2 + (np.random.randn(50)*.1).reshape(-1,1)
# use the mean square error and optimize with gradient descent
model.compile(loss=losses.mean_squared_error, optimizer='sgd')
# note: the nb_epoch argument was renamed to epochs in Keras 2
model.fit(X_train, Y_train, epochs=200, verbose=0)
# display the result (%matplotlib inline is a Jupyter notebook magic)
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(X_train.squeeze(),model.predict(X_train).squeeze(),'.r')
plt.scatter(X_train.squeeze(),Y_train.squeeze())
And the result is:
That's a very inefficient way to do a linear regression, but it illustrates training and prediction well.
With N inputs (and one weight per input), the neuron would calculate a linear combination of the components of a vector of size N: X=(X[0],...,X[N-1])
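The linear combination over a vector input can be sketched in a few lines of NumPy. The helper `neuron_vec` and the numbers below are made up for illustration; in the Keras model above, this would presumably correspond to `Dense(1, input_dim=N)`:

```python
import numpy as np

def neuron_vec(x, w, b):
    """Linear neuron on a vector input: w[0]*x[0] + ... + w[N-1]*x[N-1] + b."""
    return np.dot(w, x) + b

x = np.array([1.0, 2.0, 3.0])    # N = 3 inputs
w = np.array([0.5, -1.0, 2.0])   # one weight per input (arbitrary values)
b = 0.25
y_hat = neuron_vec(x, w, b)      # 0.5 - 2.0 + 6.0 + 0.25 = 4.75
```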
We can obtain W and b with the following lines:
for layer in model.layers:
    h = layer.get_weights()
    print(h)
Which yields:
[array([[ 0.13543533]], dtype=float32), array([ 1.79489422], dtype=float32)]
That's 0.13 instead of 0.1 and 1.79 instead of 2. Not great, but after 500 more epochs I get W = 0.09437597 and b = 2.00935483. Getting close!
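To see why more iterations help, here is a hand-written sketch of gradient descent on the mean squared error. It is not what Keras runs internally: it assumes full-batch updates (Keras' `'sgd'` works on mini-batches), a zero starting point, and a learning rate of 0.01, so the numbers will differ from those above. The function name `gradient_descent` is illustrative:

```python
import numpy as np

# Same synthetic data as in the article: y = 0.1 * x + 2 plus small noise
np.random.seed(42)
x = np.random.rand(50) * 10
y = 0.1 * x + 2 + np.random.randn(50) * 0.1

def gradient_descent(x, y, steps, lr=0.01):
    """Full-batch gradient descent on the mean squared error of w * x + b."""
    w, b = 0.0, 0.0                        # assumed starting point
    for _ in range(steps):
        err = w * x + b - y                # residuals of the current line
        w -= lr * 2.0 * np.mean(err * x)   # step along -d(MSE)/dw
        b -= lr * 2.0 * np.mean(err)       # step along -d(MSE)/db
    return w, b

for steps in (200, 700, 5000):
    w, b = gradient_descent(x, y, steps)
    print(steps, w, b)
```

The bias is the slow one to converge: the inputs range up to 10, so the error surface is much steeper along W than along b, and a learning rate small enough to keep W stable only moves b a little at each step.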
By the way, the proper way to do linear regression is of course:
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X_train,Y_train)
print(lr.intercept_,lr.coef_)
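For a single feature, the analytic result mentioned earlier has a simple closed form. A minimal sketch, assuming the same synthetic data as above (regenerated here so the block runs on its own); the variable names are illustrative:

```python
import numpy as np

np.random.seed(42)
x = np.random.rand(50) * 10
y = 0.1 * x + 2 + np.random.randn(50) * 0.1

# Closed-form least squares for one feature:
#   W = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2)
#   b = mean(y) - W * mean(x)
x_c = x - x.mean()
W = np.sum(x_c * (y - y.mean())) / np.sum(x_c ** 2)
b = y.mean() - W * x.mean()
print(W, b)
```

No iteration is involved: the minimum of the quadratic error is computed directly, which is why this approach is both faster and more accurate than training the neuron.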