Training Error for any classifier is defined as a fraction of training samples that are misclassified. so in terms of whether or not the classifier applied to their training example, whether it disagrees with a given label for that example.
Mathematical Definition: Perceptron Algorithm (without offset) is stated as the following: Perceptron $\begin{equation}(\{(x^{(i)},y^{(i)}),i=1,...,n\},T):\end{equation}$ initialize $\begin{equation} θ=0\end{equation}$ (vector); for $\begin{equation} t=1,...,T \end{equation}$ do for $\begin{equation} i=1,...,n \end{equation}$ do if $\begin{equation} y^{(i)}(θ⋅x^{(i)})≤0 \end{equation}$ then update $\begin{equation} θ=θ+y^{(i)}x^{(i)}\end{equation}$
Since each classifier is specified by a primitive vector $\theta$, we can think of evaluating a parameter vector in terms of its resulting training error which is a fraction of misclassifying points.
We are saying whether the given label, its sign agrees with the linear prediction.
So if this is $0$ or $negative$, then a linear classifier,that is the sign would be different from the given label. And note, that when this is exactly $0$, i.e if the example lies exactly on the linear boundary, we count that as an error, since we don't know which way we should really classify that point.
Now we can analogously define training error for a linear classifier, general linear classifier, as the fraction of misclassified points,by looking at whether their given label agrees with the sign the linear function used to define the linear classifier.
If that is $0$ or $negative$, then we call it an error.
if an error, then we do something. We update the parameters to correct for that error. And since parameter vectors are $0$, $ (y^{(i)}(θ⋅x^{(i)}))$ is $0$. , so the first example is always considered an error.
perceptron algorithm makes actually a very simple update to the parameters.
very simple update to the parameter in response to any mistake.
let's look at whether an example that is misclassified on the basis of which we perform the update, whether it would be misclassified if reintroduced as an example of the next step.
So then we would consider calculating an error,$(y^{(i)}(θ⋅x^{(i)}))$ which would be:
Now, what are we going to do here over the whole training set is we start with the 0 parameter vector step 1. and then go over all the training examples. And if the i-th example is a mistake, then we perform that update shown in step 7 So we'd nudge the parameters in the right direction, based on an individual update.
Since the different training examples might update the parameters in different directions,it is possible that the later updates actually make the earlier undo some of the earlier updates,and some of the earlier examples are no longer correctly classified.So we have to go through the training set here multiple times, here capital T times go through the training set, either in order or select at random, look at whether it's a mistake, and perform a simple update.
Theoretically, this classifier for a sufficiently large T, if there exists a linear classifier through origin that correctly classifies the training samples, this simple, very simple algorithm actually will find a solution to that problem.
There are many solutions typically, but this will find one.
Now you will implement the single step update for the perceptron algorithm (implemented with 0−1 loss).
The function should basically follow this rule:
We will test the out put based on the following inputs:
import numpy as np feature_vector = np.array([1, 2]) label, theta, theta_0 = 1, np.array([-1, 1]), -1.5 exp_res = (np.array([0, 3]), -0.5)
def perceptron_single_step_update( feature_vector, label, current_theta, current_theta_0): """ Properly updates the classification parameter, theta and theta_0, on a single step of the perceptron algorithm. Args: feature_vector - A numpy array describing a single data point. label - The correct classification of the feature vector. current_theta - The current theta being used by the perceptron algorithm before this update. current_theta_0 - The current theta_0 being used by the perceptron algorithm before this update. Returns: A tuple where the first element is a numpy array with the value of theta after the current update has completed and the second element is a real valued number with the value of theta_0 after the current updated has completed. """ if label * (np.dot(current_theta, feature_vector) + current_theta_0) <= 0: current_theta += label * feature_vector current_theta_0 += label return (current_theta, current_theta_0)
perceptron_single_step_update(feature_vector,label,theta,theta_0)
(array([0, 3]), -0.5)
We need to make sure that if the output of the perceptron is $ 0$ , the weights are still updated, hence the check for large inequality. In fact, because of numerical instabilities, it is preferable to identify 0 with a small range $[−ε,ε]$ .
feature_vector = np.array([1, 2]) label, theta, theta_0 = 1, np.array([-1, 1]), -1.5 exp_res = (np.array([0, 3]), -0.5)
def perceptron_single_step_update(feature_vector, label, current_theta, current_theta_0): if label * (np.dot(current_theta, feature_vector) + current_theta_0) <= 1e-7: return (current_theta + label * feature_vector, current_theta_0 + label) return (current_theta, current_theta_0)
In this step you will implement the full perceptron algorithm.
This function should return a tuple in which the first element is the final value of $θ$ and the second element is the value of $θ_0$ .
Recall the algorithm steps:
Mathematical Definition: Perceptron Algorithm is stated as the following: Perceptron $\begin{equation}(\{(x^{(i)},y^{(i)}),i=1,...,n\},T):\end{equation}$ initialize $\begin{equation} θ=0,\hspace{0.3cm} θ_0=0\end{equation}$ (vector); for $\begin{equation} t=1,...,T \end{equation}$ do for $\begin{equation} i=1,...,n \end{equation}$ do if $\begin{equation} y^{(i)}(θ⋅x^{(i)})≤0 \end{equation}$ then update $\begin{equation} θ=θ+y^{(i)}x^{(i)}\end{equation}$
def perceptron(feature_matrix, labels, T): """ Runs the full perceptron algorithm on a given set of data. Runs T iterations through the data set, there is no need to worry about stopping early. Args: feature_matrix - A numpy matrix describing the given data. Each row represents a single data point. labels - A numpy array where the kth element of the array is the correct classification of the kth row of the feature matrix. T - An integer indicating how many times the perceptron algorithm should iterate through the feature matrix. Returns: A tuple where the first element is a numpy array with the value of theta, the linear classification parameter, after T iterations through the feature matrix and the second element is a real number with the value of theta_0, the offset classification parameter, after T iterations through the feature matrix. """ (nsamples, nfeatures) = feature_matrix.shape theta = np.zeros(nfeatures) theta_0 = 0.0 for t in range(T): for i in get_order(nsamples): theta, theta_0 = perceptron_single_step_update( feature_matrix[i], labels[i], theta, theta_0) return (theta, theta_0)
Using the ordering to iterate the feature matrix in each iteration. The ordering is specified due to testing purpose. In practice, people typically just randomly shuffle indices to do stochastic optimization.
import random def get_order(n_samples): try: with open(str(n_samples) + '.txt') as fp: line = fp.readline() return list(map(int, line.split(','))) except FileNotFoundError: random.seed(1) indices = list(range(n_samples)) random.shuffle(indices) return indices
Testing the Perceptron algorithm with random feature matrices, exp_res is the expected result, and our function should return the same values of theta and theta_0.
feature_matrix = np.array([[1, 2], [-1, 0]]) labels = np.array([1, 1]) T = 2 exp_res = (np.array([0, 2]), 2) perceptron(feature_matrix,labels,T)
(array([0., 2.]), 2.0)
The average perceptron will add a modification to the original perceptron algorithm: since the basic algorithm continues updating as the algorithm runs, nudging parameters in possibly conflicting directions, it is better to take an average of those parameters as the final answer. Every update of the algorithm is the same as before. The returned parameters $θ$ , however, are an average of the $θ $s across the $nT $ steps:
$θ_{final}={1\over nT}(θ^{(1)+}θ^{(2)+}...+θ^{(nT)})$
def average_perceptron(feature_matrix, labels, T): """ Runs the average perceptron algorithm on a given set of data. Runs T iterations through the data set, there is no need to worry about stopping early. Args: feature_matrix - A numpy matrix describing the given data. Each row represents a single data point. labels - A numpy array where the kth element of the array is the correct classification of the kth row of the feature matrix. T - An integer indicating how many times the perceptron algorithm should iterate through the feature matrix. Returns: A tuple where the first element is a numpy array with the value of the average theta, the linear classification parameter, found after T iterations through the feature matrix and the second element is a real number with the value of the average theta_0, the offset classification parameter, found after T iterations through the feature matrix. Hint: It is difficult to keep a running average; however, it is simple to find a sum and divide. """ (nsamples, nfeatures) = feature_matrix.shape theta = np.zeros(nfeatures) theta_sum = np.zeros(nfeatures) theta_0 = 0.0 theta_0_sum = 0.0 for t in range(T): for i in get_order(nsamples): theta, theta_0 = perceptron_single_step_update( feature_matrix[i], labels[i], theta, theta_0) theta_sum += theta theta_0_sum += theta_0 return (theta_sum / (nsamples * T), theta_0_sum / (nsamples * T))
Testing the function
feature_matrix = np.array([[1, 2], [-1, 0]]) labels = np.array([1, 1]) T = 2 exp_res = (np.array([-0.25, 1.5]), 1.75) average_perceptron(feature_matrix, labels, T)
(array([-0.25, 1.5 ]), 1.75)
The following pseudo-code describes the Pegasos update rule.
Pegasos update rule $(x^{(i)},y^{(i)},λ,η,θ)$: if $y^{(i)}(θ⋅x^{(i)})≤1$ then update $θ=(1−ηλ)θ+ηy^{(i)}x^{(i)}$ else: update $θ=(1−ηλ)θ$ The $η$ parameter is a decaying factor that will decrease over time. The $λ$ parameter is a regularizing parameter. In this problem, you will need to adapt this update rule to add a bias term ( $θ_0$ ) to the hypothesis, but take care not to penalize the magnitude of $θ_0$ .
The Pegasos algorithm mixes together a few good ideas:
When using a bias, the bias update for a mistake becomes:
$θ_0=θ_0+ηy^{(i)}$
Single step update for the Pegasos algorithm.
def pegasos_single_step_update( feature_vector, label, L, eta, current_theta, current_theta_0): """ Properly updates the classification parameter, theta and theta_0, on a single step of the Pegasos algorithm Args: feature_vector - A numpy array describing a single data point. label - The correct classification of the feature vector. L - The lamba value being used to update the parameters. eta - Learning rate to update parameters. current_theta - The current theta being used by the Pegasos algorithm before this update. current_theta_0 - The current theta_0 being used by the Pegasos algorithm before this update. Returns: A tuple where the first element is a numpy array with the value of theta after the current update has completed and the second element is a real valued number with the value of theta_0 after the current updated has completed. """ mult = 1 - (eta * L) if label * (np.dot(feature_vector, current_theta) + current_theta_0) <= 1: return ((mult * current_theta) + (eta * label * feature_vector), (current_theta_0) + (eta * label)) return (mult * current_theta, current_theta_0)
Testing the function based on dummy data, and the expected answer is in exp_res
feature_vector = np.array([1, 2]) label, theta, theta_0 = 1, np.array([-1, 1]), -1.5 L = 0.2 eta = 0.1 exp_res = (np.array([-0.88, 1.18]), -1.4) pegasos_single_step_update( feature_vector, label, L, eta, theta, theta_0)
(array([-0.88, 1.18]), -1.4)
Finally you will implement the full Pegasos algorithm.
def pegasos(feature_matrix, labels, T, L): """ Runs the Pegasos algorithm on a given set of data. Runs T iterations through the data set, there is no need to worry about stopping early. For each update, set learning rate = 1/sqrt(t), where t is a counter for the number of updates performed so far (between 1 and nT inclusive). Args: feature_matrix - A numpy matrix describing the given data. Each row represents a single data point. labels - A numpy array where the kth element of the array is the correct classification of the kth row of the feature matrix. T - An integer indicating how many times the algorithm should iterate through the feature matrix. L - The lamba value being used to update the Pegasos algorithm parameters. Returns: A tuple where the first element is a numpy array with the value of the theta, the linear classification parameter, found after T iterations through the feature matrix and the second element is a real number with the value of the theta_0, the offset classification parameter, found after T iterations through the feature matrix. """ (nsamples, nfeatures) = feature_matrix.shape theta = np.zeros(nfeatures) theta_0 = 0 count = 0 for t in range(T): for i in get_order(nsamples): count += 1 eta = 1.0 / np.sqrt(count) (theta, theta_0) = pegasos_single_step_update( feature_matrix[i], labels[i], L, eta, theta, theta_0) return (theta, theta_0)
(np.array([1-1/np.sqrt(2), 1-1/np.sqrt(2)]), 1)
(array([0.29289322, 0.29289322]), 1)
feature_matrix = np.array([[1, 1], [1, 1]]) labels = np.array([1, 1]) T = 1 L = 1 exp_res = (np.array([1-1/np.sqrt(2), 1-1/np.sqrt(2)]), 1) pegasos(feature_matrix, labels, T, L)
(array([0.29289322, 0.29289322]), 1.0)