Computing the Euclidean Distance Between Two Matrices - JobPlus

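When using a k-NN model, we need the Euclidean distance from every point in the test set to every point in the training set, that is, the pairwise Euclidean distances between the rows of two matrices. There are typically three ways to implement this in a k-NN algorithm: with two nested loops, with a single loop, and with no loops.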

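Using two loops

Loop over both the test set and the training set, compute the Euclidean distance between each pair of points, and store it in the dists matrix. This version is not optimized in any way.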

import numpy as np

# Method of the k-NN classifier class; self.X_train holds the training data.
def compute_distances_two_loops(self, X):
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):  # range replaces Python 2's xrange
        for j in range(num_train):
            # L2 distance between the i-th test point and the j-th training
            # point, stored in dists[i, j]. No loop over the feature dimension.
            dists[i, j] = np.sqrt(np.sum(np.square(X[i] - self.X_train[j])))
    return dists
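To make the two-loop computation concrete, here is a minimal sketch on a tiny hand-checkable input. The standalone function `pairwise_two_loops` and the sample arrays are hypothetical names introduced only for illustration; the article's version is a method that reads the training data from `self.X_train`.

```python
import numpy as np

def pairwise_two_loops(X_test, X_train):
    """Hypothetical standalone version of the two-loop method."""
    dists = np.zeros((X_test.shape[0], X_train.shape[0]))
    for i in range(X_test.shape[0]):
        for j in range(X_train.shape[0]):
            # Euclidean distance between test point i and training point j.
            dists[i, j] = np.sqrt(np.sum(np.square(X_test[i] - X_train[j])))
    return dists

# Two test points and two training points in 2-D (a 3-4-5 triangle).
X_test = np.array([[0.0, 0.0], [3.0, 4.0]])
X_train = np.array([[0.0, 0.0], [3.0, 0.0]])
example = pairwise_two_loops(X_test, X_train)
print(example)  # rows are test points, columns are training points
```

Row i of the result holds the distances from test point i to every training point, so the output has shape (number of test points, number of training points).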

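Using one loop

By keeping the training set as a matrix and computing the distance from each test point to the whole training matrix at once, the algorithm needs only one loop.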

def compute_distances_one_loop(self, X):
    """
    Compute the distance between each test point in X and each training point
    in self.X_train using a single loop over the test data.

    Input / Output: Same as compute_distances_two_loops
    """
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):  # range replaces Python 2's xrange
        # X[i] is broadcast against every row of self.X_train, so this fills
        # the whole row dists[i, :] with L2 distances in one step.
        dists[i] = np.sqrt(np.sum(np.square(self.X_train - X[i]), axis=1))
    return dists
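The single-loop version relies on NumPy broadcasting: subtracting one D-dimensional point from an (N, D) matrix subtracts it from every row. The sketch below shows just that step, with made-up sample values and variable names (`x`, `diff`, `row`) chosen for illustration.

```python
import numpy as np

# Three training points in 2-D: shape (N, D) = (3, 2).
X_train = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
# One test point: shape (D,) = (2,).
x = np.array([3.0, 4.0])

# Broadcasting stretches x across all N rows, giving an (N, D) array.
diff = X_train - x

# Summing the squares along axis 1 collapses the feature dimension,
# leaving one distance per training point: shape (N,).
row = np.sqrt(np.sum(np.square(diff), axis=1))
print(diff.shape, row.shape)
```

Here `row` plays the role of `dists[i]` in the method above: one complete row of the distance matrix per loop iteration.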

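Using no loops

The fastest of the three approaches represents both the training set and the test set as matrices and replaces the remaining loops with matrix operations. It does require being comfortable with the rules of matrix arithmetic. The rest of this section works through how to compute the Euclidean distance between two matrices.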

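Let the test matrix P have size M×D and the training matrix C have size N×D (the test set has M points and the training set has N points, each a D-dimensional feature vector).
Let $P_i$ be the i-th row of P and $C_j$ the j-th row of C.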

First, compute the distance dist(i,j) between $P_i$ and $C_j$.
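One way to write this out is to expand the squared norm of the difference. The notation is an assumption chosen to match the definitions above: $P_{ik}$ and $C_{jk}$ are the k-th components of $P_i$ and $C_j$, and both are treated as row vectors.

```latex
\mathrm{dist}(i,j) = \sqrt{\lVert P_i - C_j \rVert^2}
= \sqrt{\sum_{k=1}^{D} (P_{ik} - C_{jk})^2}
= \sqrt{\lVert P_i \rVert^2 + \lVert C_j \rVert^2 - 2\, P_i C_j^{\mathsf T}}
```

The last form is the useful one: it separates a term that depends only on the test point, a term that depends only on the training point, and a dot product between the two.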

This extends to a formula for the i-th row of the distance matrix.
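A sketch of the row formula, under the assumptions that the square root is applied elementwise, that $\mathbf{1}_N$ denotes a column vector of N ones, and that $\lVert\cdot\rVert$ is the Euclidean norm of a row:

```latex
\mathrm{dist}(i,:) = \sqrt{\lVert P_i \rVert^2 \,\mathbf{1}_N^{\mathsf T}
+ \begin{bmatrix} \lVert C_1 \rVert^2 & \cdots & \lVert C_N \rVert^2 \end{bmatrix}
- 2\, P_i C^{\mathsf T}}
```

Each of the three terms is a 1×N row vector: the first repeats $\lVert P_i \rVert^2$ N times, the second holds the squared norm of every training point, and the third is the product of a 1×D row with a D×N matrix.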

Extending the formula once more gives the entire distance matrix.
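In matrix form this can be sketched as follows, where $p$ and $c$ are helper column vectors of squared row norms (names introduced here for illustration), $\mathbf{1}_M$ and $\mathbf{1}_N$ are column vectors of ones, and the square root is elementwise:

```latex
\mathrm{Dist} = \sqrt{\,p\,\mathbf{1}_N^{\mathsf T} + \mathbf{1}_M\, c^{\mathsf T} - 2\, P C^{\mathsf T}},
\quad p = \begin{bmatrix} \lVert P_1 \rVert^2 \\ \vdots \\ \lVert P_M \rVert^2 \end{bmatrix},
\quad c = \begin{bmatrix} \lVert C_1 \rVert^2 \\ \vdots \\ \lVert C_N \rVert^2 \end{bmatrix}
```

All three terms are M×N. In NumPy the two outer products with vectors of ones do not need to be formed explicitly, because broadcasting an (M, 1) column and an (N,) row against the (M, N) product has the same effect.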

In Python code:

def compute_distances_no_loops(self, X):
    """
    Compute the distance between each test point in X and each training point
    in self.X_train using no explicit loops.

    Input / Output: Same as compute_distances_two_loops
    """
    # ||P_i - C_j||^2 = ||P_i||^2 + ||C_j||^2 - 2 * P_i . C_j, evaluated for
    # all (i, j) pairs with one matrix multiplication and two broadcast sums.
    # Only basic array operations are used; no functions from scipy.
    test_sq = np.sum(np.square(X), axis=1, keepdims=True)  # shape (M, 1)
    train_sq = np.sum(np.square(self.X_train), axis=1)     # shape (N,)
    cross = np.dot(X, self.X_train.T)                      # shape (M, N)
    dists = np.sqrt(test_sq + train_sq - 2 * cross)
    return dists
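A quick way to gain confidence in the vectorized formula is to compare it against a brute-force computation on random data. The sketch below is not part of the original class: `pairwise_no_loops` and `pairwise_reference` are hypothetical standalone functions, and the `np.maximum(..., 0.0)` clamp is an added assumption that guards against tiny negative values produced by floating-point rounding, which would otherwise turn into NaN under the square root.

```python
import numpy as np

def pairwise_no_loops(P, C):
    """Hypothetical standalone version of the fully vectorized method."""
    p_sq = np.sum(np.square(P), axis=1, keepdims=True)  # shape (M, 1)
    c_sq = np.sum(np.square(C), axis=1)                 # shape (N,)
    sq = p_sq + c_sq - 2.0 * np.dot(P, C.T)             # shape (M, N)
    # Assumption: clamp rounding noise below zero before taking the root.
    return np.sqrt(np.maximum(sq, 0.0))

def pairwise_reference(P, C):
    """Brute-force reference: explicit difference for every pair of rows."""
    diff = P[:, np.newaxis, :] - C[np.newaxis, :, :]    # shape (M, N, D)
    return np.sqrt(np.sum(np.square(diff), axis=2))

rng = np.random.default_rng(0)
P = rng.standard_normal((5, 3))  # M = 5 test points, D = 3
C = rng.standard_normal((7, 3))  # N = 7 training points, D = 3

fast = pairwise_no_loops(P, C)
slow = pairwise_reference(P, C)
max_abs_diff = np.max(np.abs(fast - slow))
print(fast.shape, max_abs_diff)
```

The reference builds an (M, N, D) intermediate array, so it is only practical for small inputs; the vectorized version needs just the (M, N) result plus two vectors of squared norms.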
