具有L2损失的正则化，如何应用于所有权重，而不仅仅是最后一个？

0 1282

我正在学习ANN，它是Udacity DeepLearning课程的一部分。

我的一项作业涉及使用L2损失将泛化引入具有一个隐藏ReLU层的网络。我想知道如何正确地引入它，以便所有权重都受到惩罚，而不仅仅是输出层的权重。没有泛化的网络代码在下面。

引入L2的一种明显方法是将损失计算替换为以下内容（如果beta为0.01）：

loss = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(out_layer, tf_train_labels) + 0.01*tf.nn.l2_loss(out_weights))

但是在这种情况下，它将考虑输出层权重的值。我们如何适当地惩罚进入隐藏的ReLU层的权重，还是对输出层进行惩罚会以某种方式控制隐藏权重？

#some importingfrom __future__ import print_functionimport numpy as npimport tensorflow as tffrom six.moves import cPickle as picklefrom six.moves import range
#loading data
pickle_file = '/home/maxkhk/Documents/Udacity/DeepLearningCourse/SourceCode/tensorflow/examples/udacity/notMNIST.pickle'
with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

#prepare data to have right format for tensorflow#i.e. data is flat matrix, labels are onehot

image_size = 28
num_labels = 10
def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

#now is the interesting part - we are building a network with#one hidden ReLU layer and out usual output linear layer
#we are going to use SGD so here is our size of batch
batch_size = 128
#building tensorflow graph
graph = tf.Graph()with graph.as_default():
      # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  #now let's build our new hidden layer
  #that's how many hidden neurons we want
  num_hidden_neurons = 1024
  #its weights
  hidden_weights = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_hidden_neurons]))
  hidden_biases = tf.Variable(tf.zeros([num_hidden_neurons]))

  #now the layer itself. It multiplies data by weights, adds biases
  #and takes ReLU over result
  hidden_layer = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights) + hidden_biases)

  #time to go for output linear layer
  #out weights connect hidden neurons to output labels
  #biases are added to output labels  
  out_weights = tf.Variable(
    tf.truncated_normal([num_hidden_neurons, num_labels]))  

  out_biases = tf.Variable(tf.zeros([num_labels]))  

  #compute output  
  out_layer = tf.matmul(hidden_layer,out_weights) + out_biases
  #our real output is a softmax of prior result
  #and we also compute its cross-entropy to get our loss
  loss = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(out_layer, tf_train_labels))

  #now we just minimize this loss to actually train the network
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

  #nice, now let's calculate the predictions on each dataset for evaluating the
  #performance so far
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(out_layer)
  valid_relu = tf.nn.relu(  tf.matmul(tf_valid_dataset, hidden_weights) + hidden_biases)
  valid_prediction = tf.nn.softmax( tf.matmul(valid_relu, out_weights) + out_biases) 

  test_relu = tf.nn.relu( tf.matmul( tf_test_dataset, hidden_weights) + hidden_biases)
  test_prediction = tf.nn.softmax(tf.matmul(test_relu, out_weights) + out_biases)

2021-02-06 10:55 更新

karry • 4554

共 1 个回答

高赞时间

一种更短且可扩展的方法是：

vars   = tf.trainable_variables() 
lossL2 = tf.add_n([ tf.nn.l2_loss(v) for v in vars ]) * 0.001

这基本上是对所有可训练变量的l2_loss求和。你也可以制作一个词典，在其中仅指定要添加到成本中的变量，然后使用上面的第二行。然后，可以将lossL2与softmax交叉熵值相加，以计算总损耗。

上面的代码还将规范偏差。可以通过在第二行中添加if语句来避免这种情况；

lossL2 = tf.add_n([ tf.nn.l2_loss(v) for v in vars
                    if 'bias' not in v.name ]) * 0.001

这可以用来排除其他变量。

Via:https://stackoverflow.com/a/38466108/14964791

2021-02-06 11:22 更新

anna • 5052

·圈子

位酷友已加入

圈子：计算机

标签：

计算机算法人工智能

邀请回答

邀请

邀请

邀请

邀请

加入组织

微信扫码，每周推送最新资料

理工酷

首页

圈子

资源下载

邀请回答

推荐问题

推荐资源

加入组织

理工酷

首页

圈子

资源下载

站外资源

问答

网址导航

邀请回答 换一组

推荐问题

推荐资源

加入组织

邀请回答