Accelerators

Using GPUs

With the setting below enabled, TensorFlow logs which device each operation and tensor is assigned to as it runs.

tf.debugging.set_log_device_placement(True)
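A minimal sketch of the logging in action, assuming TensorFlow 2.x in eager mode. The tensors `a`, `b`, and `c` are illustrative names. The placement log goes to stderr, and each eager tensor also exposes its location through its `device` attribute.

```python
import tensorflow as tf

# Enable the log before running the ops you want to trace.
tf.debugging.set_log_device_placement(True)

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 0.0], [0.0, 1.0]])  # identity matrix
c = tf.matmul(a, b)  # the log names the device that executed MatMul

# An eager tensor records the device that holds it.
print(c.device)
```

On a machine without a GPU the reported device is the CPU; with a GPU present, TensorFlow normally prefers it.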

with tf.device(...): manually assigns operations to a specific device.
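As a small illustration of manual placement, the sketch below pins two constants to the CPU and leaves the matrix multiplication outside the block, so TensorFlow chooses its device automatically. The device string `'/CPU:0'` is used because it exists on every machine.

```python
import tensorflow as tf

# Tensors created inside the block are placed on the named device.
with tf.device('/CPU:0'):
  a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
  b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Outside the block, TensorFlow picks the device (a GPU if one is available).
c = tf.matmul(a, b)

print(a.device)
print(c.numpy())
```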

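tf.config.experimental.set_visible_devices: By default, TensorFlow sees every installed device and maps nearly all of the memory on each visible GPU into the process, which reduces memory fragmentation. Sometimes the process needs to be restricted to specific devices so that it does not hold memory on GPUs it never uses. (This seems most relevant when other jobs are running on the same machine and GPU memory has to be shared.)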
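A hedged sketch of restricting visibility to the first GPU. It uses `tf.config.set_visible_devices`, the non-experimental name available in recent TensorFlow 2.x releases; on older releases the `tf.config.experimental` variant from the note above should be substituted. The call has to happen before TensorFlow initializes the GPUs, otherwise it raises a `RuntimeError`.

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  try:
    # Expose only the first GPU to this process.
    tf.config.set_visible_devices(gpus[0], 'GPU')
  except RuntimeError as e:
    # Visible devices cannot be changed once the GPUs are initialized.
    print(e)

visible_gpus = tf.config.get_visible_devices('GPU')
print(len(gpus), "physical GPU(s),", len(visible_gpus), "visible GPU(s)")
```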

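tf.config.experimental.set_memory_growth: allocates only a small amount of GPU memory at first and lets the allocation grow as the process needs more.
tf.config.experimental.set_virtual_device_configuration: fixes the maximum memory usage up front by creating a virtual GPU with a hard memory limit.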
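The two memory policies could be applied roughly as follows. `configure_gpu_memory` is a hypothetical helper written for this note, not a TensorFlow function, and the choice to apply one policy to every GPU is an assumption. The two options are alternatives for a given GPU, and both must be set before the GPUs are initialized.

```python
import tensorflow as tf

def configure_gpu_memory(memory_limit_mb=None):
  """Hypothetical helper: applies one memory policy to every physical GPU.

  memory_limit_mb=None -> grow the allocation on demand.
  memory_limit_mb=N    -> cap each GPU at N megabytes via a virtual device.
  Returns the number of physical GPUs found.
  """
  gpus = tf.config.list_physical_devices('GPU')
  try:
    for gpu in gpus:
      if memory_limit_mb is None:
        tf.config.experimental.set_memory_growth(gpu, True)
      else:
        tf.config.experimental.set_virtual_device_configuration(
            gpu,
            [tf.config.experimental.VirtualDeviceConfiguration(
                memory_limit=memory_limit_mb)])
  except RuntimeError as e:
    # Raised when the GPUs have already been initialized.
    print(e)
  return len(gpus)

# Grow on demand; pass e.g. memory_limit_mb=1024 for a hard 1 GB cap instead.
num_configured = configure_gpu_memory()
print(num_configured, "GPU(s) configured")
```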

GPU usage code

import tensorflow as tf

tf.debugging.set_log_device_placement(True)

try:
  # Specify a GPU device that does not exist
  with tf.device('/device:GPU:2'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
except RuntimeError as e:
  print(e)

Distributed Learning

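Distributed training code

# Assumes model, optimizer, global_batch_size and mirrored_strategy are already
# defined, with model and optimizer created inside mirrored_strategy.scope().
@tf.function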
def train_step(dist_inputs):
  def step_fn(inputs):
    features, labels = inputs

    with tf.GradientTape() as tape:
      logits = model(features)
      cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
          logits=logits, labels=labels)
      loss = tf.reduce_sum(cross_entropy) * (1.0 / global_batch_size)

    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(list(zip(grads, model.trainable_variables)))
    return cross_entropy

  # Strategy.run is the current name of experimental_run_v2 (renamed in TF 2.2).
  per_example_losses = mirrored_strategy.run(
      step_fn, args=(dist_inputs,))
  mean_loss = mirrored_strategy.reduce(
      tf.distribute.ReduceOp.MEAN, per_example_losses, axis=0)
  return mean_loss
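The training step above depends on several names it does not define. The sketch below wires it up end to end under stated assumptions: a toy `Dense` model, an SGD optimizer, random data, and a global batch size of 8 are all placeholders chosen for illustration, and `Strategy.run` is used in place of `experimental_run_v2`. With no GPU present, `MirroredStrategy` falls back to a single CPU replica, so the sketch should still run.

```python
import tensorflow as tf

mirrored_strategy = tf.distribute.MirroredStrategy()
global_batch_size = 8  # assumed value; split evenly across replicas

# Variables must be created under the strategy scope so they are mirrored.
with mirrored_strategy.scope():
  model = tf.keras.Sequential([
      tf.keras.Input(shape=(4,)),
      tf.keras.layers.Dense(3),
  ])
  optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

# Random toy data: 32 examples, 4 features, 3 one-hot classes.
features = tf.random.normal([32, 4])
labels = tf.one_hot(
    tf.random.uniform([32], maxval=3, dtype=tf.int32), depth=3)
dataset = tf.data.Dataset.from_tensor_slices(
    (features, labels)).batch(global_batch_size)
dist_dataset = mirrored_strategy.experimental_distribute_dataset(dataset)

@tf.function
def train_step(dist_inputs):
  def step_fn(inputs):
    batch_features, batch_labels = inputs
    with tf.GradientTape() as tape:
      logits = model(batch_features)
      cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
          logits=logits, labels=batch_labels)
      # Scale by the global batch size so summed replica gradients
      # equal the gradient of the global mean loss.
      loss = tf.reduce_sum(cross_entropy) * (1.0 / global_batch_size)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(list(zip(grads, model.trainable_variables)))
    return cross_entropy

  per_example_losses = mirrored_strategy.run(step_fn, args=(dist_inputs,))
  return mirrored_strategy.reduce(
      tf.distribute.ReduceOp.MEAN, per_example_losses, axis=0)

# One pass over the distributed dataset: 32 / 8 = 4 steps.
losses = [float(train_step(batch).numpy()) for batch in dist_dataset]
print(losses)
```

Dividing by `global_batch_size` rather than the per-replica batch size matters because the gradients from all replicas are summed before being applied.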

Strategies
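As a rough illustration of choosing a strategy, the sketch below selects one from the number of GPUs TensorFlow can see. `pick_strategy` is a hypothetical helper and the selection rule is an assumption made for this example, not a recommendation from the TensorFlow documentation.

```python
import tensorflow as tf

def pick_strategy():
  """Hypothetical helper: choose a tf.distribute strategy by GPU count."""
  gpus = tf.config.list_logical_devices('GPU')
  if len(gpus) > 1:
    # Synchronous data parallelism across all GPUs on one machine.
    return tf.distribute.MirroredStrategy()
  if len(gpus) == 1:
    # Place everything on the single GPU.
    return tf.distribute.OneDeviceStrategy('/GPU:0')
  # No GPU: fall back to the default (no-op) strategy.
  return tf.distribute.get_strategy()

strategy = pick_strategy()
print(type(strategy).__name__, strategy.num_replicas_in_sync)
```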


Last updated 4 years ago