tfutils.base

Entrance of tfutils

Check tfutils.train for function train_from_params.

Check tfutils.test for function test_from_params.

tfutils.train

tfutils.train.train(sess, dbinterface, train_loop, train_targets, global_step, num_minibatches=1, num_steps=inf, thres_loss=100, validate_first=True, validation_targets=None)[source]

Actually runs the training evaluation loop.

Parameters:
  • sess (tensorflow.Session) – Object in which to run calculations.
  • dbinterface (DBInterface object) – Saver through which to save results.
  • train_loop (callable with args sess and train_targets) – Callable that specifies a custom training loop.
  • train_targets (dict of tensorflow nodes) – Targets to train. One item in this dict must be “optimizer” or similar for anything to happen.
  • num_minibatches (int) – How many minibatches to use before applying a gradient update.
  • num_steps (int) – How many steps to train before quitting.
  • validation_targets (dict of tensorflow objects, default: None) – Objects on which validation will be computed.
  • thres_loss (float, default: 100) – If loss exceeds this during training, HiLossError is thrown.
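Per the train_loop parameter, a custom loop is any callable taking sess and train_targets. A minimal sketch (the extra keyword arguments are hypothetical, not part of tfutils):

```python
# Minimal sketch of a custom train_loop; the (sess, train_targets) signature
# follows the parameter description above.
def my_train_loop(sess, train_targets, **loop_params):
    # Run every training target once; a real loop might add timing or logging.
    results = sess.run(train_targets)
    return results
```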
tfutils.train.train_from_params(save_params, model_params, train_params, loss_params=None, learning_rate_params=None, optimizer_params=None, validation_params=None, load_params=None, log_device_placement=False, dont_run=False, skip_check=False)[source]

Main training interface function.

Parameters:
  • save_params (dict) –

    Describing the parameters used to construct the save database, and control saving. These include:

    • host (str)
      Hostname where database connection lives
    • port (int)
      Port where database connection lives
    • dbname (str)
      Name of database for storage
    • collname (str)
      Name of collection for storage
    • exp_id (str)
      Experiment id descriptor
      NOTE: the variables host/port/dbname/collname/exp_id control the location of the saved data for the run, in order of increasing specificity. When choosing these, note that:
      • If a given host/port/dbname/collname/exp_id already has saved checkpoints, any new call to start training with these same location variables will resume training from the most recent saved checkpoint. If you mistakenly try to start training a new model with different variable names or structure from that existing checkpoint, an error will be raised, since the new model will be incompatible with the saved variables.
      • When choosing dbname, collname, and exp_id, keep in mind that mongodb queries only operate over a single collection. If you want to analyze results from several experiments together using mongodb queries, put them all in the same collection with different exp_ids. If you never expect to analyze data from two experiments together, you can put them in different collections or different databases; choosing between these depends on how you want to organize your results and is a matter of preference.
    • do_save (bool, default: True)
      Whether to save to database
    • save_initial_filters (bool, default: True)
      Whether to save initial model filters at step = 0.
    • save_metrics_freq (int, default: 5)
      How often to store train results to database
    • save_valid_freq (int, default: 3000)
      How often to calculate and store validation results to database
    • save_filters_freq (int, default: 30000)
      How often to save filter values to database
    • cache_filters_freq (int, default: 3000)
      How often to cache filter values locally and save to ___RECENT database
    • cache_max_num (int, default: 6)
      Maximal number of cached filters to keep in the ___RECENT database
    • cache_dir (str, default: None)
      Path where caches will be saved locally. If None, will default to ~/.tfutils/<host:port>/<dbname>/<collname>/<exp_id>.
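A sketch of a save_params dict using the fields above; every value here is a hypothetical placeholder:

```python
# Hypothetical save_params; host/port/dbname/collname/exp_id are placeholders.
save_params = {
    'host': 'localhost',        # where the mongodb connection lives
    'port': 27017,              # default mongodb port
    'dbname': 'experiments',
    'collname': 'cnn_runs',
    'exp_id': 'run_0',
    'do_save': True,
    'save_initial_filters': True,
    'save_metrics_freq': 5,     # store train results every 5 steps
    'save_valid_freq': 3000,
    'save_filters_freq': 30000,
    'cache_filters_freq': 3000,
    'cache_max_num': 6,
}
```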
  • model_params (dict) –

    Containing function that produces model and arguments to that function.

    • model_params[‘func’]
      The function producing the model.

      The function’s signature is:

      Args:

      • inputs: data object
      • train (boolean): if in training or testing
      • seed (int): seed for use in random generation

      Returns:

      • outputs (tf.Operations): train output tensorflow nodes
      • Additional configurations you want to store in database
    • Remaining items in model_params are dictionary of arguments passed to func.
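A sketch of a model function matching the documented signature; the body is a pure-Python stand-in (a real implementation would build tensorflow ops from inputs):

```python
# Placeholder model function with the documented (inputs, train, seed) signature.
# num_classes is a hypothetical kwarg routed through model_params.
def my_model(inputs, train=True, seed=0, num_classes=10):
    cfg = {'seed': seed, 'num_classes': num_classes, 'train': train}
    outputs = {'pred': inputs['images']}  # identity stand-in for the real network
    return outputs, cfg  # (train output nodes, extra config stored in the database)

model_params = {
    'func': my_model,
    'num_classes': 10,  # remaining items are passed to func as kwargs
}
```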
  • train_params (dict) –

    Containing params for data sources and targets in training.

    • train_params[‘data_params’]
      This contains params for the data
      • train_params['data_params']['func'] is the function that constructs the data:
        The function’s signature is:

        Args:

        • batch_size: Batch size for input data

        Returns:

        • inputs: A dictionary of tensors that will be sent to model function
      • train_params['data_params']['batch_size'] – the batch size of the data; will be sent to func
      • Remainder of train_params['data_params'] are kwargs passed to func
    • train_params[‘targets’] (optional)
      contains params for additional train targets
      • train_params['targets']['func'] is a function that produces tensorflow nodes as training targets:
        The function’s signature is:

        Args:

        • inputs: returned values of train_params['data_params']['func']
        • output: first returned value of model_params['func']

        Returns:

        A dictionary of tensors that will be computed and stored in the database

      • Remainder of train_params['targets'] are arguments to func.
    • train_params[‘validate_first’] (optional, bool, default: True):
      controls whether validation is run before training begins
    • train_params[‘thres_loss’] (optional, float, default: 100):
      If loss exceeds this during training, HiLossError is thrown
    • train_params[‘num_steps’] (int or None, default: None):
      How many total steps of the optimization are run. If None, training runs until the process is cancelled.
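A sketch of train_params following the structure above; the data function is a pure-Python stand-in that returns plain values rather than tensors:

```python
# Stand-in data provider with the documented signature (batch_size, plus kwargs).
def my_data(batch_size, shuffle=True):
    # A real provider would return a dict of tensors for the model function.
    return {'images': [0] * batch_size, 'labels': [0] * batch_size}

train_params = {
    'data_params': {
        'func': my_data,
        'batch_size': 32,
        'shuffle': True,   # remaining keys are kwargs to func
    },
    'validate_first': True,
    'thres_loss': 100,
    'num_steps': 100000,
}
```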
  • loss_params (dict) –

    Parameters for helper.get_loss_base function to build loss.

    • loss_params[‘pred_targets’] (a string or a list of strings):
      contain the names of inputs nodes that will be sent into the loss function
    • loss_params[‘loss_func’]:
      the function used to calculate the loss. Must be provided.
    • loss_params[‘loss_func_kwargs’] (dict):
      Keyword parameters sent to loss_params['loss_func']. Default is {}.
    • loss_params[‘agg_func’]:
      The aggregate function, default is None.
    • loss_params[‘agg_func_kwargs’]:
      Keyword parameters sent to loss_params['agg_func']. Default is {}.
    • loss_params[‘loss_per_case_func’] (Deprecated):
      Deprecated parameter, the same as loss_params['loss_func'].
    • loss_params[‘targets’] (Deprecated):
      Deprecated parameter, the same as loss_params['pred_targets'].
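A sketch of loss_params; the loss function here is a pure-Python stand-in for something like a tensorflow cross-entropy node, and the scale kwarg is hypothetical:

```python
# Stand-in loss function; a real one would return a tensorflow loss node.
def my_loss(logits, labels, scale=1.0):
    # Mean squared error as a simple placeholder loss.
    return scale * sum((l - y) ** 2 for l, y in zip(logits, labels)) / len(labels)

loss_params = {
    'pred_targets': ['labels'],       # names of input nodes fed to the loss function
    'loss_func': my_loss,
    'loss_func_kwargs': {'scale': 1.0},
    'agg_func': None,                 # optional aggregation over the per-case loss
}
```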
  • learning_rate_params (dict) –

    Parameters for specifying learning_rate.

    • learning_rate_params[‘func’]:
      The function producing a tensorflow node acting as the learning rate. This function must accept argument global_step.
    • remainder of learning_rate_params are arguments to func.
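A sketch of learning_rate_params; the func here is a pure-Python stand-in with the same global_step-accepting shape as, e.g., tf.train.exponential_decay:

```python
# Stand-in learning-rate function; must accept `global_step` per the doc above.
def my_learning_rate(global_step, learning_rate=0.01, decay_rate=0.95,
                     decay_steps=1000):
    return learning_rate * decay_rate ** (global_step / decay_steps)

learning_rate_params = {
    'func': my_learning_rate,
    'learning_rate': 0.01,   # remaining keys are kwargs to func
    'decay_rate': 0.95,
    'decay_steps': 1000,
}
```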
  • optimizer_params (dict) –

    Parameters for creating optimizer.

    • optimizer_params[‘optimizer’]:
      A class producing an optimizer object, which should have compute_gradients and apply_gradients methods. The signatures of these two methods are similar to those of the basic tensorflow optimizer classes.

      Must accept:

      • “learning_rate” – the result of the learning_rate_params[‘func’] call
      • Remainder of optimizer_params (aside from “optimizer”) are arguments to the optimizer func
    • optimizer_params[‘func’] (Deprecated):
      Deprecated parameter, the same as optimizer_params['optimizer'].
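A sketch of optimizer_params using the ClipOptimizer documented below; tf.train.MomentumOptimizer and its momentum kwarg are from the TF 1.x API and shown as one plausible choice, not the required one:

```python
import tensorflow as tf  # TF 1.x API assumed
from tfutils.optimizer import ClipOptimizer

optimizer_params = {
    'optimizer': ClipOptimizer,                     # class with compute/apply_gradients
    'optimizer_class': tf.train.MomentumOptimizer,  # wrapped optimizer class
    'clip': True,                                   # clip gradients to [-1, 1]
    'momentum': 0.9,  # remaining keys go to the wrapped optimizer
}
```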
  • validation_params (dict) –

    Dictionary of validation sources. The structure of this dictionary is:

    {
        <validation_target_name_1>: {
            ‘data’: {
                ‘func’: (callable) data source function for this validation,
                <kwarg1>: <value1> for ‘func’,
                ...
            },
            ‘targets’: {
                ‘func’: (callable) returning targets,
                <kwarg1>: <value1> for ‘func’,
                ...
            },
            ‘num_steps’ (int):
                number of batches of validation source to compute,
            ‘agg_func’ (optional, callable):
                how to aggregate validation results across batches after computation. Signature is:
                • one input argument: the list of validation batch results
                • one output: aggregated version
                Default is utils.identity_func
            ‘online_agg_func’ (optional, callable):
                how to aggregate validation results on a per-batch basis. Signature is:
                • three input arguments: (current aggregate, new result, step)
                • one output: new aggregated result
                On the first step, the current aggregate passed in is None. The final result is passed to ‘agg_func’. Default is utils.append_and_return
        },
        <validation_target_name_2>: ...
    }

    For each validation_target_name key, the targets are computed and added to the output dictionary every so often – unlike train_targets, which are computed at each time step, these are computed on a schedule controlled by the save_valid_freq specified in save_params.
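The agg_func/online_agg_func signatures can be illustrated with a pure-Python running-mean aggregation; the metric names and the 'count' bookkeeping key are hypothetical:

```python
# Per-batch aggregation: sum each metric; `agg` is None on the first step.
def online_agg_sum(agg, result, step):
    if agg is None:
        return {'count': 1, **result}
    out = {k: agg[k] + v for k, v in result.items()}
    out['count'] = agg['count'] + 1
    return out

# Final aggregation: one input (the final aggregate), one output.
def agg_mean(agg):
    n = agg.pop('count')
    return {k: v / n for k, v in agg.items()}
```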

  • load_params (dict) –

    Similar to save_params, if you want loading to happen from a different location than where saving occurs. Parameters include:

    • host (str)
      Hostname where database connection lives
    • port (int)
      Port where database connection lives
    • dbname (str)
      Name of database for storage
    • collname (str)
      Name of collection for storage
    • exp_id (str)
      Experiment id descriptor
    • do_restore (bool, default: True)
      Whether to restore from saved model
    • query (dict)
      mongodb query describing how to load from loading database
    • from_ckpt (string)
      Path to load from a TensorFlow checkpoint (instead of from the db)
    • to_restore (list of strings or a regex/callable which returns strings)
      Specifies which variables should be loaded from the checkpoint. Any variables not specified here will be reinitialized.
    • load_param_dict (dict)
      A dictionary whose keys are the names of the variables that are to be loaded from the checkpoint, and the values are the names of the variables of the model that you want to restore with the value of the corresponding checkpoint variable.
  • log_device_placement (bool, default is False) – Advanced parameter. Whether to log device placement in the tensorflow session.
  • dont_run (bool, default is False) – Advanced parameter. Whether to return everything without actually running training.
  • skip_check (bool, default is False) – Advanced parameter. Whether to skip the github check; can be useful when working in a detached HEAD.

tfutils.test

tfutils.test.test(sess, dbinterface, validation_targets, save_intermediate_freq=None)[source]

Actually runs the testing evaluation loop.

Parameters:
  • sess (tensorflow.Session) – Object in which to run calculations
  • dbinterface (DBInterface object) – Saver through which to save results
  • validation_targets (dict of tensorflow objects) – Objects on which validation will be computed.
  • save_intermediate_freq (None or int) – How frequently to save intermediate results captured during testing. None means no intermediate results will be saved.
Returns:

Validation summary results.

Return type:

dict

tfutils.test.test_from_params(load_params, model_params, validation_params, log_device_placement=False, save_params=None, dont_run=False, skip_check=False)[source]

Main testing interface function.

Same as train_from_params, but performs testing without training.

For documentation, see argument descriptions in train_from_params.

tfutils.model

Entrance of model building tools.

IMPORTANT thing to know: tfutils does NOT require the use of these tools at all! We provide them here only for tutorials, function tests, and for users who previously used these tools.

tfutils.optimizer

Default Optimizer to be used with tfutils.

The ClipOptimizer class adds support for gradient clipping, gradient aggregation across devices and gradient accumulation useful for performing minibatching (accumulating and aggregating gradients for multiple batches before applying a gradient update).

class tfutils.optimizer.ClipOptimizer(optimizer_class, clip=True, trainable_names=None, *optimizer_args, **optimizer_kwargs)[source]

Bases: object

A wrapper for general optimizers.

This class supports:

  • Clipping the gradients (controlled by the clip parameter)
  • Training only part of the trainable parameters (controlled by trainable_names)
Parameters:
  • optimizer_class – Class of the optimizer to wrap; its instances should have compute_gradients and apply_gradients methods.
  • clip (bool, optional) – Default is True; gradients are clipped to [-1, 1].
  • trainable_names (list of strings, or string, optional) – Default is None. Scope names of the variables to be trained; variables outside these scopes are not trained.
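The clipping that clip=True enables can be illustrated in pure Python; ClipOptimizer itself operates on tensorflow gradient tensors, and the [-1, 1] range comes from the parameter description above:

```python
def clip_grads_and_vars(grads_and_vars, lo=-1.0, hi=1.0):
    # Clip each gradient into [lo, hi], leaving the paired variable untouched.
    return [(max(lo, min(hi, g)), v) for g, v in grads_and_vars]
```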
apply_gradients(grads_and_vars, global_step=None)[source]

Apply gradients to model variables specified in grads_and_vars.

apply_gradients returns an op that calls tf.train.Optimizer.apply_gradients

Parameters:
  • grads_and_vars (list) – List of (gradient, variable) pairs.
  • global_step (None, optional) – tensorflow global_step variable.
Returns:

An op that applies the gradient update to the model, followed by an internal gradient zeroing operation on self.grads_and_vars.

Return type:

(tf.Operation)

compute_gradients(loss, *args, **kwargs)[source]

Compute gradients to model variables from loss.

Parameters:
  • loss (tf.Tensor) – Tensorflow loss to optimize.
Returns:

Gradient update ops for the model variables, followed by a clipping operation if self.clip is True.

Return type:

(tf.Operation)
class tfutils.optimizer.MinibatchOptimizer(optimizer, *optimizer_args, **optimizer_kwargs)[source]

Bases: object

A wrapper used by tfutils for general optimizers.

This class supports:

  • Minibatching: gradients are applied only after accumulating over several steps. By default, gradients are applied after every step.
accumulate_gradients(minibatch_grads, num_minibatches=1)[source]

Accumulate gradients for num_minibatches minibatches.
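A pure-Python sketch of what gradient accumulation amounts to; the real method operates on tensorflow grads-and-vars pairs:

```python
def accumulate_gradients(acc, minibatch_grads, num_minibatches=1):
    # Scale each minibatch gradient by 1/num_minibatches and add to the accumulator,
    # so the final sum is the average gradient over the minibatches.
    scaled = [g / num_minibatches for g in minibatch_grads]
    if acc is None:
        return scaled
    return [a + s for a, s in zip(acc, scaled)]
```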

classmethod aggregate_gradients(grads_and_vars, method='average')[source]
apply_gradients(grads_and_vars, global_step=None)[source]

Apply gradients to model variables specified in grads_and_vars.

apply_gradients returns an op that calls tf.train.Optimizer.apply_gradients.

Parameters:
  • grads_and_vars (list) – List of (gradient, variable) pairs.
  • global_step (None, optional) – tensorflow global_step variable.
Returns:

An op that applies the gradient update to the model, followed by an internal gradient zeroing operation on self.grads_and_vars.

Return type:

(tf.Operation)

classmethod average_gradients(tower_grads)[source]

Average a list of (grads, vars) produced by compute_gradients.
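The tower averaging it performs can be sketched in pure Python; the real input is one tensorflow (grad, var) list per device, with variables matching across towers:

```python
def average_tower_grads(tower_grads):
    # tower_grads: one [(grad, var), ...] list per device; vars align across towers.
    averaged = []
    for pairs in zip(*tower_grads):          # iterate over matching variables
        grads = [g for g, _ in pairs]
        var = pairs[0][1]                    # same variable on every tower
        averaged.append((sum(grads) / len(grads), var))
    return averaged
```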

compute_gradients(loss, *args, **kwargs)[source]