When I add hidden layers or change the number of neurons in the last layer in
https://github.com/01org/daal/blob/daal_2018_beta_update1/examples/java/com/intel/daal/examples/neural_networks/NeuralNetConfiguratorDistr.java
I get NaN values.
I am running this Java file:
https://github.com/01org/daal/blob/daal_2018_beta_update1/examples/java/com/intel/daal/examples/neural_networks/NeuralNetDenseDistr.java
I am testing the neural network on MNIST data, which contains 10 labels, so I am changing the number of neurons to 10 on line 60 of NeuralNetConfiguratorDistr.java.
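For reference, the change described amounts to creating the last fully-connected layer with 10 output neurons, roughly as follows (names follow the example's conventions; a sketch, not the exact contents of line 60):

    /* Last fully-connected layer sized to the number of MNIST labels (10) */
    FullyConnectedBatch fullyconnectedLayer3 =
        new FullyConnectedBatch(context, Float.class, FullyConnectedMethod.defaultDense, 10);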
Hello Mayank J,
I assume this is the same issue that was recently reported on the Intel® DAAL GitHub (https://github.com/01org/daal/issues/18).
We are analyzing the issue on our side. Please give us several days and we will get back to you with the results.
Best regards,
Victoriya
Per our analysis, there is a bug in the distributed training of neural networks in the current version of Intel DAAL: weights and biases are not properly initialized at the beginning of the computations.
We plan to fix it in a future version of the library.
As a workaround, add the following piece of code to the example NeuralNetDenseDistr.java at line 156:

    if (i == 0) {
        /* Retrieve training model of the neural network on master node */
        TrainingModel trainingModelOnMaster = net.getResult().get(TrainingResultId.model);

        /* Retrieve training model of the neural network on local node */
        TrainingModel trainingModelOnLocal = netLocal[0].input.get(DistributedStep1LocalInputId.inputModel);

        /* Set weights and biases on master node using the weights and biases from local node */
        trainingModelOnMaster.setWeightsAndBiases(trainingModelOnLocal.getWeightsAndBiases());

        /* Set initialization flag parameter as true in all forward layers of the training model on master node */
        ForwardLayers forwardLayers = trainingModelOnMaster.getForwardLayers();
        for (int j = 0; j < forwardLayers.size(); j++) {
            forwardLayers.get(j).getLayerParameter().setWeightsAndBiasesInitializationFlag(true);
        }
    }
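For context: judging by the "if (i == 0)" guard, the snippet is presumably meant to sit inside the example's loop over training iterations (with i as the iteration index), so the master's weights and biases are synchronized with a local node's model exactly once, before the first distributed update.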
Thanks, it is working now.
@Victoriya Have you tried increasing the number of hidden layers?
I am getting NaN values when I add a hidden layer.
I am attaching the code of my NeuralNetConfiguratorDistr.java file:
    /* file: NeuralNetConfiguratorDistr.java */
    /*******************************************************************************
    * Copyright 2014-2017 Intel Corporation
    *
    * Licensed under the Apache License, Version 2.0 (the "License");
    * you may not use this file except in compliance with the License.
    * You may obtain a copy of the License at
    *
    *     http://www.apache.org/licenses/LICENSE-2.0
    *
    * Unless required by applicable law or agreed to in writing, software
    * distributed under the License is distributed on an "AS IS" BASIS,
    * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    * See the License for the specific language governing permissions and
    * limitations under the License.
    *******************************************************************************/

    /*
    // Content:
    //     Java example of neural network configurator
    ////////////////////////////////////////////////////////////////////////////////
    */

    package com.intel.daal.examples.neural_networks;

    import com.intel.daal.algorithms.neural_networks.*;
    import com.intel.daal.algorithms.neural_networks.initializers.uniform.*;
    import com.intel.daal.algorithms.neural_networks.training.TrainingTopology;
    import com.intel.daal.algorithms.neural_networks.layers.fullyconnected.*;
    import com.intel.daal.algorithms.neural_networks.layers.softmax_cross.*;
    import com.intel.daal.algorithms.neural_networks.layers.LayerDescriptor;
    import com.intel.daal.algorithms.neural_networks.layers.NextLayers;
    import com.intel.daal.algorithms.neural_networks.layers.ForwardLayer;
    import com.intel.daal.algorithms.neural_networks.layers.BackwardLayer;
    import com.intel.daal.examples.utils.Service;
    import com.intel.daal.services.DaalContext;

    /**
     * <a name="DAAL-EXAMPLE-JAVA-NEURALNETWORKCONFIGURATORDISTR">
     * @example NeuralNetConfiguratorDistr.java
     */
    class NeuralNetConfiguratorDistr {
        public static TrainingTopology configureNet(DaalContext context) {
            /* Create layers of the neural network */
            /* Create fully-connected layer and initialize layer parameters */
            FullyConnectedBatch fullyconnectedLayer1 = new FullyConnectedBatch(context, Float.class, FullyConnectedMethod.defaultDense, 20);
            fullyconnectedLayer1.parameter.setWeightsInitializer(new UniformBatch(context, Float.class, UniformMethod.defaultDense, -0.001, 0.001));
            fullyconnectedLayer1.parameter.setBiasesInitializer(new UniformBatch(context, Float.class, UniformMethod.defaultDense, 0, 0.5));

            /* Create fully-connected layer and initialize layer parameters */
            FullyConnectedBatch fullyconnectedLayer2 = new FullyConnectedBatch(context, Float.class, FullyConnectedMethod.defaultDense, 40);
            fullyconnectedLayer2.parameter.setWeightsInitializer(new UniformBatch(context, Float.class, UniformMethod.defaultDense, 0.5, 1));
            fullyconnectedLayer2.parameter.setBiasesInitializer(new UniformBatch(context, Float.class, UniformMethod.defaultDense, 0.5, 1));

            FullyConnectedBatch fullyconnectedLayerTest = new FullyConnectedBatch(context, Float.class, FullyConnectedMethod.defaultDense, 40);
            fullyconnectedLayerTest.parameter.setWeightsInitializer(new UniformBatch(context, Float.class, UniformMethod.defaultDense, 0.5, 1));
            fullyconnectedLayerTest.parameter.setBiasesInitializer(new UniformBatch(context, Float.class, UniformMethod.defaultDense, 0.5, 1));

            /* Create fully-connected layer and initialize layer parameters */
            FullyConnectedBatch fullyconnectedLayer3 = new FullyConnectedBatch(context, Float.class, FullyConnectedMethod.defaultDense, 2);
            fullyconnectedLayer3.parameter.setWeightsInitializer(new UniformBatch(context, Float.class, UniformMethod.defaultDense, -0.005, 0.005));
            fullyconnectedLayer3.parameter.setBiasesInitializer(new UniformBatch(context, Float.class, UniformMethod.defaultDense, 0, 1));

            /* Create softmax cross-entropy loss layer and initialize layer parameters */
            SoftmaxCrossBatch softmaxCrossEntropyLayer = new SoftmaxCrossBatch(context, Float.class, SoftmaxCrossMethod.defaultDense);

            /* Create topology of the neural network */
            TrainingTopology topology = new TrainingTopology(context);

            /* Add layers to the topology of the neural network */
            long fc1 = topology.add(fullyconnectedLayer1);
            long fc2 = topology.add(fullyconnectedLayer2);
            long fcTest = topology.add(fullyconnectedLayerTest);
            long fc3 = topology.add(fullyconnectedLayer3);
            long sm = topology.add(softmaxCrossEntropyLayer);
            topology.addNext(fc1, fc2);
            topology.addNext(fc2, fcTest);
            topology.addNext(fcTest, fc3);
            topology.addNext(fc3, sm);
            return topology;
        }
    }
I have not yet tried to increase the number of hidden layers in the example.
Please give me a couple of days to run the analysis with your code; after that I will provide you with the results.
The investigation showed that the SGD algorithm used at the training stage starts to diverge in the example after one more hidden layer is added. You can check this yourself by printing the numeric table of weights and biases (wb) on each iteration: the weights and biases grow very fast and eventually become NaNs. This unbounded growth of the weights and biases shows that the SGD algorithm diverges.
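A minimal debugging sketch of that check, assuming it is placed inside the training loop of NeuralNetDenseDistr.java (with i as the iteration index, and Service being the printing helper from com.intel.daal.examples.utils used throughout the DAAL examples; NumericTable is from com.intel.daal.data_management.data):

    /* Print the model's weights and biases on each iteration to watch for divergence */
    TrainingModel trainingModel = net.getResult().get(TrainingResultId.model);
    NumericTable wb = trainingModel.getWeightsAndBiases();
    Service.printNumericTable("Weights and biases at iteration " + i + ":", wb);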
There are several options available to make the optimization solver converge:
- Make the learning rate of the SGD smaller. A learning rate of 0.00001 should work fine (see the sketch after this list).
- Use another optimization solver. Try AdaGrad, or the other SGD methods: mini-batch and momentum.
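A sketch of the first option, assuming the example wires its solver through net.parameter.setOptimizationSolver as the DAAL neural network examples do (the exact configuration lines in NeuralNetDenseDistr.java may differ; HomogenNumericTable is from com.intel.daal.data_management.data):

    /* Create the SGD solver (default dense method) */
    com.intel.daal.algorithms.optimization_solver.sgd.Batch sgdAlgorithm =
        new com.intel.daal.algorithms.optimization_solver.sgd.Batch(
            context, Double.class,
            com.intel.daal.algorithms.optimization_solver.sgd.Method.defaultDense);

    /* Use a constant learning-rate sequence of 0.00001 instead of the default */
    double[] learningRate = { 0.00001 };
    sgdAlgorithm.parameter.setLearningRateSequence(
        new HomogenNumericTable(context, learningRate, 1, 1));

    /* Attach the solver to the neural network training algorithm */
    net.parameter.setOptimizationSolver(sgdAlgorithm);

Switching solvers should be analogous: an AdaGrad solver from com.intel.daal.algorithms.optimization_solver.adagrad can be constructed and attached the same way, and the mini-batch and momentum SGD variants are selected via the Method argument of the SGD constructor.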
Best regards,
Victoriya
@Victoriya
I was already checking the numeric table on each iteration, but I was not sure that the problem was caused by the optimization solver.
Thank you very much. It is working now.
FYI: the fix for this problem is available in DAAL 2018 (released in mid-September 2017).