AI Tools from Intel

Problem with saving and loading a BigDL Module

clare_cn
Beginner

I am currently building a model with BigDL's Module API. However, I have run into a problem that seems to be caused by BatchNormalization.

I used the following code to construct the model:

def createMixModel(userInputDim: Int, itemInputDim: Int, shareDims: Array[Int], dcnCrossLayers: Int, hiddenDims: Array[Int]): Module[Float] = {
  val l2Regularizer: Regularizer[Float] = L2Regularizer(0.05)
  val historyInput = Input[Float](inputShape = Shape(3, 4))
  val userInput = Input[Float](inputShape = Shape(userInputDim))
  val itemInput = Input[Float](inputShape = Shape(itemInputDim))

  // DIN: attention-weighted pooling of the user's behavior history
  val expandedUserHistory = TimeDistributed[Float](Dense[Float](itemInputDim).asInstanceOf[KerasLayer[Activity, Tensor[Float], Float]]).inputs(historyInput)
  val expandedItem = RepeatVector(3).inputs(itemInput)
  val attentionScores = TimeDistributed[Float](Merge[Float](mode = "dot").asInstanceOf[KerasLayer[Activity, Tensor[Float], Float]]).inputs(expandedUserHistory, expandedItem)
  val expandedAttentionScores = Reshape(Array(3, itemInputDim)).inputs(TimeDistributed[Float](RepeatVector[Float](itemInputDim).asInstanceOf[KerasLayer[Activity, Tensor[Float], Float]]).inputs(attentionScores))
  val weightedUserHistory = Merge[Float](mode = "mul").inputs(expandedUserHistory, expandedAttentionScores)
  val dinOutput = GlobalAveragePooling1D[Float]().inputs(weightedUserHistory)

  // DCN: shared layers feeding a deep tower and a cross network
  var userItemInput = BatchNormalization[Float]().inputs(Merge[Float](mode = "concat").inputs(itemInput, userInput))
  for (dim <- shareDims) {
    userItemInput = Activation[Float]("relu").inputs(Dense[Float](dim).inputs(userItemInput))
  }

  var deepLayer = userItemInput
  for (dim <- hiddenDims) {
    deepLayer = Activation[Float]("relu").inputs(Dense[Float](dim).inputs(deepLayer))
  }

  var crossInput = userItemInput
  val x0 = userItemInput
  for (_ <- 1 to dcnCrossLayers) {
    val dotProduct = Merge[Float](mode = "mul").inputs(crossInput, x0)
    val linear = Dense[Float](shareDims.last, bias = false).inputs(dotProduct)
    val added = Merge[Float](mode = "sum").inputs(linear, crossInput)
    crossInput = added
  }

  val dcnOutput = Merge[Float](mode = "concat").inputs(deepLayer, crossInput)

  // ESMM: shared layers feeding three task towers (ctr / cvr / rti)
  var crInput = BatchNormalization[Float]().inputs(Merge[Float](mode = "concat").inputs(dinOutput, dcnOutput))
  for (dim <- shareDims) {
    crInput = Activation[Float]("relu").inputs(Dense[Float](outputDim = dim).inputs(crInput))
  }

  var ctrLayer = crInput
  for (dim <- hiddenDims) {
    ctrLayer = Activation[Float]("relu").inputs(Dense[Float](outputDim = dim, wRegularizer = l2Regularizer).inputs(ctrLayer))
  }
  val ctrOutput = Activation[Float]("sigmoid").inputs(Dense[Float](1).inputs(ctrLayer))

  var cvrLayer = crInput
  for (dim <- hiddenDims) {
    cvrLayer = Activation[Float]("relu").inputs(Dense[Float](outputDim = dim, wRegularizer = l2Regularizer).inputs(cvrLayer))
  }
  val cvrOutput = Activation[Float]("sigmoid").inputs(Dense[Float](1).inputs(cvrLayer))

  var rtiLayer = crInput
  for (dim <- hiddenDims) {
    rtiLayer = Activation[Float]("relu").inputs(Dense[Float](outputDim = dim, wRegularizer = l2Regularizer).inputs(rtiLayer))
  }
  val rtiOutput = Activation[Float]("sigmoid").inputs(Dense[Float](1).inputs(rtiLayer))

  val model = Model[Float](input = Array(historyInput, userInput, itemInput), output = Array(ctrOutput, cvrOutput, rtiOutput))
  model
}
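For context, the model and criterion were wired up roughly as follows before training; the dimensions, variable names, and criterion choice below are only an illustrative sketch, not my exact setup:

// Hypothetical feature dimensions; the real ones come from the feature pipeline.
val model = createMixModel(
  userInputDim = 64,
  itemInputDim = 32,
  shareDims = Array(128, 64),
  dcnCrossLayers = 3,
  hiddenDims = Array(64, 32))

// The model has three sigmoid outputs (ctr / cvr / rti), so one
// BCECriterion per output combined via ParallelCriterion is a natural fit.
val criterion = ParallelCriterion[Float]()
criterion.add(BCECriterion[Float](), 1.0)
criterion.add(BCECriterion[Float](), 1.0)
criterion.add(BCECriterion[Float](), 1.0)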


Then train the model:

val optimizer = Optimizer(
  model = model,
  dataset = trainData,
  criterion = criterion
).setOptimMethod(new Adam[Float]())
  .setEndWhen(Trigger.or(Trigger.maxEpoch(maxEpoch), Trigger.minLoss(minLoss)))
val trainedModel = optimizer.optimize()

If I use trainedModel directly for prediction, the results look normal.
But if I save the model and reload it, every prediction comes back as 1.0:

trainedModel.saveModel(path = s"hdfs://xxx", weightPath = null, overWrite = true)
val model = Module.loadModule[Float](s"hdfs://xxx")
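
To make the symptom concrete, here is a minimal sketch of the comparison (the testData RDD of Samples is assumed and not shown):

// Prediction with the in-memory trained model looks normal.
val directPredictions = trainedModel.predict(testData)
directPredictions.take(5).foreach(println)

// Prediction with the reloaded model collapses to 1.0 everywhere.
val reloadedPredictions = model.predict(testData)
reloadedPredictions.take(5).foreach(println)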


I suspect BatchNormalization is the cause: if I remove it, saving and loading work normally, but the prediction quality is noticeably worse.

May I ask what specific problem this is? Thank you very much!
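
Is there a recommended way to inspect the internal state of BatchNormalization after loading? For example, something like this sketch (assuming getExtraParameter exposes the BN runningMean/runningVar tensors of the container):

// Print the value range of every extra parameter; an Infinity here would
// point at a corrupted BatchNormalization running statistic.
model.getExtraParameter().zipWithIndex.foreach { case (tensor, i) =>
  println(s"extra parameter $i: min = ${tensor.min()}, max = ${tensor.max()}")
}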

clare_cn
Beginner

The problem has been resolved. dinOutput suffered a gradient explosion, which drove BatchNormalization's runningVar to Infinity.
Adding a BatchNormalization layer on the original dinOutput solves the problem.
What is still puzzling is why this problem does not surface when predicting with the trained model directly; that is what led me to believe it was a model saving and loading issue.
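
Concretely, the fix is a one-line change inside createMixModel: normalize dinOutput before it enters the shared ESMM layers (a sketch of the change):

// Normalize the DIN output so exploding activations cannot drive the
// downstream BatchNormalization's runningVar to Infinity.
val normalizedDinOutput = BatchNormalization[Float]().inputs(dinOutput)
var crInput = BatchNormalization[Float]().inputs(
  Merge[Float](mode = "concat").inputs(normalizedDinOutput, dcnOutput))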

Ying_H_Intel
Employee

Hi,

Thank you for reporting the issue here and for tracking down the root cause. BigDL is developed as open source, so please feel free to submit the issue at Issues · intel-analytics/ipex-llm; the BigDL developer team will follow up there.

Thanks

