I am currently using the BigDL Module API to build a model. However, I ran into a problem that may be caused by BatchNormalization.
I used the following code to construct the model:
def createMixModel(userInputDim: Int, itemInputDim: Int, shareDims: Array[Int],
                   dcnCrossLayers: Int, hiddenDims: Array[Int]): Module[Float] = {
  val l2Regularizer: Regularizer[Float] = L2Regularizer(0.05)

  val historyInput = Input[Float](inputShape = Shape(3, 4))
  val userInput = Input[Float](inputShape = Shape(userInputDim))
  val itemInput = Input[Float](inputShape = Shape(itemInputDim))

  // DIN: attention-weight the user history against the candidate item, then pool
  val expandedUserHistory = TimeDistributed[Float](
    Dense[Float](itemInputDim).asInstanceOf[KerasLayer[Activity, Tensor[Float], Float]]
  ).inputs(historyInput)
  val expandedItem = RepeatVector(3).inputs(itemInput)
  val attentionScores = TimeDistributed[Float](
    Merge[Float](mode = "dot").asInstanceOf[KerasLayer[Activity, Tensor[Float], Float]]
  ).inputs(expandedUserHistory, expandedItem)
  val expandedAttentionScores = Reshape(Array(3, itemInputDim)).inputs(
    TimeDistributed[Float](
      RepeatVector[Float](itemInputDim).asInstanceOf[KerasLayer[Activity, Tensor[Float], Float]]
    ).inputs(attentionScores))
  val weightedUserHistory = Merge[Float](mode = "mul")
    .inputs(expandedUserHistory, expandedAttentionScores)
  val dinOutput = GlobalAveragePooling1D[Float]().inputs(weightedUserHistory)

  // DCN: shared tower plus deep and cross branches over the user/item features
  var userItemInput = BatchNormalization[Float]().inputs(
    Merge[Float](mode = "concat").inputs(itemInput, userInput))
  for (dim <- shareDims) {
    userItemInput = Activation[Float]("relu").inputs(Dense[Float](dim).inputs(userItemInput))
  }
  var deepLayer = userItemInput
  for (dim <- hiddenDims) {
    deepLayer = Activation[Float]("relu").inputs(Dense[Float](dim).inputs(deepLayer))
  }
  // Cross branch: crossInput := W(crossInput * x0) + crossInput per layer
  var crossInput = userItemInput
  val x0 = userItemInput
  for (_ <- 1 to dcnCrossLayers) {
    val dotProduct = Merge[Float](mode = "mul").inputs(crossInput, x0)
    val linear = Dense[Float](shareDims.last, bias = false).inputs(dotProduct)
    val added = Merge[Float](mode = "sum").inputs(linear, crossInput)
    crossInput = added
  }
  val dcnOutput = Merge[Float](mode = "concat").inputs(deepLayer, crossInput)

  // ESMM: shared bottom feeding three sigmoid heads (CTR, CVR, RTI)
  var crInput = BatchNormalization[Float]().inputs(
    Merge[Float](mode = "concat").inputs(dinOutput, dcnOutput))
  for (dim <- shareDims) {
    crInput = Activation[Float]("relu").inputs(Dense[Float](outputDim = dim).inputs(crInput))
  }
  var ctrLayer = crInput
  for (dim <- hiddenDims) {
    ctrLayer = Activation[Float]("relu").inputs(
      Dense[Float](outputDim = dim, wRegularizer = l2Regularizer).inputs(ctrLayer))
  }
  val ctrOutput = Activation[Float]("sigmoid").inputs(Dense[Float](1).inputs(ctrLayer))
  var cvrLayer = crInput
  for (dim <- hiddenDims) {
    cvrLayer = Activation[Float]("relu").inputs(
      Dense[Float](outputDim = dim, wRegularizer = l2Regularizer).inputs(cvrLayer))
  }
  val cvrOutput = Activation[Float]("sigmoid").inputs(Dense[Float](1).inputs(cvrLayer))
  var rtiLayer = crInput
  for (dim <- hiddenDims) {
    rtiLayer = Activation[Float]("relu").inputs(
      Dense[Float](outputDim = dim, wRegularizer = l2Regularizer).inputs(rtiLayer))
  }
  val rtiOutput = Activation[Float]("sigmoid").inputs(Dense[Float](1).inputs(rtiLayer))

  val model = Model[Float](input = Array(historyInput, userInput, itemInput),
    output = Array(ctrOutput, cvrOutput, rtiOutput))
  model
}
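For reference, a hedged usage sketch of the builder above; the dimensions here are made-up stand-ins, not values from my actual pipeline:

    // Hypothetical dimensions chosen only to exercise the builder.
    val mixModel = createMixModel(
      userInputDim = 32,
      itemInputDim = 4,
      shareDims = Array(64, 32),
      dcnCrossLayers = 3,
      hiddenDims = Array(32, 16))
    println(mixModel) // prints the graph structure as a quick sanity check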
Then I train the model:
val optimizer = Optimizer(
  model = model,
  dataset = trainData,
  criterion = criterion
).setOptimMethod(new Adam[Float]())
  .setEndWhen(Trigger.or(Trigger.maxEpoch(maxEpoch), Trigger.minLoss(minLoss)))
val trainedModel = optimizer.optimize()
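The snippet above references a criterion that is never shown. For three independent sigmoid heads, one plausible definition (an assumption on my part, not taken from the original code) is a ParallelCriterion combining one BCECriterion per head:

    import com.intel.analytics.bigdl.nn.{BCECriterion, ParallelCriterion}

    // Assumed multi-task loss: one binary cross-entropy per output, equally weighted.
    val criterion = {
      val pc = ParallelCriterion[Float]()
      pc.add(BCECriterion[Float](), 1.0) // CTR head
      pc.add(BCECriterion[Float](), 1.0) // CVR head
      pc.add(BCECriterion[Float](), 1.0) // RTI head
      pc
    }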
If I use trainedModel directly for prediction, the results look normal. But if I save the model and reload it, the predictions are always 1.0:
trainedModel.saveModel(path = s"hdfs://xxx", weightPath = null, overWrite = true)
val model = Module.loadModule[Float](s"hdfs://xxx")
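To reproduce the symptom deterministically, one option is to compare a single forward pass before saving and after reloading; the random stand-in tensors below are purely for illustration and assume userInputDim/itemInputDim are in scope:

    import com.intel.analytics.bigdl.tensor.Tensor
    import com.intel.analytics.bigdl.utils.T

    // Random stand-ins matching the model's three input shapes.
    val history = Tensor[Float](1, 3, 4).rand()
    val user = Tensor[Float](1, userInputDim).rand()
    val item = Tensor[Float](1, itemInputDim).rand()

    trainedModel.evaluate() // inference mode: BN reads runningMean/runningVar
    val before = trainedModel.forward(T(history, user, item))

    val reloaded = Module.loadModule[Float](s"hdfs://xxx")
    reloaded.evaluate()
    val after = reloaded.forward(T(history, user, item))
    println(s"before reload: $before\nafter reload: $after")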
I suspect it is a BatchNormalization issue: if I remove BatchNormalization, the reloaded model predicts normally, but then the prediction quality is noticeably worse.
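One way to test that suspicion directly is to scan the trained model's extra parameters, where BigDL keeps BatchNormalization's runningMean/runningVar, for non-finite values. Whether getExtraParameter() surfaces them depends on the BigDL version, so treat this as a sketch:

    // Look for Inf/NaN in the running statistics after training.
    // Assumption: this BigDL version exposes BN's runningMean/runningVar
    // through getExtraParameter(), as the serializer does when saving.
    trainedModel.getExtraParameter().zipWithIndex.foreach { case (t, i) =>
      val maxAbs = t.clone().abs().max()
      if (maxAbs.isInfinity || maxAbs.isNaN) {
        println(s"extra parameter $i contains non-finite values (max |v| = $maxAbs)")
      }
    }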
What exactly could the problem be? Thank you very much!
The problem has been resolved. The dinOutput caused a gradient explosion, which drove the BatchNormalization runningVar to Infinity.

Adding a BatchNormalization layer on the original dinOutput solves the problem (a minimal sketch follows below).

What still puzzles me is why nothing goes wrong when the trained model is used directly for prediction; that is what kept making me think it was a model saving/loading issue.
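A minimal sketch of the fix described above, assuming it replaces the original dinOutput line inside createMixModel:

    // Normalize the pooled DIN output so an exploding scale here cannot push
    // the downstream BatchNormalization's runningVar to Infinity.
    val dinOutput = BatchNormalization[Float]().inputs(
      GlobalAveragePooling1D[Float]().inputs(weightedUserHistory))

As for the puzzle: one plausible explanation (an assumption, not confirmed in this thread) is that the direct-prediction path still ran the model in training mode, where BatchNormalization normalizes each batch with its own statistics and never reads runningVar, whereas the reloaded model ran in evaluate mode and consumed the stored Infinity.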
Hi,
Thank you for reporting the issue here and for tracking down the cause. Since BigDL is developed as an open-source project, please feel free to submit the issue at Issues · intel-analytics/ipex-llm, and the BigDL developer team will follow up there.

Thanks