I'm working through the PyDAAL API but can't figure out how to serialize and deserialize a trained (SVM) model. Every example script trains and tests the model in one shot, but I need to save the trained model so I can predict on new data later, as it becomes available. How can I do this through the Python API?
You can use the serialization and deserialization interfaces available in DAAL to serialize the training result into a buffer (or even save it to disk) and deserialize it at a later point to reconstruct the trained model. Note that, when deserializing, you must first construct an empty object of the same type as the object that was serialized.
The code snippet below extends the existing SVM example with this functionality. I have also attached the complete code for your reference.
import numpy as np
from daal.data_management import InputDataArchive, OutputDataArchive
# Assumption: `training` is the module the SVM example script already imports
from daal.algorithms.svm import training

def Serialize(model):
    # Construct input data archive object
    dataArch = InputDataArchive()
    # Serialize model contents into the data archive object
    model.serialize(dataArch)
    # Copy data archive contents to a numpy array
    length = dataArch.getSizeOfArchive()
    buffer_array = np.zeros(length, dtype=np.ubyte)
    dataArch.copyArchiveToArray(buffer_array)
    return buffer_array

if __name__ == "__main__":
    # trainModel() and testModel() come from the complete SVM code (attached)
    trainingResult = trainModel()
    buffer = Serialize(trainingResult)
    # You can save this serialized object (which is a numpy array) to your disk
    # np.save(path, buffer)

    # Deserialization starts here
    # buffer = np.load(path)  # Load numpy array
    dataArch = OutputDataArchive(buffer)
    trainedModel = training.Result()  # Construct an empty training result object
    trainedModel.deserialize(dataArch)  # Deserialize into trainedModel
    testModel(trainedModel)
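
If it helps, here is a rough sketch of the save-to-disk and load-from-disk step that is only hinted at by the commented-out np.save / np.load lines above. The helper names save_buffer and load_buffer are made up for illustration and are not part of DAAL, and the small byte array stands in for the output of Serialize() so the snippet runs without pyDAAL installed. It only uses numpy's own np.save and np.load.

```python
import os
import tempfile

import numpy as np


def save_buffer(path, buffer_array):
    """Write a serialized byte buffer to disk (hypothetical helper, not a DAAL API)."""
    arr = np.asarray(buffer_array, dtype=np.ubyte)
    # Passing an open file object keeps the file name exactly as given
    with open(path, "wb") as f:
        np.save(f, arr)


def load_buffer(path):
    """Read a byte buffer back from disk (hypothetical helper, not a DAAL API)."""
    with open(path, "rb") as f:
        arr = np.load(f, allow_pickle=False)
    # Reject files that do not look like a flat unsigned-byte buffer
    if arr.dtype != np.ubyte or arr.ndim != 1:
        raise ValueError("expected a 1-D unsigned-byte array")
    return arr


if __name__ == "__main__":
    # Stand-in for the array returned by Serialize(trainingResult)
    stand_in = np.arange(16, dtype=np.ubyte)
    path = os.path.join(tempfile.mkdtemp(), "svm_model.npy")

    save_buffer(path, stand_in)
    restored = load_buffer(path)

    # The restored bytes should match what was written
    print(np.array_equal(stand_in, restored))
```

In the real workflow, the array returned by load_buffer would be the one passed to OutputDataArchive before calling deserialize on the empty training result.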