%%writefile test_torch.py
#!/usr/bin/env python
# encoding: utf-8
'''
==============================================================
Copyright © 2019 Intel Corporation
SPDX-License-Identifier: MIT
==============================================================
'''
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import intel_extension_for_pytorch as ipex
print(torch.nn.__file__)
'''
BS_TRAIN: Batch size for training data
BS_TEST: Batch size for testing data
EPOCHNUM: Number of epochs for training
'''
BS_TRAIN = 50
BS_TEST = 10
EPOCHNUM = 1
'''
The TestDataset class inherits from torch.utils.data.Dataset.
Since training requires both input data and ground truth, a flag "train" is defined in the initialization function. When train is True, an instance of TestDataset returns a pair of input data and label data; when it is False, the instance returns input data only, for inference. The value of the flag "train" is set in the __init__ function.
The __getitem__ function returns the data at the given index.
The __len__ function returns the overall length of the dataset.
'''
class TestDataset(Dataset):
    def __init__(self, train = True):
        super(TestDataset, self).__init__()
        self.train = train

    def __getitem__(self, index):
        if self.train:
            return torch.rand(3, 112, 112), torch.rand(6, 110, 110)
        else:
            return torch.rand(3, 112, 112)

    def __len__(self):
        if self.train:
            return 100
        else:
            return 20
'''
The TestModel class inherits from torch.nn.Module.
Operations that will be used in the topology are defined in the __init__ function.
The input data x is passed to the forward function, which implements the topology. When performing training or inference, the forward function is called automatically by passing input data to a model instance.
'''
class TestModel(nn.Module):
    def __init__(self):
        super(TestModel, self).__init__()
        self.conv = nn.Conv2d(3, 6, 3)
        self.norm = nn.BatchNorm2d(6)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.conv(x)
        x = self.norm(x)
        x = self.relu(x)
        return x
'''
Perform training and inference in the main function
'''
def main():
    #torch.autograd.set_detect_anomaly(True)
    '''
    The following 3 components are required to perform training.
    1. model: an instance of the model class
    2. optim: the optimization function that updates the topology parameters during training
    3. crite: the criterion function used to compute the loss to minimize
    '''
    model = TestModel()
    optim = torch.optim.SGD(model.parameters(), lr=0.01)
    crite = lambda x, y: (x - y)
    '''
    1. Instantiate the Dataset class defined above
    2. Use torch.utils.data.DataLoader to load data from the Dataset instance
    '''
    train_data = TestDataset()
    trainLoader = DataLoader(train_data, batch_size=BS_TRAIN)
    test_data = TestDataset(train=False)
    testLoader = DataLoader(test_data, batch_size=BS_TEST)
    '''
    Apply Intel Extension for PyTorch optimization against the model object and the optimizer object.
    '''
    #model, optim = ipex.optimize(model, optimizer=optim)
    '''
    Perform training and inference.
    Use model.train() to set the model into training mode and model.eval() to set it into inference mode.
    Use a for loop with enumerate() over the DataLoader instance to go through the whole dataset for training/inference.
    '''
    for i in range(0, EPOCHNUM):
        '''
        Iterate over the training dataset to train the model
        '''
        model.train()
        for batch_index, (data, y_ans) in enumerate(trainLoader):
            data = data.to(memory_format=torch.channels_last)
            optim.zero_grad()
            y = model(data)
            loss = crite(y, y_ans).sum()
            print(loss)
            loss.backward()
            optim.step()
        '''
        Iterate over the test dataset to evaluate the model
        '''
        model.eval()
        for batch_index, data in enumerate(testLoader):
            data = data.to(memory_format=torch.channels_last)
            y = model(data)

if __name__ == '__main__':
    main()
    print('[CODE_SAMPLE_COMPLETED_SUCCESFULLY]')
I tried to run the code above, but I got a segmentation fault during the backward pass.
How can this be fixed, or is this a bug?
I tried this on my own Ubuntu machine and got the correct output.
The issue seems to appear only in the DevCloud environment.
This has been troubling me for days.
Hi,
Thank you for posting in Intel Communities.
We are able to run your code in Intel DevCloud for oneAPI.
Please make sure that you are running the code in a compute node.
Please follow the steps below to run the code without any error:
#Step 1 - Login to compute node
qsub -I
#Step 2 - Create a conda environment
conda create -n <env_name> python==3.9
#Step 3 - Activate conda environment
conda activate <env_name>
#Step 4 - Install PyTorch and Intel Extension For PyTorch
python -m pip install torch==1.13.0a0+git6c9b55e intel_extension_for_pytorch==1.13.120+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
#Step 5 - Run the code
python code.py
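For reference, a short check such as the sketch below can confirm that the installation from Step 4 is visible inside the compute node before running the sample. The file name check_env.py is only illustrative, and torch.xpu is expected to be present only with the XPU build of Intel Extension for PyTorch, so the snippet guards for it.
# check_env.py - illustrative sanity check of the PyTorch / IPEX installation
import torch
import intel_extension_for_pytorch as ipex  # registers the XPU backend when the xpu build is installed

print("torch version:", torch.__version__)
print("ipex version :", ipex.__version__)

# torch.xpu is added by the XPU build of Intel Extension for PyTorch;
# guard with hasattr in case a CPU-only build is installed instead.
if hasattr(torch, "xpu") and torch.xpu.is_available():
    print("XPU device count:", torch.xpu.device_count())
else:
    print("No XPU device detected.")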
If this resolves your issue, please accept this as a solution. It would help other users with a similar issue.
Thanks
Hi,
We have not heard back from you. Could you please give us an update?
Thanks
Hi,
We have not heard back from you. We assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.
Thanks