Commit d06f4e9f authored by Kilian Pfeiffer

added quantization lab

parent 1230ece6
%% Cell type:markdown id:diverse-deadline tags:
# Embedded ML Lab - Exercise 2 - Quantization
In this exercise, we will explore quantization with PyTorch. Quantization was introduced to PyTorch fairly recently; consequently, not all APIs are stable yet and not all operations are supported. Also, quantization currently works only on the CPU, not on the GPU.
We will do the following things in the quantization exercise:
* First, we will play around with quantized tensors and with some very common PyTorch patterns that we require later.
* In this lab, the unquantized network is already given, but we require a quantized version of the network.
* To evaluate the benefits of quantization, we require two things:
  * First, we are interested in the execution time of the quantized network and the speed improvements.
  * Second, we are interested in how much accuracy we lose through quantization.
%% Cell type:markdown id:incoming-accident tags:
For this lab, the network is already implemented in `net.py`; take a look at it.
%% Cell type:code id:acceptable-background tags:
``` python
from net import CifarNet
```
%% Cell type:code id:returning-parade tags:
``` python
net = CifarNet()
```
%% Cell type:markdown id:hundred-sitting tags:
This time we will work with a slightly more complex dataset, CIFAR-10 (https://www.cs.toronto.edu/~kriz/cifar.html), where each sample has 3 input channels (RGB) and 32 by 32 pixels. There are 10 different classes, namely:
%% Cell type:code id:continental-bermuda tags:
``` python
import torch
import torchvision
import torchvision.transforms as transforms
tf = transforms.Compose([transforms.ToTensor()])
trainloader = torch.utils.data.DataLoader(torchvision.datasets.CIFAR10('data/', train=True, download=True, transform=tf), batch_size=64, shuffle=False)
print(f"Classes in CIFAR10 {trainloader.dataset.classes}")
```
%% Cell type:code id:treated-discharge tags:
``` python
_, (cifar_samples, targets) = next(enumerate(trainloader))
import matplotlib.pyplot as plt
fig, axs = plt.subplots(nrows=1, ncols=16, figsize=(12, 6),
                        subplot_kw={'xticks': [], 'yticks': []}, constrained_layout=True)
for idx, ax in enumerate(axs.flat):
    # ToTensor() already yields values in [0, 1], so no un-normalization is needed here
    ax.imshow(cifar_samples[idx,:,:,:].permute(1,2,0).numpy(), interpolation='spline16')
    ax.set_title(trainloader.dataset.classes[targets[idx]])
plt.show()
```
%% Cell type:markdown id:editorial-trade tags:
**Now we focus on quantized operators:**
A normal float tensor can be quantized using `torch.quantize_per_tensor(input, scale, zero_point, dtype)`, where we have to define a *scale*, a *zero_point*, and a datatype.
Currently, PyTorch requires all `weights` to be of datatype `torch.qint8`; therefore, we use a zero point of `0` to have the same resolution in the positive and negative range.
Activations are of type `torch.quint8`. It is recommended to set the zero point for these to `64` to have one bit as safety margin for overflows.
A quantized tensor can easily be converted back to a float tensor using `torch.dequantize()`.
%% Cell type:code id:nervous-stomach tags:
``` python
rand_values = torch.rand(3,3)
print(f"float tensor\n{rand_values}")
qvalues = torch.quantize_per_tensor(rand_values, scale=0.1, zero_point=64, dtype=torch.quint8)
print(f"quantized tensor (uint8)\n{qvalues}")
res = torch.dequantize(qvalues)
```
%% Cell type:markdown id:interstate-alaska tags:
When a quantized tensor is printed, it is converted back to float using its scale and zero_point. Each quantized tensor's scale and zero_point can be accessed using `tensor.q_scale()` and `tensor.q_zero_point()`.
%% Cell type:code id:northern-outline tags:
``` python
print(f"Tensor scale: {qvalues.q_scale()}, zero_point= {qvalues.q_zero_point()}")
```
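To look at the integers that are actually stored, rather than the dequantized view that `print` shows, a quantized tensor offers `int_repr()`. The following is a minimal sketch, assuming a small hand-picked input tensor (not part of the lab), that reproduces `torch.dequantize()` by hand from the stored integers, the scale, and the zero_point.

```python
import torch

# Hand-picked example values (an assumption for illustration, not lab data)
x = torch.tensor([[0.0, 0.25], [0.5, 1.0]])
q = torch.quantize_per_tensor(x, scale=0.1, zero_point=64, dtype=torch.quint8)

raw = q.int_repr()  # the stored uint8 values
# Manual dequantization: (stored integer - zero_point) * scale
manual = (raw.float() - q.q_zero_point()) * q.q_scale()

print(f"stored integers\n{raw}")
print(f"manually dequantized\n{manual}")
print(f"torch.dequantize\n{torch.dequantize(q)}")
```

The value `0.0` is stored as the zero_point itself, which is why the zero_point determines how much of the integer range is available for negative values.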
%% Cell type:markdown id:failing-theory tags:
Right now, we have chosen an arbitrary scale of 0.1, and we can clearly see that we lose some precision.
In general, quantization is done as follows: $Q(x,\text{scale},\text{zero\_point}) = \text{round}\left(\frac{x}{\text{scale}} + \text{zero\_point}\right)$
For this exercise, we calculate the scale as follows:
* The quantized range should be symmetric
* The quantized range should allow for a safety margin of 1 bit.
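The formula and the two rules above can be checked by hand. The following is a minimal sketch in plain Python; the helper names `quantize` and `dequantize` and the sample values are assumptions made for illustration, and the clamp to `[0, 255]` mirrors the range of an unsigned 8-bit type.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """round(x / scale + zero_point), clamped to the unsigned 8-bit range."""
    q = round(x / scale + zero_point)
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Inverse mapping back to float: (q - zero_point) * scale."""
    return (q - zero_point) * scale

values = [-0.8, -0.25, 0.0, 0.4, 1.0]  # illustrative sample values
max_abs = max(abs(v) for v in values)

# Symmetric range with 1 bit of safety margin:
# the full span 2 * max_abs is mapped onto 127 steps (7 bits) instead of 255.
scale = 2 * max_abs / 127.0
zero_point = 64

quantized = [quantize(v, scale, zero_point) for v in values]
restored = [dequantize(q, scale, zero_point) for q in quantized]

for v, q, r in zip(values, quantized, restored):
    print(f"{v:+.3f} -> {q:3d} -> {r:+.4f} (error {abs(v - r):.4f})")
```

With this choice, all quantized values land roughly in the lower half of the 8-bit range, centered on 64, and the round-trip error of each value is at most half a scale step.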
%% Cell type:code id:developmental-atlantic tags:
``` python
def tensor_scale(input):
    return float(2*torch.max(torch.abs(torch.max(input)), torch.abs(torch.min(input))))/127.0
```
%% Cell type:code id:saving-rough tags:
``` python
scale = tensor_scale(rand_values)
print(scale)
qvalues = torch.quantize_per_tensor(rand_values, scale=scale, zero_point=64, dtype=torch.quint8)
print(qvalues)
```
%% Cell type:markdown id:daily-verse tags:
Currently, not all operations are available as quantized ones.
For example, if you try to add two tensors with `res = qvalues + qvalues`, you will get the following error message, which appears whenever quantized tensors are passed to non-quantized operators:
```
Could not run 'aten::add.Tensor' with arguments from the 'QuantizedCPU' backend. 'aten::add.Tensor' is only available for these backends: [CPU, MkldnnCPU, SparseCPU, Meta, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
```
Luckily, for this lab all required operators exist as quantized ones. Every quantized operator additionally requires an output scale and an output zero_point.
%% Cell type:code id:necessary-starter tags:
``` python
res = torch.ops.quantized.add(qvalues, qvalues, scale=0.1, zero_point=64)
```
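Why a quantized operator needs its own output scale and zero_point can be seen from a conceptual model of the addition: dequantize both inputs, add in float, and requantize the sum. The sketch below is plain Python with hypothetical helper names; it only models the semantics and is not how PyTorch implements `torch.ops.quantized.add` internally.

```python
def requantize(x, scale, zero_point, qmin=0, qmax=255):
    """Quantize a float to unsigned 8 bit, saturating at the range limits."""
    return max(qmin, min(qmax, round(x / scale + zero_point)))

def quantized_add(qa, qb, in_scale, in_zp, out_scale, out_zp):
    """Conceptual model: dequantize, add in float, requantize to the output parameters."""
    a = (qa - in_zp) * in_scale
    b = (qb - in_zp) * in_scale
    return requantize(a + b, out_scale, out_zp)

# Two inputs that both represent 3.6 (stored integer 100, scale 0.1, zero_point 64)
same_scale = quantized_add(100, 100, 0.1, 64, out_scale=0.1, out_zp=64)
wider_scale = quantized_add(100, 100, 0.1, 64, out_scale=0.2, out_zp=64)
too_narrow = quantized_add(100, 100, 0.1, 64, out_scale=0.01, out_zp=64)

print(same_scale, wider_scale, too_narrow)
```

The sum can be larger than either input, so an output scale that is too small saturates at 255, while a larger output scale keeps the result in range at the cost of coarser steps.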
%% Cell type:markdown id:adjacent-driving tags:
As a small warm-up task, we execute a quantized addition:
**Your Task**:
* Create two random float tensors `a` and `b` of size 100x100 (use `torch.rand`)
* For each tensor, calculate the tensor's scale using the provided `tensor_scale` function
* Calculate the addition in float (result `c`)
* Calculate the tensor scale of the result
* Now, quantize the tensors `a` and `b` using the calculated scales
* Calculate the quantized addition using the function from above and plug in the output scale (scale of tensor `c`)
* Print the quantized result and the float result. Can you spot the loss in precision?
%% Cell type:code id:statutory-observer tags:
``` python
#-to-be-done- by student
###
###
#-to-be-done- by student
```
%% Cell type:markdown id:chinese-specific tags:
# Embedded ML Lab - Exercise 2 - Quantization (additional experiments)
%% Cell type:markdown id:coastal-ministry tags:
Until now, we always used a symmetric min/max scale for quantization of the activations, hence a centered zero point.
We will now do two things to squeeze a little bit more accuracy out of the quantization
* Firstly, we will loosen our assumption of a symmetric range/zero_point
* Secondly, we will consider "cutting away" parts that are not important for the classification choice
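To illustrate the first point, the sketch below compares the symmetric rule used so far with a common affine (non-symmetric) choice of scale and zero_point derived from the observed minimum and maximum. The helper `affine_params` and the example ranges are assumptions for illustration; this is one widely used scheme, not necessarily the exact one intended for this lab.

```python
def affine_params(x_min, x_max, qmin=0, qmax=255):
    """Scale and zero_point that map [x_min, x_max] onto [qmin, qmax]."""
    # The represented range must contain 0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    if x_max == x_min:
        return 1.0, qmin  # degenerate all-zero input
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = round(qmin - x_min / scale)
    return scale, max(qmin, min(qmax, zero_point))

# Example: activations after a ReLU are never negative, here assumed to lie in [0, 6]
relu_scale, relu_zp = affine_params(0.0, 6.0)
sym_scale = 2 * 6.0 / 127.0  # symmetric rule with zero_point 64 from the first part

print(f"symmetric : scale={sym_scale:.5f}, zero_point=64")
print(f"asymmetric: scale={relu_scale:.5f}, zero_point={relu_zp}")

# A range that is not centered on zero gets a zero_point away from both ends
skew_scale, skew_zp = affine_params(-1.0, 3.0)
print(f"range [-1, 3]: scale={skew_scale:.5f}, zero_point={skew_zp}")
```

For the non-negative example, the symmetric rule leaves the integers below 64 unused and reserves headroom above, so its step size is about four times coarser than the asymmetric one.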
%% Cell type:code id:russian-textbook tags:
``` python
from net import CifarNet
import torch
torch.backends.quantized.engine = 'qnnpack'
import torchvision
from torchvision.datasets import CIFAR10
from torchvision.transforms import transforms
tf = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
testloader = torch.utils.data.DataLoader(torchvision.datasets.CIFAR10('data/', train=False, download=True, transform=tf), shuffle=False, batch_size=32)
import time
import nbimporter
from exercise_21 import net_time
from exercise_21 import net_acc
from exercise_21 import fuse_conv_bn_weights
from exercise_21 import QCifarNet, QConv2dReLU, QLinear
from exercise_21 import tensor_scale
```
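The cell above selects the `qnnpack` backend. Which backends are available depends on how PyTorch was built for the machine at hand, so it can help to check before assigning. A minimal sketch, assuming only the `torch.backends.quantized` attributes `supported_engines` and `engine`:

```python
import torch

# List of quantized backends compiled into this PyTorch build
engines = torch.backends.quantized.supported_engines
print(f"supported engines: {engines}")

preferred = 'qnnpack'  # the engine used in this lab
if preferred in engines:
    torch.backends.quantized.engine = preferred
else:
    print(f"'{preferred}' is not available in this build")

print(f"active engine: {torch.backends.quantized.engine}")
```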
%% Cell type:markdown id:southwest-congo tags:
We introduce two modified modules (`QConv2dReLUNSym`, `QLinearNSym`) that, besides a scale (as in the last exercise), also have an adjustable zero_point that can also be set through the state_dict.
We use these two modules in a new classifier called `QCifarNetNSym`.
%% Cell type:code id:level-program tags:
``` python
from torch.nn.quantized.modules.utils import _pair_from_first
import torch.nn as nn
import torch.nn.functional as F

#Both classes now also have a state-dict entry for the zero_point
class QConv2dReLUNSym(QConv2dReLU):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
        super(QConv2dReLUNSym, self).__init__(in_channels, out_channels, kernel_size, stride, padding)
        self.register_buffer('zero_point', torch.tensor(64))

    def forward(self, x):
        return torch.ops.quantized.conv2d_relu(x, self._prepack, self.scale, self.zero_point)

class QLinearNSym(QLinear):
    def __init__(self, in_features, out_features):
        super(QLinearNSym, self).__init__(in_features, out_features)
        self.register_buffer('zero_point', torch.tensor(64))

    def forward(self, x):
        return torch.ops.quantized.linear(x, self._prepack, self.scale, self.zero_point)

class QCifarNetNSym(QCifarNet):
    def __init__(self):
        #Note: super(QCifarNet, ...) skips QCifarNet.__init__, so only the layers below are created
        super(QCifarNet, self).__init__()
        self.register_buffer("scale", torch.tensor(0.1))
        self.conv1 = QConv2dReLUNSym(3, 16, 3, 1, padding=1)
        self.conv2 = QConv2dReLUNSym(16,16, 3, 1, padding=1)
        self.conv3 = QConv2dReLUNSym(16, 32, 3, 1, padding=1)
        self.conv4 = QConv2dReLUNSym(32, 32, 3, 1, padding=1)
        self.conv5 = QConv2dReLUNSym(32, 64, 3, 1, padding=1)
        self.conv6 = QConv2dReLUNSym(64, 64, 3, 1, padding=1)
        self.fc = QLinearNSym(1024, 10)
```
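The reason `register_buffer` is used for the zero_point is that buffers, unlike plain attributes, show up in the module's `state_dict` and are restored by `load_state_dict`. A minimal sketch with a hypothetical toy module (`WithZeroPoint` is not part of the lab code):

```python
import torch
import torch.nn as nn

class WithZeroPoint(nn.Module):
    """Toy module (hypothetical) that only holds quantization parameters."""
    def __init__(self):
        super().__init__()
        self.register_buffer('scale', torch.tensor(0.1))
        self.register_buffer('zero_point', torch.tensor(64))

m = WithZeroPoint()
sd = m.state_dict()
print(list(sd.keys()))  # both buffers appear as state-dict entries

# Changing the entry and loading the dict updates the buffer inside the module
sd['zero_point'] = torch.tensor(10)
m.load_state_dict(sd)
print(int(m.zero_point))
```

The same mechanism lets the calibration results be written into the quantized network by editing its state dict and loading it.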
%% Cell type:markdown id:stuck-marker tags:
Your Task:
* Copy your class definition of `CifarNetCalibration` from the last lab into the next block.
* Besides the calibration, execute the function `plot_density` after each operator.
* Run the calibration batch again (code provided) and inspect the figures. What observation can you make?
%% Cell type:code id:latest-choice tags:
``` python
import matplotlib.pyplot as plt
def plot_density(x):
    # input tensor x
    x = x.detach()
    plt.hist(x.flatten().numpy(), range=(float(x.min()),float(x.max())), density=True, bins=50)
    plt.title('input probability density function')
    plt.ylabel('likelihood')
    plt.xlabel('values')
    plt.show()

#We run the calibration using a batch from the testdata
net_calib = CifarNetCalibration()
net_calib.load_state_dict(torch.load('state_dict.pt'))
_, (data, _) = next(enumerate(testloader))
net_calib(data)
calibration_dict = net_calib.calibration_dict
```
%% Cell type:markdown id:ethical-berlin tags:
Your Task:
* Copy your code that sets the quantized state dict after calibration from the last lab. The state dict now has new `zero_point` entries for the fused conv layers and the fc layer.
* Explore what happens if we change the scale of a conv layer in a way that better suits the plots. Use the provided code to determine the accuracy for each step.
* After that, adjust the scale of the conv layers according to the figures.
Your Task:
* After that, play around with the scale and zero_point of the fully connected layer. What conclusion can we draw?
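One way to reason about "a scale that better suits the plots" is to compare a scale derived from the absolute maximum with one derived from a high percentile, so that rare outliers are clipped in exchange for finer steps. The sketch below uses synthetic long-tailed data and hypothetical helper names; the 99.9th percentile is an arbitrary choice for illustration, and whether clipping helps depends on the actual distribution seen in the figures.

```python
import random

random.seed(0)
# Synthetic stand-in for post-ReLU activations: many small values, few large ones
acts = [random.expovariate(1.0) for _ in range(10_000)]

def scale_for_max(x_max, levels=255):
    """Scale that spreads [0, x_max] over all unsigned 8-bit levels (zero_point 0)."""
    return x_max / levels

def mean_abs_error(values, scale, qmax=255):
    """Average round-trip error of quantize, saturate, dequantize."""
    total = 0.0
    for v in values:
        q = min(qmax, max(0, round(v / scale)))
        total += abs(v - q * scale)
    return total / len(values)

full_max = max(acts)
clip_max = sorted(acts)[int(0.999 * len(acts))]  # 99.9th percentile

err_full = mean_abs_error(acts, scale_for_max(full_max))
err_clip = mean_abs_error(acts, scale_for_max(clip_max))
print(f"full range: max={full_max:.2f}, mean abs error={err_full:.5f}")
print(f"clipped   : max={clip_max:.2f}, mean abs error={err_clip:.5f}")
```

In the lab, the final judge is the classification accuracy reported by `net_acc`, not the reconstruction error, so this kind of estimate is only a starting point for choosing values to try.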
%% Cell type:code id:dress-accused tags:
``` python
#prints keys from quantized net
qnet = QCifarNetNSym()
qsd = qnet.state_dict()
for key in qsd: print(key, qsd[key].dtype)
sd = torch.load('state_dict.pt')
###--- COPY YOUR IMPLEMENTATION HERE ---
### ------------------------------------
```
%% Cell type:code id:challenging-exhaust tags:
``` python
#We run the accuracy test again to see how much accuracy we lose through quantization
print(f"Accuracy quantized: {net_acc(QCifarNetNSym, qsd, testloader):.4%}")
```
%% Cell type:code id:portuguese-placement tags:
``` python
```
import torch
import torch.nn as nn
import torch.nn.functional as F

class CifarNet(torch.nn.Module):
    def __init__(self):
        super(CifarNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, 1, padding=1)
        self.conv2 = nn.Conv2d(16,16, 3, 1, padding=1)
        self.conv3 = nn.Conv2d(16, 32, 3, 1, padding=1)
        self.conv4 = nn.Conv2d(32, 32, 3, 1, padding=1)
        self.conv5 = nn.Conv2d(32, 64, 3, 1, padding=1)
        self.conv6 = nn.Conv2d(64, 64, 3, 1, padding=1)
        self.bn1 = nn.BatchNorm2d(16)
        self.bn2 = nn.BatchNorm2d(16)
        self.bn3 = nn.BatchNorm2d(32)
        self.bn4 = nn.BatchNorm2d(32)
        self.bn5 = nn.BatchNorm2d(64)
        self.bn6 = nn.BatchNorm2d(64)
        self.fc = nn.Linear(1024, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2, stride=2)
        x = self.conv3(x)
        x = self.bn3(x)
        x = F.relu(x)
        x = self.conv4(x)
        x = self.bn4(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2, stride=2)
        x = self.conv5(x)
        x = self.bn5(x)
        x = F.relu(x)
        x = self.conv6(x)
        x = self.bn6(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2, stride=2)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x
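
The input size of `1024` for the final linear layer follows from the layer parameters above and the 32 by 32 CIFAR input. A short sketch of that arithmetic in plain Python, with hypothetical helper names `conv_out` and `pool_out`:

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial output size of a max-pooling layer along one dimension."""
    return (size - kernel) // stride + 1

size = 32                      # CIFAR images are 32 by 32 pixels
for _ in range(3):             # three blocks of conv, conv, max-pool
    size = conv_out(conv_out(size))   # 3x3 convs with padding 1 keep the size
    size = pool_out(size)             # each pooling step halves it

flat_features = 64 * size * size      # 64 channels after conv6
print(size, flat_features)
```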
exercises/2-quantization/src/cifarnet.png (91.5 KiB)
exercises/2-quantization/src/cifarnet_quantized.png (113 KiB)