added quantization lab

Commit d06f4e9f authored 10 months ago by Kilian Pfeiffer
--- a/exercises/2-quantization/data/cifar-10-python.tar.gz
+++ b/exercises/2-quantization/data/cifar-10-python.tar.gz
--- a/exercises/2-quantization/exercise_20.ipynb
+++ b/exercises/2-quantization/exercise_20.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "diverse-deadline",
+   "metadata": {},
+   "source": [
+    "# Embedded ML Lab - Exercise 2 - Quantization\n",
+    "\n",
+    "In this exercise, we will explore quantization with PyTorch. Quantization was introduced to PyTorch fairly recently; consequently, not all APIs are stable yet and not all operations are supported. Also, quantization currently only works on the CPU, not on the GPU.\n",
+    "\n",
+    "We will do the following in this exercise:\n",
+    "* First, we will experiment with quantized tensors and with common PyTorch patterns that we need later  \n",
+    "* In this lab, the unquantized network is already given, but we require a quantized version of the network\n",
+    "* To evaluate the benefits of quantization we require two things:\n",
+    "    * First, we are interested in the execution time of the quantized network and the speed improvements\n",
+    "    * Second, we are interested in how much accuracy we lose through quantization"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "incoming-accident",
+   "metadata": {},
+   "source": [
+    "For this lab, the network is already implemented in `net.py`; take a look at it."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "acceptable-background",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from net import CifarNet"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "returning-parade",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "net = CifarNet()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "hundred-sitting",
+   "metadata": {},
+   "source": [
+    "This time we will work with a slightly more complex dataset, CIFAR-10 (https://www.cs.toronto.edu/~kriz/cifar.html), where each sample has 3 input channels (RGB) and 32 by 32 pixels. There are 10 different classes, namely:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "continental-bermuda",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "import torchvision\n",
+    "import torchvision.transforms as transforms\n",
+    "tf = transforms.Compose([transforms.ToTensor()])\n",
+    "trainloader = torch.utils.data.DataLoader(torchvision.datasets.CIFAR10('data/', train=True, download=True, transform=tf), batch_size=64, shuffle=False)\n",
+    "print(f\"Classes in CIFAR10 {trainloader.dataset.classes}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "treated-discharge",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "_, (cifar_samples, targets) = next(enumerate(trainloader))\n",
+    "import matplotlib.pyplot as plt\n",
+    "fig, axs = plt.subplots(nrows=1, ncols=16, figsize=(12, 6),\n",
+    "                        subplot_kw={'xticks': [], 'yticks': []}, constrained_layout=True)\n",
+    "\n",
+    "for idx, ax in enumerate(axs.flat):\n",
+    "    # ToTensor() already yields values in [0, 1], so no rescaling is needed for imshow\n",
+    "    ax.imshow(cifar_samples[idx,:,:,:].permute(1,2,0).numpy(), interpolation='spline16')\n",
+    "    ax.set_title(trainloader.dataset.classes[targets[idx]])\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "editorial-trade",
+   "metadata": {},
+   "source": [
+    "**Now we focus on quantized operators:**  \n",
+    "A normal float tensor can be quantized using `torch.quantize_per_tensor(input, scale, zero_point, dtype)`, where we have to define a *scale*, a *zero_point*, and a data type.  \n",
+    "Currently, PyTorch requires all `weights` to be of data type `torch.qint8`; therefore, we use a zero point of `0` to have the same resolution in the positive and negative range.   \n",
+    "Activations are of type `torch.quint8`. It is recommended to set the zero point for these to `64` to have one bit as a safety margin for overflows.  \n",
+    "A quantized tensor can easily be converted back to a float tensor using `torch.dequantize()`.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "nervous-stomach",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "rand_values = torch.rand(3,3)\n",
+    "print(f\"float tensor\\n{rand_values}\")\n",
+    "\n",
+    "qvalues = torch.quantize_per_tensor(rand_values, scale=0.1, zero_point=64, dtype=torch.quint8)\n",
+    "\n",
+    "print(f\"quantized tensor (uint8)\\n{qvalues}\")\n",
+    "\n",
+    "res = torch.dequantize(qvalues)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "interstate-alaska",
+   "metadata": {},
+   "source": [
+    "When a quantized tensor is printed, its values are shown converted back to float using scale and zero_point. Each quantized tensor's scale and zero_point can be accessed using `tensor.q_scale()` and `tensor.q_zero_point()`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "northern-outline",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(f\"Tensor scale: {qvalues.q_scale()}, zero_point= {qvalues.q_zero_point()}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "failing-theory",
+   "metadata": {},
+   "source": [
+    "So far, we have chosen an arbitrary scale of 0.1, and we can clearly see that we lose some precision.\n",
+    "In general, quantization is done as follows: $Q(x,\\text{scale},\\text{zero_point}) = \\text{round}(\\frac{x}{\\text{scale}} + \\text{zero_point})$, with the result clamped to the value range of the quantized data type.  \n",
+    "For this exercise, we calculate the scale as follows:  \n",
+    "* The quantized range should be symmetric  \n",
+    "* The quantized range should allow for a safety margin of 1 bit.  "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "developmental-atlantic",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def tensor_scale(input):\n",
+    "    return float(2*torch.max(torch.abs(torch.max(input)), torch.abs(torch.min(input))))/127.0"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "saving-rough",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "scale = tensor_scale(rand_values)\n",
+    "print(scale)\n",
+    "qvalues = torch.quantize_per_tensor(rand_values, scale=scale, zero_point=64, dtype=torch.quint8)\n",
+    "print(qvalues)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "daily-verse",
+   "metadata": {},
+   "source": [
+    "Currently, not all operations are available in a quantized version.  \n",
+    "For example, if you try to add two tensors with `res = qvalues + qvalues`, you will get the following error message, which appears whenever quantized tensors are passed to a non-quantized operator:\n",
+    "\n",
+    "```\n",
+    "Could not run 'aten::add.Tensor' with arguments from the 'QuantizedCPU' backend. 'aten::add.Tensor' is only available for these backends: [CPU, MkldnnCPU, SparseCPU, Meta, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].\n",
+    "```\n",
+    "Luckily, for this lab, all required operators exist as quantized ones. Every quantized operator additionally requires an output scale and an output zero_point.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "necessary-starter",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "res = torch.ops.quantized.add(qvalues, qvalues, scale=0.1, zero_point=64)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "adjacent-driving",
+   "metadata": {},
+   "source": [
+    "As a small warm-up task, we execute a quantized addition:  \n",
+    "**Your Task**:  \n",
+    "* Create two random float tensors `a` and `b` of size 100x100 (use `torch.rand`)   \n",
+    "* For each tensor, calculate the tensor's scale using the provided `tensor_scale` function\n",
+    "* Calculate the addition in float (result `c`)\n",
+    "* Calculate the tensor scale of the result\n",
+    "* Quantize the tensors `a` and `b` using their calculated scales\n",
+    "* Calculate the quantized addition using the function from above and plug in the output scale (the scale of tensor `c`)\n",
+    "* Print the quantized result and the float result. Can you spot the loss in precision?\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "statutory-observer",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#-to-be-done- by student \n",
+    "###\n",
+    "###\n",
+    "#-to-be-done- by student "
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
+%% Cell type:markdown id:diverse-deadline tags:
+
+# Embedded ML Lab - Exercise 2 - Quantization
+
+In this exercise, we will explore quantization with PyTorch. Quantization was introduced to PyTorch fairly recently; consequently, not all APIs are stable yet and not all operations are supported. Also, quantization currently only works on the CPU, not on the GPU.
+
+We will do the following in this exercise:
+* First, we will experiment with quantized tensors and with common PyTorch patterns that we need later
+* In this lab, the unquantized network is already given, but we require a quantized version of the network
+* To evaluate the benefits of quantization we require two things:
+    * First, we are interested in the execution time of the quantized network and the speed improvements
+    * Second, we are interested in how much accuracy we lose through quantization
+
+%% Cell type:markdown id:incoming-accident tags:
+
+For this lab, the network is already implemented in `net.py`; take a look at it.
+
+%% Cell type:code id:acceptable-background tags:
+
+``` python
+from net import CifarNet
+```
+
+%% Cell type:code id:returning-parade tags:
+
+``` python
+net = CifarNet()
+```
+
+%% Cell type:markdown id:hundred-sitting tags:
+
+This time we will work with a slightly more complex dataset, CIFAR-10 (https://www.cs.toronto.edu/~kriz/cifar.html), where each sample has 3 input channels (RGB) and 32 by 32 pixels. There are 10 different classes, namely:
+
+%% Cell type:code id:continental-bermuda tags:
+
+``` python
+import torch
+import torchvision
+import torchvision.transforms as transforms
+tf = transforms.Compose([transforms.ToTensor()])
+trainloader = torch.utils.data.DataLoader(torchvision.datasets.CIFAR10('data/', train=True, download=True, transform=tf), batch_size=64, shuffle=False)
+print(f"Classes in CIFAR10 {trainloader.dataset.classes}")
+```
+
+%% Cell type:code id:treated-discharge tags:
+
+``` python
+_, (cifar_samples, targets) = next(enumerate(trainloader))
+import matplotlib.pyplot as plt
+fig, axs = plt.subplots(nrows=1, ncols=16, figsize=(12, 6),
+                        subplot_kw={'xticks': [], 'yticks': []}, constrained_layout=True)
+
+for idx, ax in enumerate(axs.flat):
+    # ToTensor() already yields values in [0, 1], so no rescaling is needed for imshow
+    ax.imshow(cifar_samples[idx,:,:,:].permute(1,2,0).numpy(), interpolation='spline16')
+    ax.set_title(trainloader.dataset.classes[targets[idx]])
+plt.show()
+```
+
+%% Cell type:markdown id:editorial-trade tags:
+
+**Now we focus on quantized operators:**
+A normal float tensor can be quantized using `torch.quantize_per_tensor(input, scale, zero_point, dtype)`, where we have to define a *scale*, a *zero_point*, and a data type.
+Currently, PyTorch requires all `weights` to be of data type `torch.qint8`; therefore, we use a zero point of `0` to have the same resolution in the positive and negative range.
+Activations are of type `torch.quint8`. It is recommended to set the zero point for these to `64` to have one bit as a safety margin for overflows.
+A quantized tensor can easily be converted back to a float tensor using `torch.dequantize()`.
+
+%% Cell type:code id:nervous-stomach tags:
+
+``` python
+rand_values = torch.rand(3,3)
+print(f"float tensor\n{rand_values}")
+
+qvalues = torch.quantize_per_tensor(rand_values, scale=0.1, zero_point=64, dtype=torch.quint8)
+
+print(f"quantized tensor (uint8)\n{qvalues}")
+
+res = torch.dequantize(qvalues)
+```
+
+%% Cell type:markdown id:interstate-alaska tags:
+
+When a quantized tensor is printed, its values are shown converted back to float using scale and zero_point. Each quantized tensor's scale and zero_point can be accessed using `tensor.q_scale()` and `tensor.q_zero_point()`.
+
+%% Cell type:code id:northern-outline tags:
+
+``` python
+print(f"Tensor scale: {qvalues.q_scale()}, zero_point= {qvalues.q_zero_point()}")
+```
+
+%% Cell type:markdown id:failing-theory tags:
+
+So far, we have chosen an arbitrary scale of 0.1, and we can clearly see that we lose some precision.
+In general, quantization is done as follows: $Q(x,\text{scale},\text{zero_point}) = \text{round}(\frac{x}{\text{scale}} + \text{zero_point})$, with the result clamped to the value range of the quantized data type.
+For this exercise, we calculate the scale as follows:
+* The quantized range should be symmetric
+* The quantized range should allow for a safety margin of 1 bit.
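
To make the formula concrete, the following plain-Python sketch quantizes a few values by hand and converts them back. The helper names `quantize` and `dequantize` are hypothetical and not part of PyTorch, and the clamping range 0..255 assumes an unsigned 8-bit type such as `torch.quint8`.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Affine quantization: round(x / scale + zero_point), clamped to [qmin, qmax]."""
    q = round(x / scale + zero_point)
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to an approximate float value."""
    return (q - zero_point) * scale


scale, zero_point = 0.1, 64
values = [0.0, 0.27, 0.93, -1.0, 50.0]
codes = [quantize(v, scale, zero_point) for v in values]
restored = [dequantize(q, scale, zero_point) for q in codes]

for v, q, r in zip(values, codes, restored):
    print(f"{v:7.2f} -> code {q:3d} -> {r:7.2f}")
```

With a scale of 0.1, the value 0.27 comes back as roughly 0.3, and 50.0 saturates at the largest code because it lies far outside the representable range.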
+
+%% Cell type:code id:developmental-atlantic tags:
+
+``` python
+def tensor_scale(input):
+    return float(2*torch.max(torch.abs(torch.max(input)), torch.abs(torch.min(input))))/127.0
+```
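
The `tensor_scale` function above divides twice the largest absolute value by 127. The plain-Python sketch below uses a hypothetical stand-in, `list_scale`, that applies the same rule to a list instead of a tensor. It illustrates the consequence: with a zero point of 64, all codes fall into the range 0..128, which is roughly half of the unsigned 8-bit range, so one bit remains as a safety margin.

```python
def list_scale(values):
    """Same rule as tensor_scale, but for a plain list: 2 * max(|x|) / 127."""
    return 2.0 * max(abs(v) for v in values) / 127.0


values = [-0.8, -0.1, 0.0, 0.25, 0.5]
scale = list_scale(values)
zero_point = 64

# Quantize by hand: round(x / scale + zero_point), clamped to 0..255.
codes = [max(0, min(255, round(v / scale + zero_point))) for v in values]

print(f"scale = {scale:.6f}")
print(f"codes = {codes}")
print(f"min code = {min(codes)}, max code = {max(codes)}")
```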
+
+%% Cell type:code id:saving-rough tags:
+
+``` python
+scale = tensor_scale(rand_values)
+print(scale)
+qvalues = torch.quantize_per_tensor(rand_values, scale=scale, zero_point=64, dtype=torch.quint8)
+print(qvalues)
+```
+
+%% Cell type:markdown id:daily-verse tags:
+
+Currently, not all operations are available in a quantized version.
+For example, if you try to add two tensors with `res = qvalues + qvalues`, you will get the following error message, which appears whenever quantized tensors are passed to a non-quantized operator:
+
+```
+Could not run 'aten::add.Tensor' with arguments from the 'QuantizedCPU' backend. 'aten::add.Tensor' is only available for these backends: [CPU, MkldnnCPU, SparseCPU, Meta, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
+```
+Luckily, for this lab, all required operators exist as quantized ones. Every quantized operator additionally requires an output scale and an output zero_point.
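
Why an output scale and zero point are needed can be seen from a reference model of a quantized addition: conceptually, both inputs are mapped back to real values, added, and the sum is quantized again with the output parameters. The sketch below is plain Python with hypothetical helper names. It models the arithmetic only and makes no claim about how PyTorch's kernels are implemented.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """round(x / scale + zero_point), clamped to the unsigned 8-bit range."""
    return max(qmin, min(qmax, round(x / scale + zero_point)))


def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale


def reference_qadd(qa, a_params, qb, b_params, out_params):
    """Add two quantized codes and requantize with the output (scale, zero_point)."""
    real_sum = dequantize(qa, *a_params) + dequantize(qb, *b_params)
    return quantize(real_sum, *out_params)


in_params = (0.01, 64)            # (scale, zero_point) of both inputs
qa = quantize(1.5, *in_params)
qb = quantize(1.2, *in_params)

# Reusing the input scale for the output: the sum 2.7 does not fit and saturates.
saturated = reference_qadd(qa, in_params, qb, in_params, in_params)
# A coarser output scale covers the larger range of the sum.
wider = reference_qadd(qa, in_params, qb, in_params, (0.02, 64))

print(saturated, dequantize(saturated, *in_params))
print(wider, dequantize(wider, 0.02, 64))
```

The sum of two tensors generally has a larger value range than either input, which is why the output scale is derived from the result and not copied from the inputs.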
+
+%% Cell type:code id:necessary-starter tags:
+
+``` python
+res = torch.ops.quantized.add(qvalues, qvalues, scale=0.1, zero_point=64)
+```
+
+%% Cell type:markdown id:adjacent-driving tags:
+
+As a small warm-up task, we execute a quantized addition:
+**Your Task**:
+* Create two random float tensors `a` and `b` of size 100x100 (use `torch.rand`)
+* For each tensor, calculate the tensor's scale using the provided `tensor_scale` function
+* Calculate the addition in float (result `c`)
+* Calculate the tensor scale of the result
+* Quantize the tensors `a` and `b` using their calculated scales
+* Calculate the quantized addition using the function from above and plug in the output scale (the scale of tensor `c`)
+* Print the quantized result and the float result. Can you spot the loss in precision?
+
+%% Cell type:code id:statutory-observer tags:
+
+``` python
+#-to-be-done- by student
+###
+###
+#-to-be-done- by student
+```
--- a/exercises/2-quantization/exercise_21.ipynb
+++ b/exercises/2-quantization/exercise_21.ipynb
--- a/exercises/2-quantization/exercise_22.ipynb
+++ b/exercises/2-quantization/exercise_22.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "chinese-specific",
+   "metadata": {},
+   "source": [
+    "# Embedded ML Lab - Exercise 2 - Quantization (additional experiments)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "coastal-ministry",
+   "metadata": {},
+   "source": [
+    "Until now, we always used a symmetric min/max scale for the quantization of the activations, and hence a centered zero point.\n",
+    "\n",
+    "We will now do two things to squeeze a little more accuracy out of the quantization:  \n",
+    "* First, we will relax our assumption of a symmetric range/zero_point\n",
+    "* Second, we will consider \"cutting away\" parts of the value range that are not important for the classification decision\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "russian-textbook",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from net import CifarNet\n",
+    "import torch\n",
+    "\n",
+    "torch.backends.quantized.engine = 'qnnpack'\n",
+    "\n",
+    "import torchvision\n",
+    "from torchvision.datasets import CIFAR10\n",
+    "from torchvision.transforms import transforms\n",
+    "tf = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])\n",
+    "testloader = torch.utils.data.DataLoader(torchvision.datasets.CIFAR10('data/', train=False, download=True, transform=tf), shuffle=False, batch_size=32)\n",
+    "\n",
+    "import time\n",
+    "import nbimporter \n",
+    "\n",
+    "from exercise_21 import net_time\n",
+    "from exercise_21 import net_acc\n",
+    "from exercise_21 import fuse_conv_bn_weights\n",
+    "from exercise_21 import QCifarNet, QConv2dReLU, QLinear\n",
+    "from exercise_21 import tensor_scale"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "southwest-congo",
+   "metadata": {},
+   "source": [
+    "We introduce two modified modules (`QConv2dReLUNSym`, `QLinearNSym`) that, besides a scale (as in the last exercise), also have an adjustable zero_point that can likewise be set through the state_dict.\n",
+    "\n",
+    "We use these two modules in a new classifier called `QCifarNetNSym`.  "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "level-program",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from torch.nn.quantized.modules.utils import _pair_from_first\n",
+    "import torch.nn as nn\n",
+    "import torch.nn.functional as F\n",
+    "\n",
+    "#Both classes now also have a state-dict entry for the zero_point\n",
+    "class QConv2dReLUNSym(QConv2dReLU):\n",
+    "    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):\n",
+    "        super(QConv2dReLUNSym, self).__init__(in_channels, out_channels, kernel_size, stride, padding)\n",
+    "        self.register_buffer('zero_point', torch.tensor(64))\n",
+    "\n",
+    "    def forward(self, x):\n",
+    "        return torch.ops.quantized.conv2d_relu(x, self._prepack, self.scale, self.zero_point)\n",
+    "\n",
+    "    \n",
+    "class QLinearNSym(QLinear):\n",
+    "    def __init__(self, in_features, out_features):\n",
+    "        super(QLinearNSym, self).__init__(in_features, out_features)\n",
+    "        self.register_buffer('zero_point', torch.tensor(64))\n",
+    "   \n",
+    "    def forward(self, x):\n",
+    "        return torch.ops.quantized.linear(x, self._prepack, self.scale, self.zero_point)\n",
+    "    \n",
+    "    \n",
+    "class QCifarNetNSym(QCifarNet):\n",
+    "    def __init__(self):\n",
+    "        super(QCifarNet, self).__init__()\n",
+    "        \n",
+    "        self.register_buffer(\"scale\", torch.tensor(0.1))\n",
+    "\n",
+    "        self.conv1 = QConv2dReLUNSym(3, 16, 3, 1, padding=1)\n",
+    "        self.conv2 = QConv2dReLUNSym(16,16, 3, 1, padding=1)\n",
+    "\n",
+    "        self.conv3 = QConv2dReLUNSym(16, 32, 3, 1, padding=1)\n",
+    "        self.conv4 = QConv2dReLUNSym(32, 32, 3, 1, padding=1)\n",
+    "\n",
+    "        self.conv5 = QConv2dReLUNSym(32, 64, 3, 1, padding=1)\n",
+    "        self.conv6 = QConv2dReLUNSym(64, 64, 3, 1, padding=1)\n",
+    "\n",
+    "        self.fc = QLinearNSym(1024, 10)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "stuck-marker",
+   "metadata": {},
+   "source": [
+    "Your Task:\n",
+    "   * Copy your class definition of `CifarNetCalibration` from the last lab into the next block.\n",
+    "   * In addition to the calibration, call the function `plot_density` after each operator.\n",
+    "   * Run the calibration batch again (code provided) and inspect the figures. What observation can you make?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "latest-choice",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "def plot_density(x):\n",
+    "    # input tensor x\n",
+    "    x = x.detach()\n",
+    "    plt.hist(x.flatten().numpy(), range=(float(x.min()),float(x.max())), density=True, bins=50)\n",
+    "    plt.title('input probability density function')\n",
+    "    plt.ylabel('likelihood')\n",
+    "    plt.xlabel('values')\n",
+    "    plt.show()\n",
+    "    \n",
+    "\n",
+    "    \n",
+    "#We run the calibration using a batch from the testdata\n",
+    "net_calib = CifarNetCalibration()\n",
+    "net_calib.load_state_dict(torch.load('state_dict.pt'))\n",
+    "_, (data, _) = next(enumerate(testloader))\n",
+    "net_calib(data)\n",
+    "calibration_dict = net_calib.calibration_dict"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ethical-berlin",
+   "metadata": {},
+   "source": [
+    "Your Task: \n",
+    "* Copy your code from the last lab that sets the quantized state dict after calibration. The state dict now has new `zero_point` entries for the fused conv layers and the fc layer.\n",
+    "* Explore what happens if we change the scale of a conv layer so that it better suits the plots. Use the provided code to determine the accuracy at each step.\n",
+    "* After that, adjust the scale of the conv layers according to the figures\n",
+    "\n",
+    "Your Task:\n",
+    "* After that, experiment with the scale and zero_point of the fully connected layer. What conclusion can we draw?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dress-accused",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#prints keys from quantized net\n",
+    "qnet = QCifarNetNSym()\n",
+    "qsd = qnet.state_dict()\n",
+    "for key in qsd: print(key, qsd[key].dtype)\n",
+    "\n",
+    "sd = torch.load('state_dict.pt')\n",
+    "\n",
+    "###--- COPY YOUR IMPLEMENTATION HERE ---\n",
+    "\n",
+    "\n",
+    "### ------------------------------------"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "challenging-exhaust",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#We run the accuracy test again to see how much accuracy we lose through quantization\n",
+    "print(f\"Accuracy quantized: {net_acc(QCifarNetNSym, qsd, testloader):.4%}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "portuguese-placement",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
+%% Cell type:markdown id:chinese-specific tags:
+
+# Embedded ML Lab - Exercise 2 - Quantization (additional experiments)
+
+%% Cell type:markdown id:coastal-ministry tags:
+
+Until now, we always used a symmetric min/max scale for the quantization of the activations, and hence a centered zero point.
+
+We will now do two things to squeeze a little more accuracy out of the quantization:
+* First, we will relax our assumption of a symmetric range/zero_point
+* Second, we will consider "cutting away" parts of the value range that are not important for the classification decision
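
As an illustration of the first point, a non-symmetric range derives both the scale and the zero point from the observed minimum and maximum. The sketch below is plain Python with a hypothetical helper name, and it assumes the full unsigned 8-bit range 0..255 (without the safety margin used earlier). It shows one common way to choose the parameters, not necessarily the one intended for this lab.

```python
def affine_params(x_min, x_max, qmin=0, qmax=255):
    """Return (scale, zero_point) so that [x_min, x_max] maps onto [qmin, qmax]."""
    # The representable range must contain 0.0 so that zero is encoded exactly.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero input; any positive scale works
    zero_point = round(qmin - x_min / scale)
    return scale, max(qmin, min(qmax, zero_point))


# Post-ReLU activations are never negative, so no codes are spent on negative values.
relu_scale, relu_zp = affine_params(0.0, 5.1)
# A range skewed towards positive values gets a small, non-centered zero point.
skew_scale, skew_zp = affine_params(-1.0, 4.1)

print(f"ReLU output: scale={relu_scale:.4f}, zero_point={relu_zp}")
print(f"skewed:      scale={skew_scale:.4f}, zero_point={skew_zp}")
```

Compared with a centered zero point, the codes that would otherwise represent values that never occur become available for the range that is actually used.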
+
+%% Cell type:code id:russian-textbook tags:
+
+``` python
+from net import CifarNet
+import torch
+
+torch.backends.quantized.engine = 'qnnpack'
+
+import torchvision
+from torchvision.datasets import CIFAR10
+from torchvision.transforms import transforms
+tf = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
+testloader = torch.utils.data.DataLoader(torchvision.datasets.CIFAR10('data/', train=False, download=True, transform=tf), shuffle=False, batch_size=32)
+
+import time
+import nbimporter
+
+from exercise_21 import net_time
+from exercise_21 import net_acc
+from exercise_21 import fuse_conv_bn_weights
+from exercise_21 import QCifarNet, QConv2dReLU, QLinear
+from exercise_21 import tensor_scale
+```
+
+%% Cell type:markdown id:southwest-congo tags:
+
+We introduce two modified modules (`QConv2dReLUNSym`, `QLinearNSym`) that, besides a scale (as in the last exercise), also have an adjustable zero_point that can likewise be set through the state_dict.
+
+We use these two modules in a new classifier called `QCifarNetNSym`.
+
+%% Cell type:code id:level-program tags:
+
+``` python
+from torch.nn.quantized.modules.utils import _pair_from_first
+import torch.nn as nn
+import torch.nn.functional as F
+
+#Both classes now also have a state-dict entry for the zero_point
+class QConv2dReLUNSym(QConv2dReLU):
+    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
+        super(QConv2dReLUNSym, self).__init__(in_channels, out_channels, kernel_size, stride, padding)
+        self.register_buffer('zero_point', torch.tensor(64))
+
+    def forward(self, x):
+        return torch.ops.quantized.conv2d_relu(x, self._prepack, self.scale, self.zero_point)
+
+
+class QLinearNSym(QLinear):
+    def __init__(self, in_features, out_features):
+        super(QLinearNSym, self).__init__(in_features, out_features)
+        self.register_buffer('zero_point', torch.tensor(64))
+
+    def forward(self, x):
+        return torch.ops.quantized.linear(x, self._prepack, self.scale, self.zero_point)
+
+
+class QCifarNetNSym(QCifarNet):
+    def __init__(self):
+        super(QCifarNet, self).__init__()
+
+        self.register_buffer("scale", torch.tensor(0.1))
+
+        self.conv1 = QConv2dReLUNSym(3, 16, 3, 1, padding=1)
+        self.conv2 = QConv2dReLUNSym(16,16, 3, 1, padding=1)
+
+        self.conv3 = QConv2dReLUNSym(16, 32, 3, 1, padding=1)
+        self.conv4 = QConv2dReLUNSym(32, 32, 3, 1, padding=1)
+
+        self.conv5 = QConv2dReLUNSym(32, 64, 3, 1, padding=1)
+        self.conv6 = QConv2dReLUNSym(64, 64, 3, 1, padding=1)
+
+        self.fc = QLinearNSym(1024, 10)
+```
+
+%% Cell type:markdown id:stuck-marker tags:
+
+Your Task:
+   * Copy your class definition of `CifarNetCalibration` from the last lab into the next block.
+   * In addition to the calibration, call the function `plot_density` after each operator.
+   * Run the calibration batch again (code provided) and inspect the figures. What observation can you make?
+
+%% Cell type:code id:latest-choice tags:
+
+``` python
+import matplotlib.pyplot as plt
+
+def plot_density(x):
+    # input tensor x
+    x = x.detach()
+    plt.hist(x.flatten().numpy(), range=(float(x.min()),float(x.max())), density=True, bins=50)
+    plt.title('input probability density function')
+    plt.ylabel('likelihood')
+    plt.xlabel('values')
+    plt.show()
+
+
+
+#We run the calibration using a batch from the testdata
+net_calib = CifarNetCalibration()
+net_calib.load_state_dict(torch.load('state_dict.pt'))
+_, (data, _) = next(enumerate(testloader))
+net_calib(data)
+calibration_dict = net_calib.calibration_dict
+```
+
+%% Cell type:markdown id:ethical-berlin tags:
+
+Your Task:
+* Copy your code from the last lab that sets the quantized state dict after calibration. The state dict now has new `zero_point` entries for the fused conv layers and the fc layer.
+* Explore what happens if we change the scale of a conv layer so that it better suits the plots. Use the provided code to determine the accuracy at each step.
+* After that, adjust the scale of the conv layers according to the figures
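
One possible way to pick a scale that fits the bulk of a distribution is to ignore rare outliers and derive the scale from a high percentile instead of the absolute maximum. The following plain-Python sketch uses made-up activation values and a hypothetical helper, `clipped_scale`. The 99% cut-off and the 0..255 code range are arbitrary assumptions chosen for illustration.

```python
import random


def clipped_scale(values, keep=0.99, levels=255):
    """Scale that maps the `keep` quantile of the values onto the top code."""
    ordered = sorted(values)
    threshold = ordered[int(keep * (len(ordered) - 1))]
    return threshold / levels


random.seed(0)
# Made-up, ReLU-like activations: many small values plus a few large outliers.
acts = [abs(random.gauss(0.0, 1.0)) for _ in range(10_000)] + [40.0, 55.0, 70.0]

full_scale = max(acts) / 255
clip_scale = clipped_scale(acts)

print(f"scale from max:             {full_scale:.4f}")
print(f"scale from 99th percentile: {clip_scale:.4f}")
```

Values above the threshold saturate at the highest code, while the bulk of the distribution is resolved in much finer steps. Whether that trade-off helps the accuracy has to be checked experimentally.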
+
+Your Task:
+* After that, experiment with the scale and zero_point of the fully connected layer. What conclusion can we draw?
+
+%% Cell type:code id:dress-accused tags:
+
+``` python
+#prints keys from quantized net
+qnet = QCifarNetNSym()
+qsd = qnet.state_dict()
+for key in qsd: print(key, qsd[key].dtype)
+
+sd = torch.load('state_dict.pt')
+
+###--- COPY YOUR IMPLEMENTATION HERE ---
+
+
+### ------------------------------------
+```
+
+%% Cell type:code id:challenging-exhaust tags:
+
+``` python
+#We run the accuracy test again to see how much accuracy we lose through quantization
+print(f"Accuracy quantized: {net_acc(QCifarNetNSym, qsd, testloader):.4%}")
+```
+
+%% Cell type:code id:portuguese-placement tags:
+
+``` python
+```
--- a/exercises/2-quantization/net.py
+++ b/exercises/2-quantization/net.py
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+class CifarNet(torch.nn.Module):
+    def __init__(self):
+        super(CifarNet, self).__init__()
+        self.conv1 = nn.Conv2d(3, 16, 3, 1, padding=1)
+        self.conv2 = nn.Conv2d(16,16, 3, 1, padding=1)
+
+        self.conv3 = nn.Conv2d(16, 32, 3, 1, padding=1)
+        self.conv4 = nn.Conv2d(32, 32, 3, 1, padding=1)
+
+        self.conv5 = nn.Conv2d(32, 64, 3, 1, padding=1)
+        self.conv6 = nn.Conv2d(64, 64, 3, 1, padding=1)
+
+        self.bn1 = nn.BatchNorm2d(16)
+        self.bn2 = nn.BatchNorm2d(16)
+        self.bn3 = nn.BatchNorm2d(32)
+        self.bn4 = nn.BatchNorm2d(32)
+        self.bn5 = nn.BatchNorm2d(64)
+        self.bn6 = nn.BatchNorm2d(64)
+
+        self.fc = nn.Linear(1024, 10)
+
+    def forward(self, x):
+        
+        x = self.conv1(x)
+        x = self.bn1(x)
+        x = F.relu(x)
+
+        x = self.conv2(x)
+        x = self.bn2(x)
+        x = F.relu(x)
+
+        x = F.max_pool2d(x, 2, stride=2)
+
+        x = self.conv3(x)
+        x = self.bn3(x)
+        x = F.relu(x)
+
+        x = self.conv4(x)
+        x = self.bn4(x)
+        x = F.relu(x)
+
+        x = F.max_pool2d(x, 2, stride=2)
+
+        x = self.conv5(x)
+        x = self.bn5(x)
+        x = F.relu(x)
+
+        x = self.conv6(x)
+        x = self.bn6(x)
+        x = F.relu(x)
+
+        x = F.max_pool2d(x, 2, stride=2)
+
+        x = torch.flatten(x, 1)
+        x = self.fc(x)
+
+        return x
\ No newline at end of file
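
The input size of `fc` follows from the layer shapes: every convolution uses a 3x3 kernel with stride 1 and padding 1 and therefore keeps the spatial size, while each of the three `max_pool2d(x, 2, stride=2)` calls halves it. A short plain-Python check, assuming 32x32 CIFAR-10 inputs:

```python
side = 32                 # CIFAR-10 images are 32x32 pixels
for _ in range(3):        # three 2x2 max-pooling steps with stride 2
    side //= 2            # 32 -> 16 -> 8 -> 4

channels = 64             # output channels of conv6
flat_features = channels * side * side
print(flat_features)      # 1024, the in_features of self.fc
```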
--- a/exercises/2-quantization/src/cifarnet.png
+++ b/exercises/2-quantization/src/cifarnet.png
--- a/exercises/2-quantization/src/cifarnet_quantized.png
+++ b/exercises/2-quantization/src/cifarnet_quantized.png
--- a/exercises/2-quantization/state_dict.pt
+++ b/exercises/2-quantization/state_dict.pt