Commit 8977c9d7 authored by ueldn

first day

parent 00e2fe47
%% Cell type:markdown id:5414047f-c2e5-4b51-a4bf-7b6b9c562a19 tags:
# Embedded ML Lab - Exercise 0 - Intro Pytorch
* Documentation Pytorch: https://pytorch.org/docs/stable/index.html
* Documentation Matplotlib: https://matplotlib.org/stable/contents.html
### Tensor basics
`PyTorch` uses _tensors_ to store N-dimensional data, similar to NumPy or Matlab. Torch tensors support a variety of matrix and vector operations.
%% Cell type:code id:32ca05e2-f9c9-4703-b923-28f02bb337f7 tags:
``` python
import torch
torch.rand(1).to('cuda') #initialize cuda context (might take a while)
x = torch.tensor([5,3]) #create variable
y = torch.tensor([3,3])
z = x * y #point-wise multiplication of two variables
print(z)
```
%% Output
tensor([15, 9])
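%% Cell type:markdown id:tensor-ops-sketch tags:
A few more of the matrix/vector operations mentioned above, as a minimal sketch (the names `A` and `B` are just illustrative):
%% Cell type:code id:tensor-ops-sketch-code tags:
``` python
A = torch.tensor([[1., 2.], [3., 4.]])
B = torch.tensor([[5., 6.], [7., 8.]])
print(A + B)   # element-wise addition
print(A * B)   # element-wise (Hadamard) product
print(A @ B)   # matrix multiplication (same as torch.matmul(A, B))
print(A.t())   # transpose of a 2-dimensional tensor
```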
%% Cell type:markdown id:91e9f788-ce37-4a53-a1e1-aa8fd34db305 tags:
There are also several functions to initialize tensors, such as `torch.ones`, `torch.zeros`, and `torch.randn`.
We can get the shape of a tensor by calling `size()` on it.
%% Cell type:code id:1b85c810-8ee9-4fbc-b520-47ba10a65a6f tags:
``` python
ones = torch.ones((10,10,5)) # creates a 3-dimensional tensor with ones with size [10,10,5]
rand = torch.randn((4,4)) # creates a 2-dimensional random tensor with size [4,4]
print(ones.size()) # returns a torch.Size object with the dimensions
```
%% Output
torch.Size([10, 10, 5])
%% Cell type:markdown id:f828aaec-cb84-4b16-a527-3902bc9f8a15 tags:
PyTorch tensors can also have different datatypes.
%% Cell type:code id:d291bc8a-3c07-4d84-b226-e890f923290a tags:
``` python
torch.ones((10,10), dtype=torch.int) #inits a tensor with ones as int
torch.ones((10,10), dtype=torch.float) #inits a tensor with ones as float (default)
```
%% Output
tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
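%% Cell type:markdown id:dtype-sketch tags:
Datatypes can also be converted after creation, e.g. with `to()`; a minimal sketch:
%% Cell type:code id:dtype-sketch-code tags:
``` python
x_int = torch.ones((2, 2), dtype=torch.int)
x_float = x_int.to(torch.float)        # convert int32 -> float32
print(x_int.dtype, x_float.dtype)      # torch.int32 torch.float32
```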
%% Cell type:markdown id:b7c838ef-d685-4021-b9bb-6fd50b6352b0 tags:
Similar to NumPy or Matlab, we can also slice tensors with indices (NumPy indexing: https://numpy.org/doc/stable/reference/arrays.indexing.html)
Slicing is equivalent to a torch view. As the name suggests, a view does not change the underlying storage or create a copy, meaning that if we change the data, all associated views also show the changes.
%% Cell type:code id:abff0483-bd54-4218-9c6d-cd2cc4660ee5 tags:
``` python
ones = torch.ones((10,10,5)) # creates a 3-dimensional tensor with ones with size [10,10,5]
a = ones[0:5,0,0] # we create a view by slicing indices 0,1,2,3,4 from the first dimension and index 0 from dimensions 2 and 3
print(f"Size of a: {a.size()}")
ones[0:5,:,:] = 3.14
print(a)
b = ones.clone()[0:5,0,0] #cloning a tensor creates an independent copy
ones[0:5,:,:] = 7.11
print(b)
print(a)
```
%% Output
Size of a: torch.Size([5])
tensor([3.1400, 3.1400, 3.1400, 3.1400, 3.1400])
tensor([3.1400, 3.1400, 3.1400, 3.1400, 3.1400])
tensor([7.1100, 7.1100, 7.1100, 7.1100, 7.1100])
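%% Cell type:markdown id:view-sketch tags:
The same view semantics apply to explicit reshaping with `view()`: the result shares storage with the original tensor. A minimal sketch:
%% Cell type:code id:view-sketch-code tags:
``` python
x = torch.arange(12)      # values 0..11
v = x.view(3, 4)          # a 3x4 view of the same storage
v[0, 0] = 100             # modifying the view ...
print(x[0])               # ... also changes the original tensor: tensor(100)
```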
%% Cell type:markdown id:41fdf1fa-61a6-47b4-b613-344f6249c42d tags:
Other useful tensor operations are `flatten()`, `sum()`, `max()`, `min()`.
%% Cell type:code id:13c264c0-3ce1-418b-b626-643af4bcd679 tags:
``` python
a = torch.ones((10,10,10))
a_flattened = a.flatten()
print(f"Shape of a: {a.size()}, Shape of a_flattened: {a_flattened.size()}")
sum_of_a = a.sum(dim=(0,1)) # sum over dimensions 0 and 1
print(f"Sum: {sum_of_a}")
sum_of_a = a.sum(dim=(0,1,2)) # sum of all entries
print(f"Sum: {sum_of_a}")
```
%% Output
Shape of a: torch.Size([10, 10, 10]), Shape of a_flattened: torch.Size([1000])
Sum: tensor([100., 100., 100., 100., 100., 100., 100., 100., 100., 100.])
Sum: 1000.0
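%% Cell type:markdown id:maxmin-sketch tags:
`max()` and `min()` mentioned above work the same way and can also be reduced along a single dimension; a small sketch:
%% Cell type:code id:maxmin-sketch-code tags:
``` python
r = torch.tensor([[1., 5.], [3., 2.]])
print(r.max(), r.min())           # global maximum and minimum: tensor(5.) tensor(1.)
values, indices = r.max(dim=1)    # row-wise maximum and the index where it occurs
print(values, indices)            # tensor([5., 3.]) tensor([1, 0])
```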
%% Cell type:markdown id:6b3d3890-8d52-4c95-a500-2780fae86f40 tags:
A special property of PyTorch tensors is that they can be pushed to a device (a GPU) so that operations run on the GPU. This can speed up computations dramatically if the required operations are parallelizable.
We therefore first check whether PyTorch can reach the Jetson's GPU.
%% Cell type:code id:11d2bb31-e22d-4cf1-99f2-98bf17f0616e tags:
``` python
import time
print(f'CUDA available: {["no", "yes"][torch.cuda.is_available()]}')
a = torch.zeros((10**4, 10**4))
b = torch.zeros((10**4, 10**4))
def f(device, n, k):
    x = torch.randn(n, n, dtype=torch.float32, device=device)
    for _ in range(k):
        x = torch.matmul(x, x)
        x = (x - x.mean()) / x.std()
    return x.max()
n = 256
k = 100
%timeit -n 1 -r 1 print(f('cpu', n, k))
%timeit -n 1 -r 1 print(f('cuda', n, k))
%timeit -n 1 -r 1 print(f('cpu', 4*n, k))
%timeit -n 1 -r 1 print(f('cuda', 4*n, k))
```
%% Output
CUDA available: yes
tensor(11.3122)
6.9 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
tensor(10.7327, device='cuda:0')
11.3 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
tensor(10.8747)
20.3 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
tensor(8.0055, device='cuda:0')
2.69 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
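%% Cell type:markdown id:cuda-timing-sketch tags:
CUDA operations are launched asynchronously, so when timing GPU code by hand it is safer to call `torch.cuda.synchronize()` before reading the clock. A minimal sketch (the tensor size is just illustrative), assuming CUDA is available as checked above:
%% Cell type:code id:cuda-timing-sketch-code tags:
``` python
from time import perf_counter

x = torch.randn(1024, 1024, device='cuda')
torch.cuda.synchronize()                 # wait for pending GPU work before starting the clock
start = perf_counter()
y = torch.matmul(x, x)
torch.cuda.synchronize()                 # wait for the matmul to finish before stopping the clock
print(f"GPU matmul took {perf_counter() - start:.4f}s")
```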
%% Cell type:markdown id:4092e33d-77d9-4203-8546-3af85181e4e9 tags:
PyTorch tensors (data / NN weights) can also be stored to and loaded from disk.
We load a sample from the MNIST dataset, which is stored as "mnist_sample.pt" on disk.
The MNIST dataset consists of grayscale images of handwritten digits from `0-9`
* Loading can be done with `torch.load("filename")`. Similarly, we can store tensors with `torch.save(tensor, "filename")`.
%% Cell type:code id:4df9bb5b-40c2-49d5-aaf6-4a3aa046d4ca tags:
``` python
mnist_sample = torch.load("mnist_sample.pt") #this loads a 28 by 28 pixel image from the MNIST dataset
print(mnist_sample.size())
```
%% Output
torch.Size([28, 28])
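%% Cell type:markdown id:save-load-sketch tags:
A small round-trip sketch with `torch.save` and `torch.load` (the file name `tmp_tensor.pt` is just an example):
%% Cell type:code id:save-load-sketch-code tags:
``` python
t = torch.arange(6).reshape(2, 3)
torch.save(t, "tmp_tensor.pt")            # store the tensor to disk
t_loaded = torch.load("tmp_tensor.pt")    # load it back
print(torch.equal(t, t_loaded))           # True
```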
%% Cell type:code id:43b7dd30-19b3-492f-bdc0-20d93cfa17b6 tags:
``` python
import matplotlib.pyplot as plt
plt.imshow(mnist_sample[:,:], cmap='gray', interpolation='none')
```
%% Output
<matplotlib.image.AxesImage at 0x7ea8f0ef60>
%% Cell type:markdown id:805918da-ca53-4b07-8eb8-fdcd9b100f24 tags:
### Pytorch Modules
PyTorch modules are the base class of neural networks in PyTorch. All modules we define should inherit from `torch.nn.Module`. Modules can also contain other modules, allowing nesting.
A tensor can be defined as a `Parameter` of a module.
Every module has a forward path defined. In the example below, we add the parameter to our input and return the sum.
%% Cell type:code id:e556d26d-bd10-4c01-a81b-3c317cc2349f tags:
``` python
import torch.nn as nn
class AddConstant(nn.Module):
    def __init__(self):
        super(AddConstant, self).__init__()
        self.add_value = nn.parameter.Parameter(torch.tensor(5), requires_grad=False)
    def forward(self, x):
        y = x + self.add_value
        return y
addc = AddConstant() #we create a new AddConstant instance
```
%% Cell type:markdown id:0339d4c7-12d9-4cc9-ad46-844be03ddbcf tags:
Our AddConstant module inherits several pieces of functionality:
* The forward pass can be called either via the call operator `addc(5)` or by calling the forward function directly: `addc.forward(5)`.
%% Cell type:code id:4fce9610-341e-407e-85c8-f11f19840ae4 tags:
``` python
y = addc(5)
y = addc.forward(5)
print(f"Result: {y}")
print(list(addc.named_parameters()))
```
%% Output
Result: 10
[('add_value', Parameter containing:
tensor(5))]
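%% Cell type:markdown id:nested-module-sketch tags:
Since modules can contain other modules (as mentioned above), submodules assigned in `__init__` are registered automatically and their parameters appear in the parent's `named_parameters()`. A minimal sketch reusing `AddConstant`:
%% Cell type:code id:nested-module-sketch-code tags:
``` python
class AddTwice(nn.Module):
    def __init__(self):
        super(AddTwice, self).__init__()
        self.first = AddConstant()     # nested submodules are registered automatically
        self.second = AddConstant()
    def forward(self, x):
        return self.second(self.first(x))

add_twice = AddTwice()
print(add_twice(5))                                         # tensor(15)
print([name for name, _ in add_twice.named_parameters()])   # ['first.add_value', 'second.add_value']
```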
%% Cell type:markdown id:3e7c17ea-f386-4d07-bbf7-63c550e2aae9 tags:
We can read and load so-called 'state_dicts' from/into modules; they contain all parameters (i.e., the NN weights).
%% Cell type:code id:0bdf7db1-b7bf-40a4-ac48-48e42b0d4d42 tags:
``` python
state_dict = addc.state_dict()
print(state_dict)
state_dict['add_value'] = torch.tensor(4)
addc.load_state_dict(state_dict)
print(f"Result: {addc.forward(5)}")
```
%% Output
OrderedDict([('add_value', tensor(5))])
Result: 9
%% Cell type:markdown id:5d73ac87-508b-44cb-b8de-100a9a3c5b79 tags:
Modules can also be pushed to the GPU for calculation.
%% Cell type:code id:ee84a7a3-cc48-4b0d-8bd7-d3df2303d862 tags:
``` python
addc.to('cuda')
y = addc(torch.tensor(5, device='cuda'))
print(y)
```
%% Output
tensor(9, device='cuda:0')
%% Cell type:markdown id:aed02c67-2c53-4953-987d-bd83da9586ec tags:
Functions that do not have parameters can be found in `torch.nn.functional`.
%% Cell type:code id:e9d47d08-adf6-4bb2-ad9a-6db76b4e928e tags:
``` python
import torch.nn.functional as F
result = F.relu(torch.ones(1))
result = F.max_pool2d(torch.ones((10,10,10)), kernel_size=2)
```
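%% Cell type:markdown id:functional-vs-module-sketch tags:
For comparison, many of these parameter-free functions also exist as module versions (e.g. `nn.ReLU`, `nn.MaxPool2d`), which behave identically; a small sketch:
%% Cell type:code id:functional-vs-module-sketch-code tags:
``` python
x = torch.randn(1, 1, 4, 4)
print(torch.equal(nn.ReLU()(x), F.relu(x)))                             # True
print(torch.equal(nn.MaxPool2d(kernel_size=2)(x), F.max_pool2d(x, 2)))  # True
```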
%% Cell type:code id:36c84fc2-ae6d-46c9-91cd-9b3d1482fb71 tags:
``` python
```
%% Cell type:markdown id:d8909361-4d07-4331-a587-be85e32a3823 tags:
# Embedded ML Lab - Exercise 0 - Intro Inference
%% Cell type:markdown id:2265b134-4819-4b6a-902e-9562836b055d tags:
We start with a NN model similar to the LeNet model from 1989 (https://en.wikipedia.org/wiki/LeNet). The LeNet model is designed to classify handwritten digits from the MNIST dataset (http://yann.lecun.com/exdb/mnist/). It takes inputs of size 28x28 and outputs a vector of size 10, where each entry represents the likelihood that the input corresponds to that digit. All conv layers have `stride=1` and `padding=0`.
<img src="src/lenet.png" alt="drawing" width="600"/>
<span style="color:green">Your Tasks:</span>
* <span style="color:green">Write the init code for the required modules to define LeNet (use the provided image to determine the number of input/output filters and the kernel sizes)</span>
* <span style="color:green">Determine the output size of conv2 to determine the input size of fc1 (a worked size calculation follows this list)</span>
* The size of the output conv2d layer can be determined with the following formula $H_{\text{out}} = \lfloor{ \frac{H_{\text{in}} + 2 \times \text{padding} - 1 \times ( \text{kernelsize} -1 ) -1 } {\text{stride}} +1}\rfloor$
* Here, maxpool2d with kernel size 2 reduces the input size by factor two: $H_{\text{out}} = \lfloor \frac{H_{\text{in}}}{2}\rfloor$
* <span style="color:green">Use following modules: `nn.Conv2d, nn.Linear`</span>
* <span style="color:green">Define the forward pass of LeNet, check the provided image for the flow of data through the modules and functions</span>
* <span style="color:green">Use the following functions: `F.relu, F.max_pool2d, tensor.flatten`</span>
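%% Cell type:markdown id:lenet-size-sketch tags:
As referenced in the task above, a worked sanity check of the size formula, assuming the 3x3 kernels and two 2x2 max-pooling steps used in the solution below:
%% Cell type:code id:lenet-size-sketch-code tags:
``` python
import math

def conv_out(h_in, kernel_size, padding=0, stride=1):
    # H_out = floor((H_in + 2*padding - (kernel_size - 1) - 1) / stride + 1)
    return math.floor((h_in + 2 * padding - (kernel_size - 1) - 1) / stride + 1)

h = 28                 # MNIST input size
h = conv_out(h, 3)     # conv1 (3x3): 28 -> 26
h = h // 2             # max_pool2d(2): 26 -> 13
h = conv_out(h, 3)     # conv2 (3x3): 13 -> 11
h = h // 2             # max_pool2d(2): 11 -> 5
print(h, 16 * h * h)   # 5 400  -> fc1 therefore gets 16*5*5 = 400 input features
```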
%% Cell type:code id:guided-recognition tags:
``` python
import torch
torch.rand(1).to('cuda') #initialize cuda context (might take a while)
import torch.nn as nn
import torch.nn.functional as F
```
%% Cell type:code id:34cea594-90eb-4a07-b390-b3f332e7869c tags:
``` python
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        #---to-be-done-by-student---
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=3)
        self.fc1 = nn.Linear(in_features=400, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=84)
        self.fc3 = nn.Linear(in_features=84, out_features=10)
        #---end---------------------
        return
    def forward(self, x):
        #---to-be-done-by-student---
        x = self.conv1(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = x.flatten(1, 3)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.fc3(x)
        #---end---------------------
        return x
```
%% Cell type:markdown id:759ee961-4e90-498c-a316-398ec85057f4 tags:
We can now create a new model instance
%% Cell type:code id:13b45f83-8084-4cd9-8aa1-09ec8b83445f tags:
``` python
net = LeNet()
```
%% Cell type:markdown id:4535a390-bb2f-4695-a4e2-f7c6253ac0e0 tags:
We now load the state dict with the filename `lenet.pt` into the model. These weights are already pretrained and should have a high accuracy when detecting MNIST images. Afterwards, we check if the network is able to detect our stored sample.
<span style="color:green">Your Task:</span>
* <span style="color:green">Load the state_dict `lenet.pt` from disk and load the state dict into the LeNet instance</span>
* <span style="color:green">Calculate the output of the network when feeding in the image</span>
* Load the image from disk (`mnist_sample.pt`) into a tensor
* Note that you need to expand the dimensions of the tensor, since the network expects an input of size $N \times 1 \times 28 \times 28$ but the image has size $28 \times 28$. You can create the two extra dimensions by slicing with **[None, None, :, :]** (see the short sketch after this list)
* Check if the image is detected correctly. The output with the highest value corresponds to the estimated class (you can use `torch.argmax`)
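%% Cell type:markdown id:unsqueeze-sketch tags:
As mentioned in the note above, the same dimension expansion can also be written with `unsqueeze`; a small sketch with a stand-in tensor:
%% Cell type:code id:unsqueeze-sketch-code tags:
``` python
img = torch.zeros(28, 28)                     # stand-in for the loaded MNIST sample
batched = img[None, None, :, :]               # shape [1, 1, 28, 28]
batched_alt = img.unsqueeze(0).unsqueeze(0)   # the same via unsqueeze
print(batched.size(), batched_alt.size())
```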
%% Cell type:code id:a0244c61-aea6-4425-92e2-6aa0972423a7 tags:
``` python
#---to-be-done-by-student---
net.load_state_dict(torch.load("lenet.pt"))
tensor = torch.load("mnist_sample.pt")[None, None, :, :]
output = net(tensor)
torch.argmax(output)
#---end---------------------
```
%% Output
tensor(6)
%% Cell type:markdown id:b84fa4c3-0e1a-4264-8fd2-f550ebed730b tags:
Next, we want to determine the accuracy of the network using the full MNIST test data. Additionally, we want to measure the execution time for the network on the CPU as well as on the GPU.
* We first load the complete MNIST test set (10,000 images) and zero-center and scale it.
* We create a DataLoader, which can be iterated with enumerate and returns the data in chunks of 64, so-called batches. The resulting input tensor has size $64 \times 1 \times 28 \times 28$.
* The target tensor has size $64$, where each entry is the correct label of the corresponding image (e.g., if the image `inputs[8, :, :, :]` shows a two, the corresponding entry in the target tensor, `targets[8]`, is 2).
<span style="color:green">Your Task:</span>
* <span style="color:green">For every batch, feed the data into the network (a minimal evaluation sketch follows the data-loading cell below).</span>
* <span style="color:green">Calculate the overall accuracy (ratio of correctly detected images to all images).</span>
* <span style="color:green">Calculate the overall execution time (forward pass) of the network on the CPU as well as on the GPU.</span>
* <span style="color:green">For GPU calculations you have to move the network as well as the input to the GPU and bring the result back to the CPU for your accuracy calculations.</span>
%% Cell type:code id:d1db2c7e-24c9-4511-9557-cccbb208a495 tags:
``` python
import torchvision
import time
from time import perf_counter
test_data = torchvision.datasets.MNIST('.', train=False, download=True, transform=torchvision.transforms.Compose([
torchvision.transforms.ToTensor(), torchvision.transforms.Normalize(
(0.1307, ), (0.3081)) ]))
test_loader = torch.utils.data.DataLoader(test_data, batch_size=64, shuffle=False)
print(f"Number of test images: {len(test_data)}")
print(f"Number of batches: {len(test_loader)}")
_, (inputs, targets) = next(enumerate(test_loader))
print(f"Batch shape: {inputs.size()}")
print(f"Target (Labels): {targets[0:15]}")
print(targets.size())
```
%% Output
Number of test images: 10000
Number of batches: 157
Batch shape: torch.Size([64, 1, 28, 28])
Target (Labels): tensor([7, 2, 1, 0, 4, 1, 4, 9, 5, 9, 0, 6, 9, 0, 1])
torch.Size([64])
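%% Cell type:markdown id:eval-nograd-sketch tags:
As referenced in the task list above, a minimal evaluation sketch: gradient tracking is not needed for pure inference, so the forward passes can be wrapped in `torch.no_grad()` (shown here on the first batch only):
%% Cell type:code id:eval-nograd-sketch-code tags:
``` python
device = torch.device('cuda')
net.to(device)
net.eval()
with torch.no_grad():                  # no gradients needed for inference
    for batch_idx, (inputs, targets) in enumerate(test_loader):
        outputs = net(inputs.to(device))
        preds = outputs.argmax(dim=1)
        print(preds[:10].cpu())        # predictions for the first few test images
        break
```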
%% Cell type:code id:5e157640-fa0e-4529-9928-1b6b70c42c21 tags:
``` python
device = torch.device('cuda')
correct_detected = 0
accuracy = 0
total_time = 0.0
net.to(device)
net.eval()
start_time = perf_counter()
for batch_idx, (inputs, targets) in enumerate(test_loader):
    inputs = inputs.to(device)
    targets = targets.to(device)
    #---to-be-done-by-student---
    outputs = net(inputs)
    indexes = outputs.argmax(dim=1)
    num_correct = (indexes == targets).float().sum()
    correct_detected += num_correct
    #---end---------------------
end_time = perf_counter()
accuracy = correct_detected/len(test_data)
total_time = end_time - start_time
print(f'LeNet Accuracy is: {accuracy:.2%}')
print(f'Total time for forward pass: {round(total_time, 4)}s')
```
%% Output
LeNet Accuracy is: 97.43%
Total time for forward pass: 6.024s
%% Cell type:code id:01cab8ec-2aa7-405d-af02-e06158734cb4 tags:
``` python
```
%% Cell type:markdown id:5541fea7-a455-4282-a096-b48a594ec530 tags:
# Embedded ML Lab - Exercise 0 - Intro Training
Now that we have covered Neural Network Inference, we come to the training of neural networks. We reuse the LeNet from the previous exercise.
<span style="color:green">Your Task:</span>
* <span style="color:green">Copy your implementation of LeNet from the last exercise into Cell 1</span>
In Cell 2 the dataset is already prepared as dataloaders (using batch_size 64). Additionally, the images are already zero-centered and normalized. We have two separate dataloaders: `test_loader` for testing the accuracy of the model, and `train_loader` for training the model. These two should not be mixed. You can iterate over the batches of a dataloader by using `for idx, (inputs, targets) in enumerate(dataloader):`
Before we start with training, we need to write two functions. The first is `correct_predictions(outputs, targets)`, where you can reuse code from exercise_00. This function takes the outputs and targets as input and returns an int with the number of correct predictions in the batch.
The second function is `test_net(net, device)`. It iterates over the test loader, applies the network's forward pass, and returns the overall accuracy of the model (the number of correct predictions over the whole test set divided by the number of test-set samples).
<span style="color:green">Your Tasks:</span>
* <span style="color:green">Implement the `correct_predictions` function (Cell 3)</span>
* <span style="color:green">Implement the `test_net` function (Cell 4)</span>
* <span style="color:green">First set the network in evaluation mode with `.eval()` </span>
* <span style="color:green">Iterate over the batches in the dataloader</span>
* <span style="color:green">For each batch calculate the number of correctly detected images</span>
* NOTE: you can also only iterate over a fraction of batches to save some time
* <span style="color:green">Return the overall Accuracy</span>
%% Cell type:code id:35f3654e-6a85-42db-84f3-cb7dca8f663f tags:
``` python
import torch
import torch.nn as nn
import torch.nn.functional as F
#---to-be-done-by-student---
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        #---to-be-done-by-student---
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=3)
        self.fc1 = nn.Linear(in_features=400, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=84)
        self.fc3 = nn.Linear(in_features=84, out_features=10)
        #---end---------------------
        return
    def forward(self, x):
        #---to-be-done-by-student---
        x = self.conv1(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = x.flatten(1, 3)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.fc3(x)
        #---end---------------------
        return x
#---end---------------------
net = LeNet()
```
%% Cell type:code id:05e97a9a-6357-4f9e-9c00-34034ebe6515 tags:
``` python
import torchvision
import time
test_loader = torch.utils.data.DataLoader(torchvision.datasets.MNIST('.', train=False, download=True, transform=torchvision.transforms.Compose([
torchvision.transforms.ToTensor(), torchvision.transforms.Normalize(
(0.1307, ), (0.3081)) ])), batch_size=64, shuffle=False, drop_last=True)
train_loader = torch.utils.data.DataLoader(torchvision.datasets.MNIST('.', train=True, download=True, transform=torchvision.transforms.Compose([
torchvision.transforms.ToTensor(), torchvision.transforms.Normalize(
(0.1307, ), (0.3081)) ])), batch_size=64, shuffle=False, drop_last=True)
```
%% Cell type:code id:novel-singles tags:
``` python
def correct_predictions(outputs, targets):
    correct_predictions = 0
    #---to-be-done-by-student---
    indexes = outputs.argmax(dim=1)
    correct_predictions = int((indexes == targets).sum())
    #---end---------------------
    return correct_predictions
```
%% Cell type:code id:rental-resident tags:
``` python
def test_net(net, device):
    #---to-be-done-by-student---
    correct_detected = 0
    overall = 0
    net.to(device)
    net.eval()
    for batch_idx, (inputs, targets) in enumerate(test_loader):
        inputs = inputs.to(device)
        targets = targets.to(device)
        outputs = net(inputs)
        correct_detected += correct_predictions(outputs, targets)
        overall += targets.size(0)  # count the samples actually seen (drop_last=True skips the last partial batch)
    #---end---------------------
    return float(correct_detected/overall)
```
%% Cell type:markdown id:industrial-scout tags:
Now that we have these two helper functions, we come to training the network. Some parts are already given. You can do the training either on the CPU or on the GPU (the GPU should be faster).
First, we define an optimizer, `optimizer = torch.optim.SGD(net.parameters(), lr=0.01)`, and hand in the model's parameters (e.g., weights and biases of the conv and linear layers). Besides the model's parameters, we set the learning rate to 0.01. The learning rate defines the step size for updating the parameters based on their gradients.
We also require a loss function, `loss_function = nn.CrossEntropyLoss()`, which defines the error (loss) between the output of the network and the desired target.
To train the network we iterate over the dataset several times (for 5 epochs).
We can split the training into five parts (for each training batch); a compact single-batch sketch follows the task list below:
<span style="color:green">Your Tasks:</span>
* <span style="color:green">**Clean old gradients**: Remove the previous gradients of the parameters by calling `optimizer.zero_grad()`.</span>
* <span style="color:green">**Forward Pass**: Similar to the previous inference experiments,
calculate the network's output.</span>
* <span style="color:green">**Loss**: Calculate the loss by using `loss_function(outputs, targets)`.</span>
* <span style="color:green">**Backpropagation of the error**: Call `.backward()` on the loss tensor and Pytorch will automatically calculate the respective gradients of the modules with respect to the input and parameters.</span>
* <span style="color:green">**Step**: As the last step, modify the parameters based on their gradients by calling `optimizer.step()`.</span>
Plotting the accuracy and loss of the model:
<span style="color:green">Your Tasks:</span>
* <span style="color:green">Collect the network's loss for each batch.</span>
* <span style="color:green">After every 100 batches calculate the network's average loss (over the last 100 batches).</span>
* <span style="color:green">Similarly, calculate the model's accuracy using your defined `test_net` function.</span>
* <span style="color:green">Append the average loss and the accuracy to `loss_list` and `acc_list`, respectively.</span>
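%% Cell type:markdown id:training-step-sketch tags:
As referenced above, the five parts isolated on a single dummy batch (a minimal sketch; the dummy tensors and `sketch_*` names are only illustrative, the real loop over `train_loader` follows in the next cell):
%% Cell type:code id:training-step-sketch-code tags:
``` python
dummy_inputs = torch.randn(8, 1, 28, 28)        # a fake batch of 8 MNIST-sized images
dummy_targets = torch.randint(0, 10, (8,))      # fake labels in 0..9
sketch_net = LeNet()
sketch_optimizer = torch.optim.SGD(sketch_net.parameters(), lr=0.01)
sketch_loss_function = nn.CrossEntropyLoss()

sketch_optimizer.zero_grad()                            # 1. clean old gradients
outputs = sketch_net(dummy_inputs)                      # 2. forward pass
loss = sketch_loss_function(outputs, dummy_targets)     # 3. loss
loss.backward()                                         # 4. backpropagation of the error
sketch_optimizer.step()                                 # 5. parameter update
print(loss.item())
```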
%% Cell type:code id:15aea393-c5f3-4f84-b952-3334fdc566f9 tags:
``` python
n_epochs = 5
loss_list = []
acc_list = []
net = LeNet()
device = torch.device('cuda')
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
#optimizer = torch.optim.Adam(net.parameters(), lr=0.1)
loss_function = nn.CrossEntropyLoss()
#---to-be-done-by-student---
net = net.to(device)
total_loss = 0
#---end---------------------
for epoch_n in range(n_epochs):
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        #---to-be-done-by-student---
        optimizer.zero_grad()
        inputs = inputs.to(device)
        targets = targets.to(device)
        outputs = net(inputs)
        loss = loss_function(outputs, targets).float()
        total_loss += loss
        loss.backward()
        optimizer.step()
        #---end---------------------
        if batch_idx % 100 == 0 and batch_idx != 0:
            #---to-be-done-by-student---
            average_loss = total_loss / 100
            loss_list.append(average_loss.item())     # .item() gives a plain float that numpy can plot later
            acc_list.append(test_net(net, device))    # test-set accuracy, as required for the plot
            net.train()                               # test_net switches to eval mode, so switch back for training
            print(average_loss)
            total_loss = 0
            #---end---------------------
```
%% Output
tensor(2.3265, device='cuda:0', grad_fn=<DivBackward0>)
tensor(2.2949, device='cuda:0', grad_fn=<DivBackward0>)
tensor(2.2863, device='cuda:0', grad_fn=<DivBackward0>)
tensor(2.2697, device='cuda:0', grad_fn=<DivBackward0>)
tensor(2.2264, device='cuda:0', grad_fn=<DivBackward0>)
tensor(2.0333, device='cuda:0', grad_fn=<DivBackward0>)
tensor(1.2253, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.7319, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.5572, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.5634, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.4135, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.3941, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.3412, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.3358, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.2804, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.2822, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.2777, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.2267, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.2748, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.2293, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.2099, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1875, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1931, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1664, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1687, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1736, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1413, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1757, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1561, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1376, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1224, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1336, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1185, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1204, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1294, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1055, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1324, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1177, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1058, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.0917, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1052, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.0943, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.0970, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.1075, device='cuda:0', grad_fn=<DivBackward0>)
tensor(0.0874, device='cuda:0', grad_fn=<DivBackward0>)
%% Cell type:code id:armed-counter tags:
``` python
import matplotlib.pyplot as plt
import numpy as np
plt.rcParams['figure.figsize'] = [10, 5]
fig, ax = plt.subplots(1)
ax.plot(np.array(acc_list), color='tab:blue')
ax.set_xlabel('mini-batch steps (100)')
ax.set_ylabel('LeNet accuracy')
ax.tick_params(colors='tab:blue', axis='y')
ax2 = ax.twinx()
ax2.plot(np.array(loss_list), color='tab:red')
ax2.set_ylabel('LeNet loss')
ax2.tick_params(colors='tab:red', axis='y')
ax.set_title('LeNet training')
```
%% Cell type:markdown id:dental-lounge tags:
You can save your training state by using `state_dict = net.state_dict()` and `torch.save(state_dict, 'lenet_new.pt')`
<span style="color:green">Your Task:</span>
* <span style="color:green">Save the state dict of the model with a new name and plug it into exercise 01 by changing the file name in Cell 9</span>
%% Cell type:code id:cosmetic-router tags:
``` python
#save model here
#---to-be-done-by-student---
#---end---------------------
```