{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {}, "id": "view-in-github" }, "source": [ "\"Open   \"Open" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Tutorial 3: Meta-learning \n", "\n", "**Week 2, Day 4: Macro-Learning**\n", "\n", "**By Neuromatch Academy**\n", "\n", "__Content creators:__ Hlib Solodzhuk, Ximeng Mao, Grace Lindsay\n", "\n", "__Content reviewers:__ Aakash Agrawal, Alish Dipani, Hossein Rezaei, Yousef Ghanbari, Mostafa Abdollahi, Hlib Solodzhuk, Ximeng Mao, Samuele Bolotta, Grace Lindsay\n", "\n", "__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "___\n", "\n", "\n", "# Tutorial Objectives\n", "\n", "*Estimated timing of tutorial: 50 minutes*\n", "\n", "In this tutorial, you will examine how meta-learning separates the problem of continual learning into two stages." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @markdown\n", "from IPython.display import IFrame\n", "from ipywidgets import widgets\n", "out = widgets.Output()\n", "with out:\n", " print(f\"If you want to download the slides: https://osf.io/download/t36w8/\")\n", " display(IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/t36w8/?direct%26mode=render%26action=download%26mode=render\", width=730, height=410))\n", "display(out)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Setup\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install and import feedback gadget\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Install and import feedback gadget\n", "\n", "!pip install vibecheck datatops --quiet\n", "\n", "from vibecheck import DatatopsContentReviewContainer\n", "def content_review(notebook_section: str):\n", " return DatatopsContentReviewContainer(\n", " \"\", # No text prompt\n", " notebook_section,\n", " {\n", " \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n", " \"name\": \"neuromatch_neuroai\",\n", " \"user_key\": \"wb2cxze8\",\n", " },\n", " ).render()\n", "\n", "\n", "feedback_prefix = \"W2D4_T3\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Imports\n", "\n", "#working with data\n", "import numpy as np\n", "from functools import partial\n", "\n", "#plotting\n", "import matplotlib.pyplot as plt\n", "import logging\n", "from sklearn.decomposition import PCA\n", "from matplotlib.lines import Line2D\n", "\n", "#interactive display\n", "import ipywidgets as widgets\n", "\n", "#modeling\n", "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "import torch.optim as optim\n", "from torch.utils.data import Dataset, DataLoader\n", "from sklearn.metrics import r2_score\n", "\n", "#utils\n", "from tqdm import tqdm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Figure settings\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": 
"form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Figure settings\n", "\n", "logging.getLogger('matplotlib.font_manager').disabled = True\n", "\n", "%matplotlib inline\n", "%config InlineBackend.figure_format = 'retina' # perfrom high definition rendering for images and plots\n", "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting functions\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Plotting functions\n", "\n", "def plot_tasks(task_days, task_prices):\n", " \"\"\"\n", " Plot the tasks' prices over time.\n", "\n", " Inputs:\n", " - task_days (list): A list of three lists, where each sub-list contains the days for a specific task.\n", " - task_prices (list): A list of three lists, where each sub-list contains the prices for a specific task.\n", " \"\"\"\n", " sorted_first_task_days, sorted_first_task_prices = zip(*sorted(zip(task_days[0], task_prices[0]), key=lambda pair: pair[0]))\n", " sorted_second_task_days, sorted_second_task_prices = zip(*sorted(zip(task_days[1], task_prices[1]), key=lambda pair: pair[0]))\n", " sorted_third_task_days, sorted_third_task_prices = zip(*sorted(zip(task_days[2], task_prices[2]), key=lambda pair: pair[0]))\n", "\n", " with plt.xkcd():\n", " plt.plot(sorted_first_task_days, sorted_first_task_prices, label = \"First Task\")\n", " plt.plot(sorted_second_task_days, sorted_second_task_prices, label = \"Second Task\")\n", " plt.plot(sorted_third_task_days, sorted_third_task_prices, label = \"Third Task\")\n", " plt.xlabel('Week')\n", " plt.ylabel('Price')\n", " plt.legend()\n", " plt.show()\n", "\n", "def plot_inner_outer_weights(pca_parameters, epoch):\n", " \"\"\"\n", " Plot PCA-transformed outer weights of the model in 2D over the epochs as well as inner / outer weights for the given epoch\n", "\n", " Inputs:\n", " - pca_parameters (np.ndarray): array of model parameters (already in 2D).\n", " - epoch (int): given epoch.\n", " \"\"\"\n", " with plt.xkcd():\n", " fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))\n", "\n", " #plot points for the given epoch\n", " for j in range(pca_parameters.shape[1]):\n", " ax1.scatter(pca_parameters[epoch - 1, j, 0], pca_parameters[epoch - 1, j, 1], color='k', s=10)\n", "\n", " start_point = pca_parameters[epoch - 1, 0]\n", "\n", " #plot arrows from start point to all other\n", " for j in range(1, pca_parameters.shape[1]):\n", " #inner\n", " end_point = pca_parameters[epoch - 1, j]\n", " arrow_color = 'g'\n", " #outer\n", " if j == pca_parameters.shape[1] - 1:\n", " arrow_color = 'b'\n", " ax1.annotate('', xy=end_point, xytext=start_point,\n", " arrowprops=dict(arrowstyle='->', color=arrow_color))\n", " #plot arrows for previous outer\n", " for j in range(epoch - 1):\n", " ax1.scatter(pca_parameters[j, 0, 0], pca_parameters[j, 0, 1], color='k', s=10)\n", " start_point = pca_parameters[j, 0]\n", " end_point = pca_parameters[j + 1, 0]\n", " ax1.annotate('', xy=end_point, xytext=start_point,\n", " arrowprops=dict(arrowstyle='->', color='b', alpha = 0.2))\n", "\n", " #plot points for the given epoch\n", " for j in range(pca_parameters.shape[1]):\n", " ax2.scatter(pca_parameters[epoch - 1, j, 0], pca_parameters[epoch - 1, j, 1], color='k', s=10)\n", "\n", " start_point = pca_parameters[epoch - 1, 0]\n", "\n", " #plot arrows from start 
point to all other\n", " for j in range(1, pca_parameters.shape[1]):\n", " #inner\n", " end_point = pca_parameters[epoch - 1, j]\n", " arrow_color = 'g'\n", " #outer\n", " if j == pca_parameters.shape[1] - 1:\n", " arrow_color = 'b'\n", " ax2.annotate('', xy=end_point, xytext=start_point,\n", " arrowprops=dict(arrowstyle='->', color=arrow_color))\n", "\n", " ax1.set_title(\"Outer weights evolution across epochs\")\n", " ax2.set_title(f\"Inner and outer weights for epoch {epoch}\")\n", " # Create legend handles\n", " inner_arrow = Line2D([0], [0], color='g', lw=2, label='Inner weights')\n", " outer_arrow = Line2D([0], [0], color='b', lw=2, label='Outer weights')\n", "\n", " # Add legend to the second subplot (ax2)\n", " ax2.legend(handles=[inner_arrow, outer_arrow], loc='upper left')\n", "\n", " fig.suptitle(f'Epoch {epoch}', fontsize=16)\n", " plt.show()\n", "\n", "def value_to_saturation(value, vmin, vmax):\n", " \"\"\"\n", " Return saturation of the point based on the min/max values in the array.\n", "\n", " Inputs:\n", " - value (float): value of point.\n", " - vmin (float): min value in all points.\n", " - vmax (float): max value in all points.\n", " \"\"\"\n", " norm_value = (value - vmin) / (vmax - vmin)\n", " saturation = 0.2 + 0.8 * norm_value\n", " return saturation\n", "\n", "def plot_sensitivity_r_squared(name, list_gradient_steps, list_num_samples_finetune, fix_scale = False):\n", " \"\"\"Performs fine-tuning for a couple of tasks for different hyperparameter values and plots 3D sensitivity plot.\n", "\n", " Inputs:\n", " - name (str): name of the model's file.\n", " - gradient_steps (np.ndarray): list of number of steps to perform gradient descent.\n", " - num_samples_finetune (np.ndarray) list of number of samples.\n", " - fix_scale (bool, default = False): whether to fix the same values of R-squared metric for both plots.\n", " \"\"\"\n", " model_path = name + '.pt'\n", " meta_model = MetaLearningModel(model = model_path, mean = days_mean, std = days_std)\n", " dataset = FruitSupplyDataset(also_sample_outer = False)\n", "\n", " tasks = [[0.005, 0.1, 0.0, 1.0], [-0.005, 0.1, 0.0, 4.0]]\n", "\n", " cmap = plt.colormaps.get_cmap('Reds')\n", "\n", " with plt.xkcd():\n", " legend_num_samples_finetune = []\n", " legend_gradient_steps = []\n", " legend_r_squared_score = []\n", " prices = tasks[0][0] * days ** 2 + tasks[0][1] * np.sin(np.pi * days + tasks[0][2]) + tasks[0][3]\n", " for num_samples_finetune in list_num_samples_finetune:\n", " x_finetune, y_finetune = dataset.sample_particular_task(*tasks[0], num_samples_finetune)\n", " for gradient_steps in list_gradient_steps:\n", " if gradient_steps:\n", " prediction = finetune(meta_model, torch.tensor(x_finetune).type(torch.float32), torch.tensor(y_finetune).type(torch.float32), gradient_steps)(torch.tensor((np.expand_dims(days, 1) - days_mean) / days_std).type(torch.float32)).detach().numpy()\n", " legend_num_samples_finetune.append(num_samples_finetune)\n", " legend_gradient_steps.append(gradient_steps)\n", " legend_r_squared_score.append(r2_score(prices, prediction))\n", "\n", "\n", " fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6), subplot_kw={'projection': '3d'})\n", "\n", " vmin = np.min(legend_r_squared_score)\n", " vmax = np.max(legend_r_squared_score)\n", " colors = [cmap(value_to_saturation(value, vmin, vmax)) for value in legend_r_squared_score]\n", "\n", " ax1.scatter(legend_num_samples_finetune, legend_gradient_steps, legend_r_squared_score, c=colors, marker='o')\n", " ax1.set_xlabel('Number of samples')\n", " 
ax1.set_ylabel('Number of gradient steps')\n", " ax1.set_zlabel('R-squared score')\n", " ax1.set_title('Positive Squared Term Task')\n", "\n", " legend_num_samples_finetune = []\n", " legend_gradient_steps = []\n", " legend_r_squared_score = []\n", " prices = tasks[1][0] * days ** 2 + tasks[1][1] * np.sin(np.pi * days + tasks[1][2]) + tasks[1][3]\n", " for num_samples_finetune in list_num_samples_finetune:\n", " x_finetune, y_finetune = dataset.sample_particular_task(*tasks[1], num_samples_finetune)\n", " for gradient_steps in list_gradient_steps:\n", " if gradient_steps:\n", " prediction = finetune(meta_model, torch.tensor(x_finetune).type(torch.float32), torch.tensor(y_finetune).type(torch.float32), gradient_steps)(torch.tensor((np.expand_dims(days, 1) - days_mean) / days_std).type(torch.float32)).detach().numpy()\n", " legend_num_samples_finetune.append(num_samples_finetune)\n", " legend_gradient_steps.append(gradient_steps)\n", " legend_r_squared_score.append(r2_score(prices, prediction))\n", "\n", " vmin = np.min(legend_r_squared_score)\n", " vmax = np.max(legend_r_squared_score)\n", " colors = [cmap(value_to_saturation(value, vmin, vmax)) for value in legend_r_squared_score]\n", "\n", " ax2.scatter(legend_num_samples_finetune, legend_gradient_steps, legend_r_squared_score, c=colors, marker='o')\n", " ax2.set_xlabel('Number of samples')\n", " ax2.set_ylabel('Number of gradient steps')\n", " ax2.set_zlabel('R-squared score')\n", " ax2.set_title('Negative Squared Term Task')\n", "\n", " if fix_scale:\n", " ax1.set_zlim(0.65, 1)\n", " ax2.set_zlim(0.65, 1)\n", "\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Helper functions\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Helper functions\n", "\n", "class UtilModel(nn.Module):\n", " def __init__(self, model, mean = 0, std = 1, outer_learning_rate=0.001, inner_learning_rate=0.01):\n", " \"\"\"Super class for model; hide utility code.\n", " \"\"\"\n", " super(UtilModel, self).__init__()\n", "\n", " self.model = self.__load_model_from_context(model)\n", "\n", " self.outer_learning_rate = outer_learning_rate\n", " self.inner_learning_rate = inner_learning_rate\n", "\n", " self.mean = mean\n", " self.std = std\n", "\n", " self.loss_fn = nn.MSELoss()\n", "\n", " def __load_model_from_context(self, model):\n", " \"\"\"Load weights of the model from file or as defined architecture.\n", " \"\"\"\n", " if isinstance(model, str):\n", " return torch.load(model)\n", " return model\n", "\n", " def deep_clone_model(self, model):\n", " \"\"\"Create clone of the model.\n", " \"\"\"\n", " clone = type(model)()\n", " clone.load_state_dict(model.state_dict())\n", " return clone\n", "\n", " def save_parameters(self, path):\n", " \"\"\"Save the parameters as a state dictionary.\n", " \"\"\"\n", " torch.save(self.model, path)\n", "\n", " def inference(self, x):\n", " \"\"\"Implement forward pass for inference.\n", " \"\"\"\n", " #apply normalization on days\n", " x = (x - self.mean) / self.std\n", " return self.model(x)\n", "\n", " def manual_output(self, weights, x):\n", " \"\"\"Calculate the result of forward pass on the external values of the model parameters (weights).\n", " \"\"\"\n", " for j in range(len(weights) // 2):\n", " kernel, bias = weights[2 * j], weights[2 * j + 1]\n", " if j == len(weights) // 2 - 1:\n", " #last layer doesn't possess ReLU activation\n", " return F.linear(x, kernel, bias 
= bias)\n", " else:\n", " x = F.relu(F.linear(x, kernel, bias = bias))\n", "\n", "days = np.arange(-26, 26 + 1/7, 1/7, dtype = np.float32)\n", "\n", "class FruitSupplyDatasetComplete(Dataset):\n", " def __init__(self, num_epochs = 1, num_tasks = 1, num_samples = 1, days = days, also_sample_outer = True):\n", " \"\"\"Initialize particular instance of `FruitSupplyDataset` dataset.\n", "\n", " Inputs:\n", " - num_epochs (int): Number of epochs the model is going to be trained on.\n", " - num_tasks (int): Number of tasks to sample for each epoch (the loss and improvement is going to be represented as sum over considered tasks).\n", " - num_samples (int): Number of days to sample for each task.\n", " - days (np.ndarray): Summer and autumn days to sample from.\n", " - also_sample_outer (bool): `True` if we want to sample inner and outer data (necessary for training).\n", "\n", " Raises:\n", " - ValueError: If the number of sampled days `num_samples` exceeds number of days to sample from.\n", " \"\"\"\n", "\n", " if also_sample_outer:\n", " if num_samples > days.shape[0] // 2:\n", " raise ValueError(\"Number of sampled days for one task should be less or equal to the total amount of days divided by two as we sample inner and outer data.\")\n", " else:\n", " if num_samples > days.shape[0]:\n", " raise ValueError(\"Number of sampled days for one task should be less or equal to the total amount of days.\")\n", "\n", " #total amount of data is (2/4 x num_epochs x num_tasks x num_samples) (2/4 because -> x_inner, x_outer, y_inner, y_outer; outer is optional)\n", " self.num_epochs = num_epochs\n", " self.num_tasks = num_tasks\n", " self.num_samples = num_samples\n", " self.also_sample_outer = also_sample_outer\n", " self.days = days\n", "\n", " def __len__(self):\n", " \"\"\"Calculate the length of the dataset. 
It is obligatory for PyTorch to know in advance how many samples to expect (before training),\n", " thus we enforced to icnlude number of epochs and tasks per epoch in `FruitSupplyDataset` parameters.\"\"\"\n", "\n", " return self.num_epochs * self.num_tasks\n", "\n", " def __getitem__(self, idx):\n", " \"\"\"Generate particular instance of task with prefined number of samples `num_samples`.\"\"\"\n", "\n", " A = np.random.uniform(min_A, max_A, size = 1)\n", " B = np.random.uniform(min_B, max_B, size = 1)\n", " phi = np.random.uniform(min_phi, max_phi, size = 1)\n", " C = np.random.uniform(min_C, max_C, size = 1)\n", "\n", " #`replace = False` is important flag here as we don't want repeated data\n", " inner_sampled_days = np.expand_dims(np.random.choice(self.days, size = self.num_samples, replace = False), 1)\n", "\n", " if self.also_sample_outer:\n", "\n", " #we don't want inner and outer data to overlap\n", " outer_sampled_days = np.expand_dims(np.random.choice(np.setdiff1d(self.days, inner_sampled_days), size = self.num_samples, replace = False), 1)\n", "\n", " return inner_sampled_days, A * inner_sampled_days ** 2 + B * np.sin(np.pi * inner_sampled_days + phi) + C, outer_sampled_days, A * outer_sampled_days ** 2 + B * np.sin(np.pi * outer_sampled_days + phi) + C\n", "\n", " return inner_sampled_days, A * inner_sampled_days ** 2 + B * np.sin(np.pi * inner_sampled_days + phi) + C\n", "\n", " def sample_particular_task(self, A, B, phi, C, num_samples):\n", " \"\"\"Samples for the particular instance of the task defined by the tuple of parameters (A, B, phi, C) and `num_samples`.\"\"\"\n", "\n", " sampled_days = np.expand_dims(np.random.choice(self.days, size = num_samples, replace = False), 1)\n", " return sampled_days, A * sampled_days ** 2 + B * np.sin(np.pi * sampled_days + phi) + C\n", "\n", "def finetune_complete(model, x_finetune, y_finetune, finetune_gradient_steps):\n", " \"\"\"\n", " Take a fixed number of gradient steps for the given x_finetune and y_finetune.\n", "\n", " Inputs:\n", " - model (MetaLearningModel): trained meta learning model.\n", " - x_finetune (torch.tensor): features (days) of the specific task.\n", " - y_finetune (torch.tensor): outcomes (prices) of the specific task.\n", " - finetune_gradient_steps (int): number of gradient steps to perform for this task.\n", " \"\"\"\n", " #apply normalization on days\n", " x_finetune = (x_finetune - model.mean) / model.std\n", "\n", " #need to create clone, so that we preserve meta-learnt parameters\n", " clone = model.deep_clone_model(model.model)\n", " optimizer = optim.SGD(clone.parameters(), lr = model.inner_learning_rate)\n", "\n", " for _ in range(finetune_gradient_steps):\n", " optimizer.zero_grad()\n", " loss = model.loss_fn(clone(x_finetune), y_finetune)\n", " loss.backward()\n", " optimizer.step()\n", "\n", " return clone" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data retrieval\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Data retrieval\n", "\n", "import os\n", "import requests\n", "import hashlib\n", "\n", "# Variables for file and download URL\n", "fname = \"SummerAutumnModel.pt\" # The name of the file to be downloaded\n", "url = \"https://osf.io/2mc6r/download\" # URL from where the file will be downloaded\n", "expected_md5 = \"9d194af9815d1a65b834d1200189c98c\" # MD5 hash for verifying file integrity\n", "\n", "if not os.path.isfile(fname):\n", " try:\n", " # 
Attempt to download the file\n", " r = requests.get(url) # Make a GET request to the specified URL\n", " except requests.ConnectionError:\n", " # Handle connection errors during the download\n", " print(\"!!! Failed to download data !!!\")\n", " else:\n", " # No connection errors, proceed to check the response\n", " if r.status_code != requests.codes.ok:\n", " # Check if the HTTP response status code indicates a successful download\n", " print(\"!!! Failed to download data !!!\")\n", " elif hashlib.md5(r.content).hexdigest() != expected_md5:\n", " # Verify the integrity of the downloaded file using MD5 checksum\n", " print(\"!!! Data download appears corrupted !!!\")\n", " else:\n", " # If download is successful and data is not corrupted, save the file\n", " with open(fname, \"wb\") as fid:\n", " fid.write(r.content) # Write the downloaded content to a file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set random seed\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Set random seed\n", "\n", "import random\n", "import numpy as np\n", "import torch\n", "\n", "def set_seed(seed=None, seed_torch=True):\n", " if seed is None:\n", " seed = np.random.choice(2 ** 32)\n", " random.seed(seed)\n", " np.random.seed(seed)\n", " if seed_torch:\n", " torch.manual_seed(seed)\n", " torch.cuda.manual_seed_all(seed)\n", " torch.cuda.manual_seed(seed)\n", " torch.backends.cudnn.benchmark = False\n", " torch.backends.cudnn.deterministic = True\n", "\n", "set_seed(seed = 42)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video 1: Meta-learning\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 1: Meta-learning\n", "\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "video_ids = [('Youtube', 'QtnE6QIw-8U'), ('Bilibili', 'BV1p7421d73J')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in 
range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_meta_learning\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "\n", "# Section 1: Introducing meta-learning task\n", "\n", "In this section, we introduce the meta-learning approach. We will discuss its main components and then focus on defining the task before proceeding to training.\n", "\n", "The idea behind meta-learning is that we can \"learn to learn\". Meta-learning is a phrase that is used to describe a lot of different approaches to making more flexible and adaptable systems. The MAML approach is essentially finding good initialization weights, and because those initial weights are learned based on the task set, the model is technically \"learning to find a good place to learn from\". \"Learning to learn\" is a shorter, snappier description that can encompass many different meta-learning approaches." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Coding Exercise 1: Task space\n", "\n", "We aim to develop a model that, in its desired state, can generalize knowledge about a particular set of similar tasks and adapt to a new task from this set extremely rapidly. Specifically, we aim to identify model weights such that the weights can be robustly and easily fine-tuned to create valid predictions on a specific task in just a few learning steps. This is similar to what humans excel at: performing one-shot tasks without much specific training.\n", "\n", "For this, we first need to define the **task space** -- the set of tasks we want the model to learn. Formally, we consider a distribution over tasks $p(\\tau)$ that we want our model to be able to adapt to. In the $K$-shot learning setting, the model is trained to learn a new task $\\tau_{i}$ drawn from $p(\\tau)$ using only $K$ samples drawn from $\\tau_{i}$.\n", "\n", "Our task space is parametrized by the tuple of parameters $(A, B, \\phi, C)$ describing the relationship between day and price as follows:\n", "\n", "$$f(x) = A x^{2} + B sin(\\pi x + \\phi) + C$$\n", "\n", "Thus, a particular task is a tuple of assigned values. For example, $A = 0.005$, $B = 0.5$, $\\phi = 0$ and $C = 0$.\n", "\n", "You will implement `FruitSupplyDataset`, which enables the generation of a particular instance of the task. We will use it as an extension of `torch.utils.data.Dataset` to load data during training." 
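, "\n", "Before writing the class, it may help to look at the task function on its own. The minimal sketch below (illustration only, not part of the exercise) evaluates $f(x)$ for the example task $A = 0.005$, $B = 0.5$, $\\phi = 0$, $C = 0$ quoted above, on a stand-in grid of days; the tutorial's actual `days` array is defined in the next cell.\n", "\n", "```python\n", "import numpy as np\n", "\n", "# example task parameters quoted above (for illustration only)\n", "A, B, phi, C = 0.005, 0.5, 0.0, 0.0\n", "# a stand-in grid of days; the tutorial builds its own `days` array below\n", "x = np.arange(-26, 26, 1 / 7)\n", "prices = A * x ** 2 + B * np.sin(np.pi * x + phi) + C\n", "print(prices[:5])\n", "```"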
] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "#define variables\n", "days = np.arange(-26, 26 + 1/7, 1/7, dtype = np.float32)\n", "\n", "#we are going to take only summer and autumn days\n", "days = days[151:334]\n", "\n", "#we will use normalization during training\n", "days_mean, days_std = np.mean(days), np.std(days)\n", "\n", "#define boundaries for parameters to sample from\n", "min_A = .0005\n", "max_A = .005\n", "\n", "min_B = 0.05\n", "max_B = 0.5\n", "\n", "min_phi = 0\n", "max_phi = np.pi\n", "\n", "min_C = .5\n", "max_C = 3" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Please complete the missing code lines to sample parameters uniformly from their min-max range, as well as to sample `num_samples` of days for inner and outer data. As we will see later during training, inner data is used to update task-specific weights and outer data is used to calculate base weight updates." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "```python\n", "class FruitSupplyDataset(Dataset):\n", " def __init__(self, num_epochs = 1, num_tasks = 1, num_samples = 1, days = days, also_sample_outer = True):\n", " \"\"\"Initialize particular instance of `FruitSupplyDataset` dataset.\n", "\n", " Inputs:\n", " - num_epochs (int): Number of epochs the model is going to be trained on.\n", " - num_tasks (int): Number of tasks to sample for each epoch (the loss and improvement is going to be represented as sum over considered tasks).\n", " - num_samples (int): Number of days to sample for each task.\n", " - days (np.ndarray): Summer and autumn days to sample from.\n", " - also_sample_outer (bool): `True` if we want to sample inner and outer data (necessary for training).\n", "\n", " Raises:\n", " - ValueError: If the number of sampled days `num_samples` exceeds number of days to sample from.\n", " \"\"\"\n", "\n", " if also_sample_outer:\n", " if num_samples > days.shape[0] // 2:\n", " raise ValueError(\"Number of sampled days for one task should be less or equal to the total amount of days divided by two as we sample inner and outer data.\")\n", " else:\n", " if num_samples > days.shape[0]:\n", " raise ValueError(\"Number of sampled days for one task should be less or equal to the total amount of days.\")\n", "\n", " #total amount of data is (2/4 x num_epochs x num_tasks x num_samples) (2/4 because -> x_inner, x_outer, y_inner, y_outer; outer is optional)\n", " self.num_epochs = num_epochs\n", " self.num_tasks = num_tasks\n", " self.num_samples = num_samples\n", " self.also_sample_outer = also_sample_outer\n", " self.days = days\n", "\n", " def __len__(self):\n", " \"\"\"Calculate the length of the dataset. 
It is obligatory for PyTorch to know in advance how many samples to expect (before training),\n", " thus we enforced to icnlude number of epochs and tasks per epoch in `FruitSupplyDataset` parameters.\"\"\"\n", "\n", " return self.num_epochs * self.num_tasks\n", "\n", " def __getitem__(self, idx):\n", " \"\"\"Generate particular instance of task with prefined number of samples `num_samples`.\"\"\"\n", "\n", " ###################################################################\n", " ## Fill out the following then remove\n", " raise NotImplementedError(\"Student exercise: complete parameters.\")\n", " ###################################################################\n", "\n", " A = np.random.uniform(min_A, max_A, size = 1)\n", " B = np.random.uniform(min_B, max_B, size = 1)\n", " phi = np.random.uniform(..., ..., size = 1)\n", " C = np.random.uniform(..., ..., size = 1)\n", "\n", " #`replace = False` is important flag here as we don't want repeated data\n", " inner_sampled_days = np.expand_dims(np.random.choice(self.days, size = self.num_samples, replace = False), 1)\n", "\n", " if self.also_sample_outer:\n", "\n", " #we don't want inner and outer data to overlap\n", " outer_sampled_days = np.expand_dims(np.random.choice(np.setdiff1d(self.days, inner_sampled_days), size = self.num_samples, replace = False), 1)\n", "\n", " return inner_sampled_days, A * inner_sampled_days ** 2 + B * np.sin(np.pi * inner_sampled_days + phi) + C, outer_sampled_days, A * outer_sampled_days ** 2 + B * np.sin(np.pi * outer_sampled_days + phi) + C\n", "\n", " return inner_sampled_days, A * inner_sampled_days ** 2 + B * np.sin(np.pi * inner_sampled_days + phi) + C\n", "\n", " def sample_particular_task(self, A, B, phi, C, num_samples):\n", " \"\"\"Samples for the particular instance of the task defined by the tuple of parameters (A, B, phi, C) and `num_samples`.\"\"\"\n", "\n", " sampled_days = np.expand_dims(np.random.choice(self.days, size = num_samples, replace = False), 1)\n", " return sampled_days, A * sampled_days ** 2 + B * np.sin(np.pi * sampled_days + phi) + C\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# to_remove solution\n", "\n", "class FruitSupplyDataset(Dataset):\n", " def __init__(self, num_epochs = 1, num_tasks = 1, num_samples = 1, days = days, also_sample_outer = True):\n", " \"\"\"Initialize particular instance of `FruitSupplyDataset` dataset.\n", "\n", " Inputs:\n", " - num_epochs (int): Number of epochs the model is going to be trained on.\n", " - num_tasks (int): Number of tasks to sample for each epoch (the loss and improvement is going to be represented as sum over considered tasks).\n", " - num_samples (int): Number of days to sample for each task.\n", " - days (np.ndarray): Summer and autumn days to sample from.\n", " - also_sample_outer (bool): `True` if we want to sample inner and outer data (necessary for training).\n", "\n", " Raises:\n", " - ValueError: If the number of sampled days `num_samples` exceeds number of days to sample from.\n", " \"\"\"\n", "\n", " if also_sample_outer:\n", " if num_samples > days.shape[0] // 2:\n", " raise ValueError(\"Number of sampled days for one task should be less or equal to the total amount of days divided by two as we sample inner and outer data.\")\n", " else:\n", " if num_samples > days.shape[0]:\n", " raise ValueError(\"Number of sampled days for one task should be less or equal to the total amount of days.\")\n", "\n", " #total amount of data is (2/4 x 
num_epochs x num_tasks x num_samples) (2/4 because -> x_inner, x_outer, y_inner, y_outer; outer is optional)\n", " self.num_epochs = num_epochs\n", " self.num_tasks = num_tasks\n", " self.num_samples = num_samples\n", " self.also_sample_outer = also_sample_outer\n", " self.days = days\n", "\n", " def __len__(self):\n", " \"\"\"Calculate the length of the dataset. It is obligatory for PyTorch to know in advance how many samples to expect (before training),\n", " thus we enforced to icnlude number of epochs and tasks per epoch in `FruitSupplyDataset` parameters.\"\"\"\n", "\n", " return self.num_epochs * self.num_tasks\n", "\n", " def __getitem__(self, idx):\n", " \"\"\"Generate particular instance of task with prefined number of samples `num_samples`.\"\"\"\n", "\n", " A = np.random.uniform(min_A, max_A, size = 1)\n", " B = np.random.uniform(min_B, max_B, size = 1)\n", " phi = np.random.uniform(min_phi, max_phi, size = 1)\n", " C = np.random.uniform(min_C, max_C, size = 1)\n", "\n", " #`replace = False` is important flag here as we don't want repeated data\n", " inner_sampled_days = np.expand_dims(np.random.choice(self.days, size = self.num_samples, replace = False), 1)\n", "\n", " if self.also_sample_outer:\n", "\n", " #we don't want inner and outer data to overlap\n", " outer_sampled_days = np.expand_dims(np.random.choice(np.setdiff1d(self.days, inner_sampled_days), size = self.num_samples, replace = False), 1)\n", "\n", " return inner_sampled_days, A * inner_sampled_days ** 2 + B * np.sin(np.pi * inner_sampled_days + phi) + C, outer_sampled_days, A * outer_sampled_days ** 2 + B * np.sin(np.pi * outer_sampled_days + phi) + C\n", "\n", " return inner_sampled_days, A * inner_sampled_days ** 2 + B * np.sin(np.pi * inner_sampled_days + phi) + C\n", "\n", " def sample_particular_task(self, A, B, phi, C, num_samples):\n", " \"\"\"Samples for the particular instance of the task defined by the tuple of parameters (A, B, phi, C) and `num_samples`.\"\"\"\n", "\n", " sampled_days = np.expand_dims(np.random.choice(self.days, size = num_samples, replace = False), 1)\n", " return sampled_days, A * sampled_days ** 2 + B * np.sin(np.pi * sampled_days + phi) + C" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Now, let us visually inspect three distinct tasks from the defined task space. Explore even more of the tasks by changing the seed value in the slider!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make sure you execute this cell to observe the plot!\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Make sure you execute this cell to observe the plot!\n", "\n", "@widgets.interact(seed=widgets.IntSlider(value=42, description='Seed:', continuous_update=False))\n", "def generate_tasks(seed):\n", " set_seed(seed)\n", " example_fruit_dataset = FruitSupplyDatasetComplete(num_epochs=1, num_samples=183, num_tasks=3, also_sample_outer=False)\n", " example_fruit_dataloader = DataLoader(example_fruit_dataset, batch_size=3)\n", " task_days, task_prices = next(iter(example_fruit_dataloader))\n", " plot_tasks(task_days, task_prices)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Coding Exercise 1 Discussion\n", "\n", "1. Do you think these particular tasks are similar? 
Do you expect the model to learn their general nature?\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "#to_remove explanation\n", "\n", "\"\"\"\n", "Discussion: Do you think these particular tasks are similar? Do you expect the model to learn their general nature?\n", "\n", "Though being pretty distinct visually, they share joint underlying dynamics - fast oscillation modeled by\n", "sinusoid and general increasing trend with the quadratic term; we expect the model to learn these patterns\n", "and to adapt quickly to the particular case.\n", "\"\"\";" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_task_space\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "\n", "# Section 2: Meta-training\n", "\n", "*Estimated timing to here from start of tutorial: 20 minutes*\n", "\n", "In this section, we are ready to train the model to generalize well across the tasks in this set. First, we will present the training procedure, discuss the intuition behind why it works, and then complete code snippets to do our first meta-learning training." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Coding Exercise 2: All about training\n", "\n", "During meta-learning, a task $\\tau_{i}$ is sampled from $p(\\tau)$. The model is trained with $K$ samples from this task, and the loss $\\mathcal{L}_{\\tau_i}$ from $\\tau_{i}$ (which will be MSE for these tasks) is calculated, and then tested on new samples from $\\tau_{i}$. The model is then improved by considering how the test error on new data from $\\tau_{i}$ changes with respect to its base parameters. The process can be described via the following pseudo-code:\n", "\n", "\n", "\\begin{align*}\n", "& [1] \\: \\text{for new epoch do} \\\\\n", "& [2] \\: \\quad\\quad\\text{Sample batch of tasks }\\tau_i \\sim p(\\tau) \\\\\n", "& [3] \\: \\quad\\quad\\text{for all }\\tau_i \\text{ do} \\\\\n", "& [4] \\: \\quad\\quad\\quad\\quad\\text{Sample }K\\text{ datapoints } \\{x^{(j)}, y^{(j)}\\}\\text{ from }\\tau_i \\\\\n", "& [5] \\: \\quad\\quad\\quad\\quad\\text{Evaluate }\\nabla_\\theta \\mathcal{L}_{\\tau_i}(f_\\theta) \\\\\n", "& [6] \\: \\quad\\quad\\quad\\quad\\text{Compute adapted task-specific parameters with gradient descent:} \\: \\: \\theta_i' = \\theta - \\alpha \\nabla_\\theta \\mathcal{L}_{\\tau_i}(f_\\theta) \\\\\n", "& [7] \\: \\quad\\quad\\quad\\quad\\text{Sample new datapoints }\\{x^{(j)}, y^{(j)}\\}\\text{ from }\\tau_i\\text{ for the meta-update} \\\\\n", "& [8] \\: \\quad\\quad\\text{end for} \\\\\n", "& [9] \\: \\quad\\quad\\text{Update base parameters: }\\theta \\leftarrow \\theta - \\beta \\nabla_\\theta \\sum_{\\tau_i \\sim p(\\tau)} \\mathcal{L}_{\\tau_i}(f_{\\theta_i'})\\text{ using new sampled data and corresponding losses} \\\\\n", "\\end{align*}\n", "\n", "\n", "At first, we sample a bunch of tasks for the new epoch (line $[2]$). Then, for each task (line $[3]$), we perform the same operations. 
Lines $[4] - [6]$ correspond to the so-called \"inner\" loop (and, thus, sampled data in $[4]$ is \"inner\") - we calculate new task-specific parameters of the model for the particular instance of the task $\tau_{i}$ by calculating the gradient with respect to the defined loss function (line $[5]$) and updating the task-specific component of the weights (line $[6]$). Then, we perform the \"outer\" loop and, thus, sample \"outer\" data from the very same tasks (as done for each $\tau_{i}$ in line $[7]$) and then update the base parameters of the model based on the performances of the task-specific models (line $[9]$, the loss is the sum of losses for all tasks in the epoch of training).\n", "\n", "That's meta-learning in a nutshell! Now, we are ready to complete the meta-learning model functions. Let us discuss their main components to quickly understand what is going on under the hood:\n", "\n", "- `inner_loop` - Takes inner data (days and prices) from a particular task as input, calculates the inner loss, and computes the updated task-specific parameters manually with one gradient descent step.\n", "- `outer_loop` - Takes outer data (days and prices) together with the task-specific parameters returned by `inner_loop` and calculates the outer (meta) loss.\n", "- `train_tasks` - Iterates through all tasks in the epoch, performs the inner and outer loops, accumulates the meta-loss from the outer data, and optimizes the model's base parameters.\n", "\n", "The `MetaLearningModel` class inherits from `UtilModel`, which is defined at the very top of this notebook as it contains utility functions that are not of particular interest to us. For the outer loop, we use the `Adam` optimizer initialized on `model.parameters()` (the architecture of the model is the same as before—two hidden layers with 100 units each; it is defined right after this code snippet). The learning rate is defined in `UtilModel` as `outer_learning_rate`. Moreover, to calculate the loss for the outer loop based on the updated task-specific parameters, there is a custom `manual_output` function in `UtilModel`, which simply propagates through the architecture of the MLP for given weights. For the inner loop, there is `inner_learning_rate` for performing one gradient descent step.\n", "\n", "Fill out the code below to calculate the loss and gradient updates for the inner loop." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "```python\n", "class MetaLearningModel(UtilModel):\n", " def __init__(self, *args, **kwargs):\n", " \"\"\" Implementation of MAML (Finn et al. 
2017).\n", " \"\"\"\n", " super().__init__(*args, **kwargs)\n", "\n", " self.optimizer = optim.Adam(self.model.parameters(), lr = self.outer_learning_rate)\n", "\n", " def inner_loop(self, x_inner, y_inner):\n", " \"\"\" Compute parameters of the model in the inner loop but not update model parameters.\n", " \"\"\"\n", " ###################################################################\n", " ## Fill out the following then remove\n", " raise NotImplementedError(\"Student exercise: complete inner loop.\")\n", " ###################################################################\n", " #calculate loss for inner data\n", " loss = self.loss_fn(self.model(...), ...)\n", "\n", " #inner data gradients w.r.t calculated loss\n", " grads = torch.autograd.grad(loss, self.model.parameters())\n", "\n", " #update weights based on inner data gradients; observe that we don't update `self.model.parameters()` here\n", " parameters = [parameter - self.inner_learning_rate * grad for parameter, grad in zip(self.model.parameters(), grads)]\n", "\n", " return parameters\n", "\n", " def outer_loop(self, x_outer, y_outer, weights):\n", " \"\"\" Compute loss for outer dats with already calculated parameters on inner data.\n", " \"\"\"\n", " ###################################################################\n", " ## Fill out the following then remove\n", " raise NotImplementedError(\"Student exercise: complete outer loop.\")\n", " ###################################################################\n", " #observe that parameters come from inner data, while loss is calculated on outer one; it is the heart of meta-learning approach\n", " return self.loss_fn(self.manual_output(weights, ...), ...)\n", "\n", " def train_tasks(self, tasks):\n", " \"\"\" Utility method to train an entire epoch in one go. 
Implements the meta-update step for a batch of tasks.\n", " \"\"\"\n", " #prepare for loss accumulation\n", " metaloss = 0\n", " self.optimizer.zero_grad()\n", "\n", " #for visualization purposes\n", " model_parameters = [list(p.clone().detach().numpy() for p in self.model.parameters())[0]]\n", "\n", " #take inner and outer data from dataset\n", " x_inner, y_inner, x_outer, y_outer = tasks\n", "\n", " #apply normalization on days\n", " x_inner = (x_inner - self.mean) / self.std\n", " x_outer = (x_outer - self.mean) / self.std\n", "\n", " #for each task there is going to be inner and outer loop\n", " for task_idx in range(x_inner.shape[0]):\n", "\n", " #find weights (line [6])\n", " parameters = self.inner_loop(x_inner[task_idx].type(torch.float32), y_inner[task_idx].type(torch.float32))\n", "\n", " #for visualization purposes\n", " if task_idx < 10:\n", " model_parameters.append(parameters[0].detach().numpy())\n", "\n", " #find meta loss w.r.t to found weights in inner loop (line [9])\n", " task_metaloss = self.outer_loop(x_outer[task_idx].type(torch.float32), y_outer[task_idx].type(torch.float32), parameters)\n", "\n", " #contribute to metaloss\n", " metaloss += task_metaloss\n", "\n", " #update model parameters\n", " metaloss.backward()\n", " self.optimizer.step()\n", "\n", " #for visualization purposes\n", " model_parameters.append(list(p.clone().detach().numpy() for p in self.model.parameters())[0])\n", "\n", " return metaloss.item() / x_inner.shape[0], model_parameters\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# to_remove solution\n", "\n", "class MetaLearningModel(UtilModel):\n", " def __init__(self, *args, **kwargs):\n", " \"\"\" Implementation of MAML (Finn et al. 2017).\n", " \"\"\"\n", " super().__init__(*args, **kwargs)\n", "\n", " self.optimizer = optim.Adam(self.model.parameters(), lr = self.outer_learning_rate)\n", "\n", " def inner_loop(self, x_inner, y_inner):\n", " \"\"\" Compute parameters of the model in the inner loop but not update model parameters.\n", " \"\"\"\n", " #calculate loss for inner data\n", " loss = self.loss_fn(self.model(x_inner), y_inner)\n", "\n", " #inner data gradients w.r.t calculated loss\n", " grads = torch.autograd.grad(loss, self.model.parameters())\n", "\n", " #update weights based on inner data gradients; observe that we don't update `self.model.parameters()` here\n", " parameters = [parameter - self.inner_learning_rate * grad for parameter, grad in zip(self.model.parameters(), grads)]\n", "\n", " return parameters\n", "\n", " def outer_loop(self, x_outer, y_outer, weights):\n", " \"\"\" Compute loss for outer dats with already calculated parameters on inner data.\n", " \"\"\"\n", " #observe that parameters come from inner data, while loss is calculated on outer one; it is the heart of meta-learning approach\n", " return self.loss_fn(self.manual_output(weights, x_outer), y_outer)\n", "\n", " def train_tasks(self, tasks):\n", " \"\"\" Utility method to train an entire epoch in one go. 
Implements the meta-update step for a batch of tasks.\n", " \"\"\"\n", " #prepare for loss accumulation\n", " metaloss = 0\n", " self.optimizer.zero_grad()\n", "\n", " #for visualization purposes\n", " model_parameters = [list(p.clone().detach().numpy() for p in self.model.parameters())[0]]\n", "\n", " #take inner and outer data from dataset\n", " x_inner, y_inner, x_outer, y_outer = tasks\n", "\n", " #apply normalization on days\n", " x_inner = (x_inner - self.mean) / self.std\n", " x_outer = (x_outer - self.mean) / self.std\n", "\n", " #for each task there is going to be inner and outer loop\n", " for task_idx in range(x_inner.shape[0]):\n", "\n", " #find weights (line [6])\n", " parameters = self.inner_loop(x_inner[task_idx].type(torch.float32), y_inner[task_idx].type(torch.float32))\n", "\n", " #for visualization purposes\n", " if task_idx < 10:\n", " model_parameters.append(parameters[0].detach().numpy())\n", "\n", " #find meta loss w.r.t to found weights in inner loop (line [9])\n", " task_metaloss = self.outer_loop(x_outer[task_idx].type(torch.float32), y_outer[task_idx].type(torch.float32), parameters)\n", "\n", " #contribute to metaloss\n", " metaloss += task_metaloss\n", "\n", " #update model parameters\n", " metaloss.backward()\n", " self.optimizer.step()\n", "\n", " #for visualization purposes\n", " model_parameters.append(list(p.clone().detach().numpy() for p in self.model.parameters())[0])\n", "\n", " return metaloss.item() / x_inner.shape[0], model_parameters" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "In the function below, we define the model architecture, which will be passed as a parameter to the `MetaLearningModel`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "class MLP(nn.Module):\n", " def __init__(self):\n", " \"\"\"Defines model's architecture.\"\"\"\n", " super(MLP, self).__init__()\n", " self.linear1 = nn.Linear(1, 100)\n", " self.relu1 = nn.ReLU()\n", " self.linear2 = nn.Linear(100, 100)\n", " self.relu2 = nn.ReLU()\n", " self.linear3 = nn.Linear(100, 1)\n", "\n", " def forward(self, x):\n", " \"\"\"Implements forward pass through defined layers of model.\"\"\"\n", " x = self.relu1(self.linear1(x))\n", " x = self.relu2(self.linear2(x))\n", " x = self.linear3(x)\n", " return x" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Now that the classes and functions are built, we are ready to look at the training procedure. We first define an architecture, meta-learning model, dataset, and dataloader. Then, we iterate through the dataloader and call the `train_tasks` function on the new batch of tasks. Every 1000 epochs, we average the meta-loss and output it. At the end of the training, we save the model parameters." 
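, "\n", "Before launching it, you may want to check what `train_tasks` receives on every iteration. The short sketch below (illustration only; `num_tasks = 30` and `num_samples = 40` are simply the values used in the training call further down) pulls a single batch from the dataloader and prints its shapes.\n", "\n", "```python\n", "# peek at one batch of tasks; the printed shapes are expectations, not guarantees\n", "preview_dataset = FruitSupplyDataset(num_epochs=1, num_tasks=30, num_samples=40, also_sample_outer=True)\n", "preview_loader = DataLoader(preview_dataset, batch_size=30)\n", "x_inner, y_inner, x_outer, y_outer = next(iter(preview_loader))\n", "print(x_inner.shape, y_outer.shape)  # expected: torch.Size([30, 40, 1]) each\n", "```"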
] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "def train(name, num_epochs, num_tasks, num_samples):\n", " \"\"\"Performs training for the meta-learning model.\"\"\"\n", "\n", " # Initialize MLP\n", " model = MLP()\n", "\n", " # Initialize meta model\n", " meta_model = MetaLearningModel(model, mean=days_mean, std=days_std)\n", "\n", " # Initialize dataset and dataloader\n", " fruit_dataset = FruitSupplyDataset(num_epochs=num_epochs + 1, num_tasks=num_tasks, num_samples=num_samples, also_sample_outer=True)\n", " fruit_dataloader = DataLoader(fruit_dataset, batch_size=num_tasks)\n", "\n", " # Track loss history\n", " epoch_loss = []\n", "\n", " # For visualization purposes\n", " epoch_parameters = []\n", "\n", " # Initialize progress bar\n", " pbar = tqdm(total=num_epochs + 1, desc=\"Training Progress\")\n", "\n", " for epoch in range(num_epochs + 1):\n", "\n", " # Get new batch of tasks\n", " tasks = next(iter(fruit_dataloader))\n", "\n", " # Train meta model\n", " mean_batch_loss, model_parameters = meta_model.train_tasks(tasks)\n", "\n", " # Track loss history\n", " epoch_loss.append(mean_batch_loss)\n", "\n", " # For visualization purposes\n", " if epoch < 100:\n", " epoch_parameters.append(np.array(model_parameters).squeeze())\n", "\n", " # Update progress bar\n", " pbar.update(1)\n", "\n", " # Print current loss\n", " if (epoch + 1) % 1000 == 0:\n", " print(\n", " f\"Meta mean loss for epoch {(epoch + 1) // 1000}: {np.mean(epoch_loss[epoch - 999:epoch]):.03f}\"\n", " )\n", "\n", " # Close the progress bar\n", " pbar.close()\n", "\n", " # Save trained model\n", " meta_model.save_parameters(name + '.pt')\n", " return np.array(epoch_parameters)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "To achieve the required level of performance, set `num_epochs` in the cell below to 100000, which would take the model at least one hour to train. Thus, at the start of the tutorial, we presented you with the already trained model `SummerAutumnModel.pt`. Ensure you can see it in the same folder as your tutorial notebook.\n", "\n", "Below, we perform training for 1000 epochs so that you can still run your first meta-learning experiment and verify the correctness of the implementation!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "set_seed(42)\n", "epoch_parameters = train(\"SummerAutumnModelTest\", num_epochs = 1000, num_tasks = 30, num_samples = 40)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "You should be able to see that the meta-loss value is 0.123 for this case. Let us look at how the first layer weight values developed throughout the first 100 epochs. Here, we project this high-dimensional weight vector into a two-dimensional space using PCA. Play with the epoch value to see the (outer) evolution of the base weights (the left plot) and see how the (inner) task-specific weights change the total weight values in the given epoch (the right plot). For clarity of visualization, we use only the first ten tasks for each epoch in the inner weight visualizations." 
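, "\n", "In case you are curious how the projection is computed, the hidden cell below essentially flattens the recorded first-layer weight snapshots across epochs, fits a 2-component PCA, and reshapes the projected coordinates back, roughly as in this sketch:\n", "\n", "```python\n", "# epoch_parameters: (epochs kept, weight snapshots per epoch, first-layer weights)\n", "flat = epoch_parameters.reshape(-1, epoch_parameters.shape[2])\n", "pca_2d = PCA(n_components=2)\n", "projected = pca_2d.fit_transform(flat).reshape(epoch_parameters.shape[0], epoch_parameters.shape[1], 2)\n", "```"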
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make sure you execute this cell to observe the plot!\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Make sure you execute this cell to observe the plot!\n", "\n", "number_of_components = 2\n", "\n", "#reshape parameters for PCA\n", "reshaped_parameters = epoch_parameters.reshape(epoch_parameters.shape[0] * epoch_parameters.shape[1], epoch_parameters.shape[2])\n", "\n", "#define PCA\n", "pca = PCA(n_components = number_of_components)\n", "\n", "#perform PCA\n", "pca_parameters = pca.fit_transform(reshaped_parameters)\n", "\n", "#reshape parameters back\n", "pca_parameters = pca_parameters.reshape(epoch_parameters.shape[0], epoch_parameters.shape[1], number_of_components)\n", "\n", "@widgets.interact(epoch = widgets.IntSlider(value=1, min=1, max=100, continuous_update=False, description='Epoch:'))\n", "def plot_with_slider(epoch):\n", " plot_inner_outer_weights(pca_parameters, epoch)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_all_about_training\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Think 2: Dreaming about a model\n", "\n", "We suggest you use your imagination for a bit and visualize what you expect the base parameters of the model to have learned. When you load the pre-trained model, it will be in a state where it hasn't updated its task-specific weights for a particular task. What would we expect its outputs to be in this state? Recall that the base parameters should somehow generalize the knowledge about the task family and still be robust for optimizing new tasks.\n", "\n", "Take 3 minutes to draw predictions on your own, then discuss your pictures in a group. Then check it yourself right below this cell (you need to uncomment the relevant lines)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "#make sure you have downloaded model in your files\n", "model_path = 'SummerAutumnModel.pt'\n", "meta_model = MetaLearningModel(model = model_path, mean = days_mean, std = days_std)\n", "prediction = meta_model.inference(torch.tensor(np.expand_dims(days, 1)))\n", "\n", "## UNCOMMENT TO SEE THE PLOT ##\n", "\n", "# with plt.xkcd():\n", "# plt.plot(days, prediction.detach().numpy(), label = \"Meta Knowledge\")\n", "# plt.xlabel('Week')\n", "# plt.ylabel('Price')\n", "# plt.legend()\n", "# plt.show()\n", "\n", "## UNCOMMENT TO SEE THE PLOT ##" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "**Discuss**\n", "\n", "Does the result match your drawings?" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_dream_about_model\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Section 3: Adapt to a particular task" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "*Estimated timing to here from start of tutorial: 35 minutes*\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Coding Exercise 3: Fine-tuning model\n", "\n", "Now, we would like to evaluate the learning ability of the meta-learned model by fine-tuning it to a particular task. For that, we clone the model parameters and perform a couple of gradient descent steps on a small amount of data for two specific tasks:\n", "\n", "$$A = 0.005, B = 0.1, \\phi = 0, C = 1$$\n", "$$A = -0.005, B = 0.1, \\phi = 0, C = 4$$\n", "\n", "Observe the second set of parameters. The model hasn't been exposed to negative values of $A$ in any of the tasks during training. We will see whether it has still learned enough to generalize well!\n", "\n", "We will test this by completing the `finetune` function, which takes the `model`, data (`x_finetune`, `y_finetune`) sampled from the particular task, and the number of gradient descent steps to take (`finetune_gradient_steps`). We will use the Stochastic Gradient Descent optimizer to perform these steps on the cloned base parameters so that we preserve the meta-learned ones." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "```python\n", "def finetune(model, x_finetune, y_finetune, finetune_gradient_steps):\n", " \"\"\"\n", " Take a fixed number of gradient steps for the given x_finetune and y_finetune.\n", "\n", " Inputs:\n", " - model (MetaLearningModel): trained meta learning model.\n", " - x_finetune (torch.tensor): features (days) of the specific task.\n", " - y_finetune (torch.tensor): outcomes (prices) of the specific task.\n", " - finetune_gradient_steps (int): number of gradient steps to perform for this task.\n", " \"\"\"\n", " #apply normalization on days\n", " x_finetune = (x_finetune - model.mean) / model.std\n", "\n", " #need to create clone, so that we preserve meta-learnt parameters\n", " clone = model.deep_clone_model(model.model)\n", " optimizer = optim.SGD(clone.parameters(), lr = model.inner_learning_rate)\n", "\n", " ###################################################################\n", " ## Fill out the following then remove\n", " raise NotImplementedError(\"Student exercise: complete fine-tuning procedure.\")\n", " ###################################################################\n", "\n", " for _ in range(finetune_gradient_steps):\n", " optimizer.zero_grad()\n", " loss = model.loss_fn(clone(...), ...)\n", " loss.backward()\n", " optimizer.step()\n", "\n", " return clone\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# to_remove solution\n", "\n", "def finetune(model, x_finetune, y_finetune, finetune_gradient_steps):\n", " \"\"\"\n", " Take a fixed number of gradient steps for the given x_finetune and y_finetune.\n", "\n", " Inputs:\n", " - model (MetaLearningModel): trained meta learning model.\n", " - x_finetune (torch.tensor): features (days) of 
the specific task.\n", " - y_finetune (torch.tensor): outcomes (prices) of the specific task.\n", " - finetune_gradient_steps (int): number of gradient steps to perform for this task.\n", " \"\"\"\n", " #apply normalization on days\n", " x_finetune = (x_finetune - model.mean) / model.std\n", "\n", " #need to create clone, so that we preserve meta-learnt parameters\n", " clone = model.deep_clone_model(model.model)\n", " optimizer = optim.SGD(clone.parameters(), lr = model.inner_learning_rate)\n", "\n", " for _ in range(finetune_gradient_steps):\n", " optimizer.zero_grad()\n", " loss = model.loss_fn(clone(x_finetune), y_finetune)\n", " loss.backward()\n", " optimizer.step()\n", "\n", " return clone" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "The `plot_performance` function used below implements sampling and fine-tuning on sampled data under the hood as well as predicting for all days after that." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "def plot_performance(task, meta_model, gradient_steps, num_samples_finetune):\n", " \"\"\"\n", " Plot the predictions of prices on fine-tuned model.\n", "\n", " Inputs:\n", " - task (list): Parameters of the task to be sample from.\n", " - meta_model (MetaLearningModel): An instance of meta-learning model initialized from the parameters file and over which fine-tuning will take place.\n", " - gradient_steps (int): number of steps to perform gradient descent.\n", " - num_samples_finetune (int) number of samples.\n", " \"\"\"\n", "\n", " #sample data from the desired task\n", " dataset = FruitSupplyDataset(also_sample_outer = False)\n", " x_finetune, y_finetune = dataset.sample_particular_task(*task, num_samples_finetune)\n", "\n", " with plt.xkcd():\n", " prices = task[0] * days ** 2 + task[1] * np.sin(np.pi * days + task[2]) + task[3]\n", " plt.plot(days, prices, label = \"True function\")\n", "\n", " #we finetune on sampled (x_finetune, y_finetune) but calculate predictions on all days\n", " #`finetune_complete` is already completed version of `finetune`\n", " prediction = finetune_complete(meta_model, torch.tensor(x_finetune).type(torch.float32), torch.tensor(y_finetune).type(torch.float32), gradient_steps)(torch.tensor((np.expand_dims(days, 1) - days_mean) / days_std).type(torch.float32)).detach().numpy()\n", "\n", " print(f\"R-squared value is {r2_score(prices, prediction):.02f}\")\n", "\n", " plt.plot(days, prediction, label=f'{gradient_steps} gradient steps')\n", " plt.title(f'Fine-tuned model on {num_samples_finetune} samples')\n", " plt.xlabel('Week')\n", " plt.ylabel('Price')\n", " plt.legend()\n", " plt.show()\n", "\n", "def visualize(name, gradient_steps, num_samples_finetune):\n", " \"\"\"Performs fine-tuning for a couple of tasks in a row.\n", "\n", " Inputs:\n", " - name (str): Name of the model's file.\n", " - gradient_steps (int): number of steps to perform gradient descent.\n", " - num_samples_finetune (int) number of samples.\n", " \"\"\"\n", "\n", " model_path = name + '.pt'\n", " meta_model = MetaLearningModel(model = model_path, mean = days_mean, std = days_std)\n", "\n", " tasks = [[0.005, 0.1, 0.0, 1.0], [-0.005, 0.1, 0.0, 4.0]]\n", "\n", " for task in tasks:\n", " fig = plot_performance(task, meta_model, gradient_steps, num_samples_finetune)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "set_seed(42)\n", "visualize(\"SummerAutumnModel\", gradient_steps = 10, 
num_samples_finetune = 30)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_finetuning_model\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Interactive Demo 1: Number of data points & gradient steps\n", "\n", "We've seen how the meta-learning procedure created a model that could quickly adapt to new tasks (including the inverted curve!). In the above examples, we used 30 samples from each task and 10 gradient steps. In this interactive demo, we invite you to explore the reasonable amount of data points to sample from a particular task and the number of gradient steps to make to fine-tune the model's performance.\n", "\n", "What is the relationship between these two fine-tuning parameters and performance? Have you noticed any lower bounds for any of the parameters such that below these thresholds, the performance can't be expected to be great?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make sure you execute this cell to observe the widget!\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Make sure you execute this cell to observe the widget!\n", "\n", "set_seed(42)\n", "\n", "@widgets.interact\n", "def interactive_visualize(\n", " gradient_steps = widgets.IntSlider(description=\"Gradient Steps\", min=0, max=20, step=1, value=10), num_samples = widgets.IntSlider(description=\"Samples\", min=5, max=150, step=5, value=100)):\n", " visualize(\"SummerAutumnModel\", gradient_steps, num_samples)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "The observed interactive plots are summarized in the following 3D visualization, where the number of gradient steps and sample size are plotted against the R-squared score achieved on these particular values." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make sure you execute this cell to observe the plot!\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Make sure you execute this cell to observe the plot!\n", "\n", "set_seed(42)\n", "\n", "gradient_steps = np.arange(11)\n", "num_samples = np.arange(5, 155, 5)\n", "plot_sensitivity_r_squared(\"SummerAutumnModel\", gradient_steps, num_samples)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Notice that the z-axis (R-squared score) has a different scale for the plots (for the positive squared term constant, it starts with 0.84 while for the negative one, with -0.75). For a better comparison between these two cases, let us fix the scale as in a positive plot, but it means that we are going to lose some data points for the negative plot." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make sure you execute this cell to observe the plot!\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Make sure you execute this cell to observe the plot!\n", "\n", "set_seed(42)\n", "\n", "gradient_steps = np.arange(11)\n", "num_samples = np.arange(5, 155, 5)\n", "plot_sensitivity_r_squared(\"SummerAutumnModel\", gradient_steps, num_samples, fix_scale = True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_number_of_data_points_and_gradient_steps\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Summary\n", "\n", "*Estimated timing of tutorial: 50 minutes*\n", "\n", "A summary of what we've learned:\n", "\n", "1. Meta-learning aims to create a model that can quickly learn any task from a distribution.\n", "\n", "2. To achieve this, it uses inner and outer learning loops to find base parameters from which a small number of gradient steps can lead to good performance on any task." ] } ], "metadata": { "colab": { "collapsed_sections": [], "include_colab_link": true, "name": "W2D4_Tutorial3", "provenance": [], "toc_visible": true }, "kernel": { "display_name": "Python 3", "language": "python", "name": "python3" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.19" } }, "nbformat": 4, "nbformat_minor": 4 }