\documentclass{article}
\usepackage{fullpage}
\usepackage{color}
\usepackage{amsmath}
\usepackage{url}
\usepackage{verbatim}
\usepackage{graphicx}
\usepackage{parskip}
\usepackage{amssymb}
\usepackage{nicefrac}
\usepackage{listings} % For displaying code
\usepackage{algorithm2e} % pseudo-code
% Answers
\def\ans#1{\par\gre{Answer: #1}}
%\def\ans#1{} % Comment this line to produce document with answers
% Colors
\definecolor{blu}{rgb}{0,0,1}
\def\blu#1{{\color{blu}#1}}
\definecolor{gre}{rgb}{0,.5,0}
\def\gre#1{{\color{gre}#1}}
\definecolor{red}{rgb}{1,0,0}
\def\red#1{{\color{red}#1}}
\def\norm#1{\|#1\|}
% Math
\def\R{\mathbb{R}}
\def\E{\mathbb{E}}
\def\argmax{\mathop{\rm arg\,max}}
\newcommand{\argmin}[1]{\mathop{\hbox{argmin}}_{#1}}
\newcommand{\mat}[1]{\begin{bmatrix}#1\end{bmatrix}}
\newcommand{\alignStar}[1]{\begin{align*}#1\end{align*}}
\def\half{\frac 1 2}
\def\cond{\; | \;}
% LaTeX
\newcommand{\fig}[2]{\includegraphics[width=#1\textwidth]{a4f/#2}}
\newcommand{\centerfig}[2]{\begin{center}\includegraphics[width=#1\textwidth]{a4f/#2}\end{center}}
\def\items#1{\begin{itemize}#1\end{itemize}}
\def\enum#1{\begin{enumerate}#1\end{enumerate}}
\begin{document}
\title{CPSC 540 Assignment 4 (due Friday March 27 at midnight)}
\author{}
\date{}
\maketitle
\vspace{-4em}
%\red{We are providing solutions because supervised learning is easier than unsupervised learning, and so we think having solutions available can help you learn. However, the solution file is meant for you alone and we do not give permission to share these solution files with anyone. Both distributing solution files to other people or using solution files provided to you by other people are considered academic misconduct. Please see UBC's policy on this topic if you are not familiar with it:\\
%\url{http://www.calendar.ubc.ca/vancouver/index.cfm?tree=3,54,111,959}\\
%\url{http://www.calendar.ubc.ca/vancouver/index.cfm?tree=3,54,111,960}}
\blu{\enum{
\item Name(s):
\item Student ID(s):
}}
\section{Bayesian Inference}
Consider a $y \in \{0,1,2,\dots\}$ following a Poisson distribution with rate parameter $\lambda > 0$,
\[
p(y \cond \lambda) = \frac{\lambda^y\exp(-\lambda)}{y!}.
\]
We'll assume that $\lambda$ follows a Gamma distribution (the conjugate prior to the Poisson) with shape parameter $\alpha > 0$ and rate parameter $\beta > 0$,
\[
\lambda \sim \text{Gamma}(\alpha, \beta),
\]
or equivalently that
\[
p(\lambda \cond \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\lambda^{\alpha-1}\exp(-\beta\lambda),
\]
where $\Gamma$ is the gamma function.
Compute the following quantities:
\enum{
\item \blu{The posterior distribution,
\[
p(\lambda \cond y,\alpha, \beta).
\]
}
\item \blu{The marginal likelihood of $y$ given the hyper-parameters $\alpha$ and $\beta$,
\[
p(y\cond \alpha, \beta) = \int p(y,\lambda\cond \alpha, \beta)d\lambda.
\]
}
\item \blu{The posterior mean estimate for $\lambda$,
\[
\mathbb{E}_{\lambda\cond y,\alpha,\beta}[\lambda] = \int \lambda p(\lambda \cond y,\alpha,\beta)d\lambda.
\]}
\item \blu{The posterior predictive distribution for a new independent observation $\tilde{y}$ given $y$,
\[
p(\tilde{y}\cond y,\alpha,\beta) = \int p(\tilde{y},\lambda\cond y,\alpha,\beta)d\lambda.
\]
}
where $\alpha^{++} = \tilde{y}+\alpha^+$ and $\beta^{++} = 1+\beta^+$.
Note that this is the marginal likelihood of the new data, if we treat the posterior we got from the old data as our prior (which is intuitive if you think about it).
}
Hint:
You should be able to use the form of the gamma distribution to solve all the integrals that show up in this question. You can use $Z(\alpha, \beta) = \frac{\Gamma(\alpha)}{\beta^\alpha}$ to represent the normalizing constant of the gamma distribution, and use $\alpha^+$ and $\beta^+$ as the updated parameters of the gamma distribution in the posterior.
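As a quick numerical sanity check of the identity the hint relies on, the sketch below (in Python, with illustrative names; it is not part of the assignment code) compares a midpoint-rule estimate of $\int_0^\infty \lambda^{\alpha-1}\exp(-\beta\lambda)\,d\lambda$ against the closed form $Z(\alpha,\beta) = \Gamma(\alpha)/\beta^\alpha$:

```python
import math

def gamma_Z(alpha, beta, upper=50.0, n=200_000):
    """Midpoint-rule estimate of the gamma normalizing constant,
    i.e. the integral of lam**(alpha-1) * exp(-beta*lam) over (0, upper)."""
    h = upper / n
    total = 0.0
    for i in range(n):
        lam = (i + 0.5) * h  # midpoint of the i-th sub-interval
        total += lam ** (alpha - 1) * math.exp(-beta * lam)
    return total * h

alpha, beta = 3.0, 2.0
numeric = gamma_Z(alpha, beta)
closed_form = math.gamma(alpha) / beta ** alpha  # Gamma(alpha) / beta^alpha
print(numeric, closed_form)
```

For $\alpha=3$, $\beta=2$ both values are $\Gamma(3)/2^3 = 0.25$ (up to discretization error). Recognizing integrands of this form, and reading off the updated $\alpha^+$ and $\beta^+$, is all the integration this question requires.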
\section{Markov Chain Monte Carlo}
If you run \emph{example\_MH.jl}, it loads a set of images of `2' and `3' digits. It then runs the Metropolis MCMC algorithm to try to generate samples from the posterior over $w$, in a logistic regression model with a Gaussian prior. Once the samples are generated, it makes a histogram of the samples for several of the variables.\footnote{The ``positive'' variables are some of the positive weights when you fit an L2-regularized logistic regression model to this data. The ``negative'' variables are some of the negative regression weights in that model, and the ``neutral'' ones are set to 0 in that model.}
\enum{
\item \blu{Why would the samples coming from the Metropolis algorithm not give a good approximation to the posterior?}
\item Modify the proposal used by the demo to $\hat{w} \sim \mathcal{N}(w,\red{(1/100)} I)$ instead of $\hat{w} \sim \mathcal{N}(w,I)$. \blu{Hand in your code and the updated histogram plot.}
\item Modify the proposal to use $\hat{w} \sim \mathcal{N}(w,\red{(1/10000)} I)$. \blu{Do you think this performs better or worse than the previous choice? (Briefly explain.)}
}
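To illustrate what the proposal variance controls (independently of the course demo, which is in Julia), here is a minimal random-walk Metropolis sampler in Python for a toy 1D standard-normal target; the names and target are illustrative assumptions, not the demo's:

```python
import math, random

def metropolis(log_target, x0, step, n_iter, rng):
    """Random-walk Metropolis with symmetric proposal x_hat ~ N(x, step^2)."""
    x, logp_x = x0, log_target(x0)
    samples, accepted = [], 0
    for _ in range(n_iter):
        x_hat = x + rng.gauss(0.0, step)
        logp_hat = log_target(x_hat)
        # Symmetric proposal: the q terms cancel, leaving the Metropolis ratio.
        if rng.random() < math.exp(min(0.0, logp_hat - logp_x)):
            x, logp_x = x_hat, logp_hat
            accepted += 1
        samples.append(x)  # on rejection, the current state is repeated
    return samples, accepted / n_iter

rng = random.Random(0)
log_target = lambda x: -0.5 * x * x  # standard normal, up to a constant
samples, acc_rate = metropolis(log_target, 0.0, 1.0, 50_000, rng)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Shrinking \texttt{step} raises the acceptance rate (proposals stay near the current point), but each accepted move is tiny, so the chain explores the posterior more slowly; this is the trade-off behind the two proposal variances in the questions above.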
\section{Deep Structured Models}
Cancelled.
\section{Very-Short Answer Questions}
Give a short and concise 1-sentence answer to the questions below.
\enum{
\item Under what conditions can we use alpha-beta swap moves for approximate decoding?
\item Why do we use the parameterization $\phi_j(s) = \exp(w_{j,s})$ in UGMs?
\item What is the key difference between Younes' algorithm and our usual setting of using an unbiased approximation of the gradient?
\item What is the key advantage of the graph structure in restricted Boltzmann machines?
\item What is the advantage of using a CRF, modeling $p(y \cond x)$, rather than treating supervised learning as a special case of density estimation (modeling $p(y, x)$)?
\item Why can fully-convolutional networks segment images of different sizes?
\item What is the key feature of a ``sequence to sequence'' RNN?
\item What are two advantages of the Bayesian approach to learning?
\item What is the difference between the posterior and posterior predictive distributions?
\item What is a hyper-hyper-parameter?
\item What is the key property of a conjugate prior?
\item In what setting is it unnecessary to include the $q$ function in the Metropolis-Hastings acceptance probability?
}
\section*{Literature Survey}
Reading academic papers is a skill that takes practice. When you first start out reading papers, you may find that you need to re-read things several times before you understand them, or that details will still be very fuzzy even after you've put a great amount of effort into trying to understand a paper. Don't panic, this is normal.
Even if you are used to reading papers from your particular sub-area, it can be challenging to read papers about a completely different topic. Usually, people in different areas use different language/notation and focus on different issues. Nevertheless, many of the most-successful people in academia and industry are those that are able to understand/adapt ideas from different areas. (There are a ton of smart people in the world working on all sorts of amazing things; it's good to know how to communicate with as many of them as possible.)
A common technique when trying to understand a new topic (or reading scientific papers for the first time) is to read and write notes on 10 papers on the topic. When you read the first paper, you'll often find that it's hard to follow. This can make reading through it take a long time and might still leave you feeling that many things don't make sense; keep reading and trying to take notes. When you get to the second paper, it might still be very hard to follow. But when you start getting to the 8th or 9th paper, things often start making more sense. You'll start to form an impression of what the influential works in the area are, you'll start getting used to the language and jargon, you'll start to understand the main issues that people who work on the topic care about, and you'll probably notice some important references that weren't on your initial list of 10 papers. Ideally, you'll also start to notice how the topic has changed over time and you may get ideas for future work that you could do on the topic.
To help you make progress on your project or to give you an excuse to learn about a new topic, for this part you should \blu{write a literature survey of at least 10 academic papers} on a particular topic. While your personal notes on the papers may be longer, the survey should be \blu{at most 4 pages of text (excluding references/tables/figures)} in a format similar to the one for this document. Some logical components of a literature survey might be:
\items{
\item A description of the overall topic, and the key themes/trends across the papers.
\item A short high-level description of what was explored in each paper. For example, describe the problem being addressed, the key components of the proposed solution, and how it was evaluated. In addition, it is important to comment on the \emph{why} questions: why is this problem important and why would this particular solution method make progress on it? It's also useful to comment on the strengths and weaknesses of the various works, and it's particularly nice if you can show how some works address the weaknesses of prior works (or introduce new weaknesses).
\item One or more logical ``groupings'' of the papers. This could be in terms of the variant of the topic that they address, in terms of the solution techniques used, or in chronological terms.
}
Some advice on choosing the topic:
\items{
\item The most logical/easy topic for your literature survey is a topic related to your course project, given that your final report will need a (shorter) literature survey included.
\item If you are an undergrad, or a masters student without a research project yet, you may alternately want to choose a general area (like variance-reduced stochastic gradient, non-Gaussian graphical models, recurrent neural networks, matrix factorization, neural artistic style transfer, Bayesian optimization, transformer networks, etc.) as your topic.
\item If you are a masters student that already has a thesis project, it could make sense to do a survey on a topic where ML intersects with your thesis (or where ML \emph{could} intersect your thesis).
\item If you are a PhD student, I would recommend using this as an excuse to learn about a \emph{completely different} topic than what you normally work on. Choose something hard that you would like to learn about, but previously haven't been able to justify spending the time exploring carefully. This can be invaluable to your future research, because during/after your PhD it often becomes hard to allocate time to learn completely new topics.
}
\end{document}