Part 1 - Predicting Review Scores on Pitchfork

For Part 1, we will be using data from this paper. The data is a collection of reviews from Pitchfork, a site that provides expert reviews of music albums. The authors of the paper have also combined the data with a set of features from Spotify's API that provide insight into the music itself, e.g. the "acousticness" of a song. We will tackle a regression problem here, trying to predict the score of a review from several of the other columns in the dataset.

Part 1.1 - Feature Engineering with Feature Subsets

In the first subsection of Part 1, we're going to look at how running linear regression with various subsets of our features impacts our ability to predict score.

Here we are going to train a separate linear regression model for each of a number of different feature subsets. Specifically:

Our output file part_1.1_results.csv will have the following columns:
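To make the workflow concrete, here is a rough sketch of training one LinearRegression model per feature subset and recording its test-set RMSE. The test-set filename, the target column name ("score"), and the output columns are placeholders for illustration; follow the specifications above.

```python
# Rough sketch only: one linear regression per feature subset.
# Filenames, the target column, and the output columns are placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

train = pd.read_csv("part1_train.csv")
test = pd.read_csv("part1_test.csv")   # assumed name of the test file

# feature_sets is assumed to be a list of column-name lists, as in the assignment,
# e.g. feature_sets = [["acousticness"], ["acousticness", "danceability"], ...]
rows = []
for features in feature_sets:
    model = LinearRegression().fit(train[features], train["score"])
    preds = model.predict(test[features])
    rmse = np.sqrt(mean_squared_error(test["score"], preds))
    rows.append({"feature_set": ",".join(features), "rmse": rmse})

pd.DataFrame(rows).to_csv("part_1.1_results.csv", index=False)
```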

Part 1.2 - Feature Engineering with the LASSO

In Part 1.2, we will be training an L1-regularized linear regression model with an expanded feature set. Specifically:

  1. Begin with the final feature set listed in feature_sets (i.e. your feature set, to begin this section, is feature_sets[-1]).
  2. One-hot encode your categorical variables, setting drop='if_binary' and sparse=False in the function arguments.
  3. Scale all of your continuous features using the StandardScaler.
  4. Train an L1-regularized linear regression model using these features on the dataset part1_train.csv. You should use the LassoCV class in sklearn; it will do the cross-validation necessary to select the appropriate value for the regularizer for you! Use 10-fold cross-validation to perform model selection (set the LassoCV parameter cv to 10), and set the random_state to 1. Do not change any of the other parameters to LassoCV (i.e. leave them at their defaults).
  5. Identify the best alpha value (the regularizer term, according to sklearn; in class, we refer to this as $\lambda$!) in terms of average mean squared error according to the cross-validation.
  6. Finally, train a Lasso model with this alpha on the entire training dataset (part1_train.csv). We will use this model to report the root mean squared error on the test set (see the sketch after this list).
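Putting these steps together, a minimal sketch of the pipeline might look like the following. The test-set filename, the target column name ("score"), and the way categorical vs. continuous columns are identified are assumptions here; sparse=False is used as specified above (newer sklearn versions call this parameter sparse_output).

```python
# Sketch: LASSO with one-hot encoding and scaling. Column detection, the test
# filename, and the target column are assumptions for illustration.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Lasso, LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import OneHotEncoder, StandardScaler

train = pd.read_csv("part1_train.csv")
test = pd.read_csv("part1_test.csv")             # assumed test-set filename

features = feature_sets[-1]                      # final (largest) feature set
categorical_cols = [c for c in features if train[c].dtype == object]
continuous_cols = [c for c in features if c not in categorical_cols]

preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(drop="if_binary", sparse=False), categorical_cols),
    ("num", StandardScaler(), continuous_cols),
])

X_train = preprocess.fit_transform(train[features])
y_train = train["score"]

# Cross-validated search for alpha (sklearn's name for lambda)
lasso_cv = LassoCV(cv=10, random_state=1).fit(X_train, y_train)
best_alpha = lasso_cv.alpha_

# Refit a plain Lasso with the selected alpha on the full training set
final_model = Lasso(alpha=best_alpha).fit(X_train, y_train)

X_test = preprocess.transform(test[features])
rmse = np.sqrt(mean_squared_error(test["score"], final_model.predict(X_test)))
```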

Part 1.4 - "Manual" Cross-Validation + Holdout for Model Selection and Evaluation

Finally, we will use cross-validation for both algorithm and model selection, with a held-out test set for a final evaluation. We will use 5-fold cross-validation to identify the best parameters and hyperparameters for a set of models. We will then take our final models and use a final held-out test set (the same one as above) to estimate the generalization error of the models.
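As a starting point, here is a minimal sketch of the "manual" cross-validation loop for a single candidate model. The candidate model (Ridge), its hyperparameter, the shuffle/random_state settings, and the column names are all placeholders; the actual models and hyperparameter sets come from the list below.

```python
# Minimal sketch of "manual" 5-fold cross-validation for one candidate model.
# Repeat this loop for each model/hyperparameter combination being compared.
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

train = pd.read_csv("part1_train.csv")
features = ["acousticness", "danceability"]        # placeholder feature list
X, y = train[features].to_numpy(), train["score"].to_numpy()

kf = KFold(n_splits=5, shuffle=True, random_state=1)  # shuffle settings assumed
fold_rmses = []
for train_idx, val_idx in kf.split(X):
    model = Ridge(alpha=1.0)                       # placeholder model/hyperparameter
    model.fit(X[train_idx], y[train_idx])
    preds = model.predict(X[val_idx])
    fold_rmses.append(np.sqrt(mean_squared_error(y[val_idx], preds)))

cv_rmse = np.mean(fold_rmses)   # average validation RMSE across the 5 folds
```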

Specifically, we will be training and evaluating the following models, one for each of the specified hyperparameter sets:

Our output file part_1.4_results.csv should have the following columns:

Part 2

Here, we're going to perform optimization of one of the classification models - logistic regression. As a reminder...

The loss function of logistic regression (also known as the logistic-loss or log-loss) is given by: \begin{equation} J({\bf w}) = \frac{1}{n}\sum_{i=1}^n \log{(1 + \exp{(-y_i{\bf w}^\top{\bf x}_i)})} \label{eqn:logloss} \end{equation}

The gradient for this loss function, as derived in class, is: \begin{equation} \nabla J({\bf w}) = -\frac{1}{n}\sum_{i=1}^n \frac{y_i}{1 + \exp{(y_i{\bf w}^\top{\bf x}_i)}}{\bf x}_i \label{eqn:loglossgradient} \end{equation}

The Hessian for the loss function is given by: \begin{equation} {\bf H}({\bf w}) = \frac{1}{n} \sum_{i=1}^n \frac{\exp{(y_i{\bf w}^\top{\bf x}_i)}}{(1 + \exp{(y_i{\bf w}^\top{\bf x}_i)})^2}{\bf x}_i{\bf x}_i^\top \label{eqn:loglosshessian} \end{equation}

Part 2.1 - Logistic Regression with Gradient Descent

In Part 2.1 we will implement logistic regression with gradient descent by writing the following functions (a sketch follows the list):

  1. logistic_objective - compute the logistic loss for the given data set (see equation above)
  2. logistic_gradient - compute the gradient vector of logistic loss for the given data set (see equation above)
  3. run_gradient_descent - run the gradient descent algorithm, given these two functions.
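A hedged sketch of these three functions is given below, assuming labels $y_i \in \{-1, +1\}$, NumPy-array inputs, and a fixed step size; the exact signatures expected by the starter code may differ.

```python
# Sketch of the Part 2.1 functions. Signatures, step size, and iteration count
# are assumptions; labels y are assumed to take values in {-1, +1}.
import numpy as np

def logistic_objective(w, X, y):
    """Logistic loss J(w) = (1/n) * sum_i log(1 + exp(-y_i w^T x_i))."""
    margins = y * (X @ w)
    return np.mean(np.log1p(np.exp(-margins)))

def logistic_gradient(w, X, y):
    """Gradient: -(1/n) * sum_i y_i / (1 + exp(y_i w^T x_i)) * x_i."""
    margins = y * (X @ w)
    coeffs = -y / (1.0 + np.exp(margins))          # one coefficient per example
    return (X.T @ coeffs) / X.shape[0]

def run_gradient_descent(X, y, step_size=0.1, num_iters=1000):
    """Plain gradient descent with a fixed step size (both values assumed)."""
    w = np.zeros(X.shape[1])
    for _ in range(num_iters):
        w = w - step_size * logistic_gradient(w, X, y)
    return w
```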

Part 2.2 - Optimization with Newton-Raphson

In Part 2.2, we are going to use the Newton-Raphson method to optimize the same logistic regression model. To do so, we will need to 1) implement the logistic_hessian function to compute the Hessian matrix of logistic loss for the given data set, and 2) use scipy's optimize function to perform the optimization, rather than writing a function by hand to do so.
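A hedged sketch is shown below, assuming the Part 2.1 functions and data arrays are already defined and using scipy.optimize.minimize with the Newton-CG method as one Newton-style option; the exact scipy routine the assignment expects may differ.

```python
# Sketch: Hessian of the logistic loss plus a Newton-style solve via scipy.
# Assumes X, y, logistic_objective, and logistic_gradient from Part 2.1 exist.
import numpy as np
from scipy.optimize import minimize

def logistic_hessian(w, X, y):
    """H(w) = (1/n) * sum_i s_i * x_i x_i^T, with s_i = exp(m_i) / (1 + exp(m_i))^2."""
    margins = y * (X @ w)
    s = np.exp(margins) / (1.0 + np.exp(margins)) ** 2   # one weight per example
    return (X.T * s) @ X / X.shape[0]

w0 = np.zeros(X.shape[1])                                 # initial weights
result = minimize(
    fun=logistic_objective, x0=w0, args=(X, y),
    jac=logistic_gradient, hess=logistic_hessian,
    method="Newton-CG",                                   # assumed Newton-style method
)
w_opt = result.x
```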