The first line \(y_{i} \sim \text{Normal}(\mu, \sigma)\) is the likelihood. The second line is very similar, but is instead the prior for the parameter \(μ\). The third line is the prior for the parameter \(σ\). Likelihoods and priors can look very similar, because a likelihood is effectively a prior for the residuals.
1.2 4E2
In the model definition just above, how many parameters are in the posterior distribution?
There are two parameters in the posterior distribution \(\mu\) and \(\sigma\).
1.3 4E3
Using the model definition above, write down the appropriate form of Bayes’ theorem that includes the proper likelihood and priors.
Ignoring the specific distributions for the moment:
The second line \(\mu_{i} = \alpha + \beta x_i\) is the linear model.
1.5 4E5
In the model definition just above, how many parameters are in the posterior distribution?
There are three parameters in the posterior distribution: \(\alpha, \beta, \sigma\). The symbol \(\mu\) is no longer a parameter in the posterior, because it is entirely determined by \(\alpha, \beta\) and \(x\)
1.6 4M1
For the model definition below, simulate observed \(y\) values from the prior (not the posterior).
To sample from the prior distribution of y, we use stats.norm.rvs to simulate, while averaging over the prior distributions of \(μ\) and \(σ\). The easiest way to do this is to sample from the priors and then pass those samples to stats.norm.rvs to simulate y. This code will sample from the priors:
Note that the prior distribution of y is centered on zero. If that strikes you as odd, remember that a continuous variable like y can always be re-centered to any value you like, without changing any of the information in the data.
1.7 4M2
Translate the model just above into a Stan model.
m4_4m1 ='''data { int<lower=0> N;}generated quantities { real mu = normal_rng(0, 10); real sigma = exponential_rng(1); real y_sim = normal_rng(mu, sigma);}'''m4_4m1_model = utils.Stan('stan_models/m4_4m1', m4_4m1)y_sim = m4_4m1_model.laplace_sample(data={'N': 1}, draws=int(1e4)).stan_variable('y_sim')def plot_dens(): bw = utils.bw_nrd0(y_sim) az.plot_kde(y_sim, bw=bw*0.5) plt.xlabel(r"Sample $y$") plt.ylabel("Density")plot_dens()
1.8 4M3
Translate the quap model formula below into a mathematical model definition.
y ~dnorm( mu , sigma ),mu <- a + b*x,a ~dnorm( 0 , 10 ),b ~dunif( 0 , 1 ),sigma ~dexp( 1 )
A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors.
Where \(h\) is height and \(y\) is year. These priors aren’t great, but they’ll do. The prior on the intercept \(α\) is effectively uninformative. The problem didn’t say what scale height is measured on, so it’s hard to do much else here. The prior on \(β\) is very weakly informative, centered on zero, which corresponds to no impact of year. This doesn’t seem like a great prior, because surely height increases each year or stays the same. So maybe a uniform prior above zero would be better?
\(\mu = \alpha + \beta y\) <- Assume the mean has some linear dependence on time.
\(\beta \sim \text{Log-Normal}(0,1)\) <- It seems perfectly sensible that the amount an average student’s height changes per year should only be positive. Additionally, I have no idea what the growth rate is so this encodes ignorance about the rate of change of height per year.
There are other options, including truncated distributions or log-Normal distributions, but this hasn’t been introduced yet.
Because height is centered, \(\alpha\) represents the average height in the average year (i.e., year 2). The prior of \(\text{Normal}(100, 10)\) was chosen assuming that height is measured in centimeters and that that sample is of children who are still growing.
The slope is extremely vague. The a prior centered on zero, and the standard deviation of the prior of 10 represents a wide range of possible growth (or shrinkage). During growth spurts, height growth averages 6–13 cm/year. The standard deviation of 10 encompasses the range we might expect to see if growth were occurring at a high rate.
Finally, the exponential prior on \(\sigma\) assumes an average deviation of 1.
Prior predictive simulations also appear to give reasonably plausible regression lines, given our current assumptions.
Now suppose I remind you that every student got taller each year. Does this information lead you to change your choice of priors? How?
This one is subtle. On the one hand, having information on measurement scale lets us set a more sensible prior for the intercept, \(α\). You might center it on 120 now, for example. On the other hand, for the prior to really be “prior,” it needs to be ignorance of the actual data values. Since the observed mean is a feature of the sample, not of the measurement scale, basing the prior on it could get you into trouble. What kind of trouble? It could lead to an illusionary fit to data, as essentially you’ve used the data twice: once in setting the prior and once in conditioning on the data (computing the posterior).
Much later, we’ll see how constraints on the data can be used to design models, without running the risk of using the data twice.
Because we know that an increase in year will always lead to increased height, we know that \(\beta\) will be positive. Therefore, our prior should reflect this by using, for example, a log-normal distribution.
\[
\beta \sim \text{Log-Normal}(1,0.5)
\]
This prior gives an expectation of about 3cm per year, with the 89% highest density interval between 0.87cm and 5.18cm per year.
Prior predictive simulations of plausible lines using this new log-normal prior indicate that these priors still represent plausible values. Most the lines are positive, due to the prior constraint. However, because of variation around the mean, some lines do show a decrease in height. If it is truly impossible for students to shrink, then data like this might arise from measurement error.
Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors?
Again, this is a subtle issue. All the advice from just above applies. But it’s not clear here whether “variance among heights for students of the same age is never more than 64cm” is a feature of the sample or of the population. If it’s of the population, then setting a prior for it may be okay. In that case:
\[
\sigma \sim \text{Uniform}(0,64)
\]
would make sense.
1.12 4M7
Refit model m4_3 from the notebook, but omit the mean weight xbar this time. Compare the new model’s posterior to that of the original model. In particular, look at the covariance among the parameters. What is different? Then compare the posterior predictions of both models.
d = pd.read_csv("data/Howell1.csv", sep=';')d2 = d[d['age'] >=18]
m4_3_pred ='''data { int<lower=1> N; vector[N] height; vector[N] weight; real xbar; int<lower=1> N_tilde; vector[N_tilde] weight_tilde;}parameters { real a; real<lower=0> b; real<lower=0, upper=50> sigma;}model { vector[N] mu; mu = a + b * (weight - xbar); // Likelihood Function height ~ normal(mu, sigma); // Priors a ~ normal(178, 20); b ~ lognormal(0, 1); sigma ~ uniform(0, 50);}generated quantities { vector[N_tilde] y_tilde; for (i in 1:N_tilde) { y_tilde[i] = normal_rng(a + b * (weight_tilde[i] - xbar), sigma); }}'''weight_seq = np.arange(25, 71)data = {'N': len(d2), 'xbar': d2['weight'].mean(),'height': d2['height'].tolist(), 'weight': d2['weight'].tolist(),'N_tilde': len(weight_seq),'weight_tilde': weight_seq.tolist()}m4_3_model = utils.StanQuap('stan_models/m4_3_pred', m4_3_pred, data, algorithm ='Newton', generated_var=['y_tilde'])m4_3_model.precis().round(2)m4_3_model.vcov_matrix().round(3)
It can be seen that the posterior values of both models look rather similar to each other. However the in the variance-covariance matrix appear to be a couple of orders of magnitude larger when one doesn’t subtract the mean values from the data. Why is this? It’s because the parameters signify different things in the two different models.
Inspect the model again:
Originally we had
\[
\mu_i = \alpha + \beta(x_i - \bar{x})
\]
and in the second run we are instead deciding to fit
\[
\mu_i = \alpha' + \beta x_i
\]
We can see that doing this in no way changes the model that is being fit to the data as we can derive one from the other as follows
So as we can see, we can literally derive one parameter from the other. So why is the variance and covariance for the posterior of \(\alpha'\) so much larger than for \(\alpha\)? It’s because of what these parameters represent. For \(\alpha\), it can be seen that \(\mu = \alpha\) when \(x = \bar{x}\). That is \(\alpha\) represents the mean height at the mean weight. Therefore, the uncertainty in \(\alpha\) is equivalent to the uncertainty in \(\mu\) when \(x = \bar{x}\), the mean of the weight. On the other hand, \(\mu = \alpha'\) when \(x = 0\), that is \(\alpha\) represent the y-intercept of the data.
1.13 4M8
In the chapter, we used 15 knots with the cherry blossom spline. Increase the number of knots and observe what happens to the resulting spline. Then adjust also the width of the prior on the weights—change the standard deviation of the prior and watch what happens. What do you think the combination of knot number and the prior on the weights controls?
d = pd.read_csv("data/cherry_blossoms.csv", sep=';')d2 = d[d[['doy']].notna().all(axis=1)]m4_7a ='''data { int N; int K; array[N] int doy; matrix[N, K] B;}parameters { real a; vector[K] w; real<lower=0> sigma;}transformed parameters { vector[N] mu; mu = a + B * w;}model { doy ~ normal(mu, sigma); a ~ normal(100, 10); w ~ normal(0, 10); sigma ~ exponential(1);}'''m4_7a_mod ='''data { int N; int K; array[N] int doy; matrix[N, K] B;}parameters { real a; vector[K] w; real<lower=0> sigma;}transformed parameters { vector[N] mu; mu = a + B * w;}model { doy ~ normal(mu, sigma); a ~ normal(100, 10); w ~ normal(0, 30); sigma ~ exponential(1);}'''def bspline_regression_plots(data, num_knots, degree, model='m4_7a'): model_var =globals().get(model) knot_list = np.quantile(data.year, np.linspace(0, 1, num_knots))[1:-1].tolist() bspline_formula = Formula(f'bs(year, knots={knot_list}, degree={degree}, include_intercept=True)-1') B_mat = bspline_formula.get_model_matrix(data).to_numpy() input_data = {'N': B_mat.shape[0], 'K': B_mat.shape[1],'doy': data['doy'].astype(int).tolist(), 'B': B_mat.tolist()} model = utils.StanQuap(f'stan_models/{model}', model_var, input_data, algorithm ='Newton', generated_var=['mu']) post = model.extract_samples(n=1000, drop=[]) w = post['w'].mean(axis=0) fig, axs = plt.subplots(3, 1, figsize=(8, 8)) axs[0].plot(data.year, B_mat) axs[0].set(xlabel='Year', ylabel='Basis Value') axs[1].plot(data.year, B_mat * w) axs[1].set(xlabel='Year', ylabel='Basis * Weight') _, mu_plo, mu_phi = utils.precis(post['mu']) axs[2].plot(data.year, data.doy, 'o', color='pink', alpha=0.6, zorder=2) axs[2].fill_between(data.year, mu_plo, mu_phi, alpha=0.6, zorder=3) axs[2].set(xlabel='Year', ylabel='Day in Year') fig.tight_layout()
Increasing the number of knots increases the ‘wiggles’ and the increasing the standard deviation of the weight increases the width/thickness of \(\mu\)’s HDI plot.
1.14 4H1
The weights listed below were recorded in the !Kung census, but heights were not recorded for these individuals. Provide predicted heights and 89% intervals for each of these individuals. That is, fill in the table below, using model-based predictions.
Individual
Weight
Expected Height
89% Interval
1
46.95
2
43.72
3
64.78
4
32.59
5
54.63
The question is ambiguous as to whether it wants \(µ\) only or rather the distribution of individual height measurements (using \(σ\)). I’ll show the harder approach, that uses \(σ\) and simulates individual heights.
I’ll use quite flat priors here, but if you used stronger priors, that’s fine. What matters is the procedure be coherent.
Method 1: Using StanQuap convenience methods
d = pd.read_csv("data/Howell1.csv", sep=';')d2 = d[d['age'] >=18]m4_3d ='''data { int<lower=1> N; vector[N] height; vector[N] weight; int<lower=1> N_tilde; vector[N_tilde] weight_tilde;}parameters { real a; real b; real<lower=0, upper=50> sigma;}model { vector[N] mu; mu = a + b * weight; // Likelihood Function height ~ normal(mu, sigma); // Priors a ~ normal(100, 100); b ~ normal(0, 10); sigma ~ uniform(0, 50);}generated quantities { vector[N_tilde] y_tilde; for (i in 1:N_tilde) { y_tilde[i] = normal_rng(a + b * weight_tilde[i], sigma); }}'''missing_heights = np.array([46.95, 43.72, 64.78, 32.59, 54.63])data = {'N': len(d2), 'height': d2['height'].tolist(), 'weight': d2['weight'].tolist(),'N_tilde': len(missing_heights),'weight_tilde': missing_heights.tolist()}m4_3d_model = utils.StanQuap('stan_models/m4_3d', m4_3d, data, algorithm ='Newton', generated_var=['y_tilde'])sim = m4_3d_model.sim(data=data, n=10_000)['y_tilde']sim_mean = sim.mean(axis=0)sim_hdi = az.hdi(sim, hdi_prob=0.89)missing_data = pd.DataFrame({'Individual': np.arange(5)+1,'Weight': missing_heights,'Expected Height': sim_mean,'89% Interval': [f"{x[0]} - {x[1]}"for x in sim_hdi]})missing_data.set_index('Individual')
Weight
Expected Height
89% Interval
Individual
1
46.95
156.320006
148.058 - 164.309
2
43.72
153.433954
145.363 - 161.324
3
64.78
172.527412
164.582 - 181.329
4
32.59
143.378311
135.276 - 151.409
5
54.63
163.333577
154.817 - 171.227
Method 2: Using SciPy
We first produce samples from the quadratic approximate posterior. We could use the model’s means and variance-covariance matrix and pass it in scipy.stats.multivariate_normal directly, or the convenient extract_samples method:
post = m4_3d_model.extract_samples(n=10_000)
Now we want to plug the weights in the table into this model, and then average over the posterior to compute predictions for each individual’s height. The question is ambiguous as to whether it wants \(µ\) only or rather the distribution of individual height measurements (using \(σ\)). I’ll show the harder approach, that uses \(σ\) and simulates individual heights.
For the first individual, the code might look like this:
y = stats.norm.rvs(loc=post['a'] + post['b'] *46.95, scale=post['sigma'])y.mean()az.hdi(y, hdi_prob=0.89)
array([148.3268378 , 164.53844283])
How does the code work? The first line, which includes stats.norm.rvs, simulates 100-thousand heights, using the samples from the posterior and an assumed weight of 46.95 kg. The second line then computes the average of these simulated heights. That gives the expected (mean) height. The third line then computes the 90% HPDI of height.
Do the above for each row in the table, and you should get numbers close to the table produced earlier.
We could have also computed the expected heights straight from the MAP. That approach is fine, and will give nearly the same answer.
1.15 4H2
Select out all the rows in the Howell1 data with ages below 18 years of age. If you do it right, you should end up with a new data frame with 192 rows in it.
Fit a linear regression to these data, using StanQuap. Present and interpret the estimates. For every 10 units of increase in weight, how much taller does the model predict a child gets?
Plot the raw data, with height on the vertical axis and weight on the horizontal axis. Super-impose the MAP regression line and 89% interval for the mean. Also superimpose the 89% interval for predicted heights.
What aspects of the model fit concern you? Describe the kinds of assumptions you would change, if any, to improve the model. You don’t have to write any new code. Just explain what the model appears to be doing a bad job of, and what you hypothesize would be a better model.
d = pd.read_csv("data/Howell1.csv", sep=';')d2 = d[d['age'] <18]
Fit a linear regression to these data, using StanQuap. Present and interpret the estimates. For every 10 units of increase in weight, how much taller does the model predict a child gets?
Now find the quadratic approximate posterior. The code is the same as before. The data frame just changes.
m4_3d ='''data { int<lower=1> N; vector[N] height; vector[N] weight; int<lower=1> N_tilde; vector[N_tilde] weight_tilde;}parameters { real a; real b; real<lower=0, upper=50> sigma;}model { vector[N] mu; mu = a + b * weight; // Likelihood Function height ~ normal(mu, sigma); // Priors a ~ normal(100, 100); b ~ normal(0, 10); sigma ~ uniform(0, 50);}generated quantities { vector[N_tilde] y_tilde; for (i in 1:N_tilde) { y_tilde[i] = normal_rng(a + b * weight_tilde[i], sigma); }}'''weight_seq = np.linspace(1,45,50)data = {'N': len(d2), 'height': d2['height'].tolist(), 'weight': d2['weight'].tolist(),'N_tilde': len(weight_seq),'weight_tilde': weight_seq.tolist()}m4_3d_model = utils.StanQuap('stan_models/m4_3d', m4_3d, data, algorithm ='Newton', generated_var=['y_tilde'])m4_3d_model.precis().round(2)
Mean
StDev
5.5%
94.5%
Parameter
a
58.24
1.40
56.01
60.47
b
2.72
0.07
2.61
2.83
sigma
8.44
0.43
7.75
9.13
The estimates suggest that the MAP coefficient for weight is 2.7. This implies that for a unit change of 1kg of weight, we predict an average of 2.7cm of increase in height.
Plot the raw data, with height on the vertical axis and weight on the horizontal axis. Super-impose the MAP regression line and 89% interval for the mean. Also superimpose the 89% interval for predicted heights.
Now to plot the raw data and superimpose the model estimates, modify the code from earlier. We will sample from the naive posterior, then compute 90% intervals for the mean and predicted heights. This is what the complete code looks like, if you opt not to use the convenience methods link and sim:
We will sample from the posterior predictive, then compute 90% intervals for the mean and predicted heights.
What aspects of the model fit concern you? Describe the kinds of assumptions you would change, if any, to improve the model. You don’t have to write any new code. Just explain what the model appears to be doing a bad job of, and what you hypothesize would be a better model.
The major problem with this model appears to be that the relationship between weight and height, for non-adults, isn’t very linear. Instead it is curved. As a result, at low weight values, the predicted mean is above most of the actual heights. At middle weight values, the predicted mean is below most of the heights. Then again at high weight values, the mean is above the heights.
A parabolic model would likely fit these data much better. But that’s not the only option. What we’re after essentially is some way to model a reduction of the slope between height and weight, as weight increases.
1.16 4H3
Suppose a colleague of yours, who works on allometry, glances at the practice problems just above. Your colleague exclaims, “That’s silly. Everyone knows that it’s only the logarithm of body weight that scales with height!” Let’s take your colleague’s advice and see what happens.
Model the relationship between height (cm) and the natural logarithm of weight (log-kg). Use the entire Howell1 data frame, all 544 rows, adults and non-adults. Can you interpret the resulting estimates?
Begin with this plot: plt.plot(d2.weight, d2.height). Then use samples from the quadratic approximate posterior of the model in (a) to superimpose on the plot: (1) the predicted mean height as a function of weight, (2) the 97% interval for the mean, and (3) the 97% interval for predicted heights.
d = pd.read_csv("data/Howell1.csv", sep=';')
Model the relationship between height (cm) and the natural logarithm of weight (log-kg). Use the entire Howell1 data frame, all 544 rows, adults and non-adults. Can you interpret the resulting estimates?
We can just use log inside the Stan mode:
mlw ='''data { int<lower=1> N; vector[N] height; vector[N] weight; int<lower=1> N_tilde; vector[N_tilde] weight_tilde;}parameters { real a; real b; real<lower=0, upper=50> sigma;}model { vector[N] mu; mu = a + b * log(weight); // Likelihood Function height ~ normal(mu, sigma); // Priors a ~ normal(138, 100); b ~ normal(0, 100); sigma ~ uniform(0, 50);}generated quantities { vector[N_tilde] y_tilde; for (i in 1:N_tilde) { y_tilde[i] = normal_rng(a + b * weight_tilde[i], sigma); }}'''lw_seq = np.linspace(1.4,4.2,50)data = {'N': len(d), 'height': d['height'].tolist(), 'weight': d['weight'].tolist(),'N_tilde': len(lw_seq),'weight_tilde': lw_seq.tolist()}mlw_model = utils.StanQuap('stan_models/mlw', mlw, data, algorithm ='Newton', generated_var=['y_tilde'])mlw_model.precis().round(2)
Mean
StDev
5.5%
94.5%
Parameter
a
-23.79
1.34
-25.92
-21.66
b
47.08
0.38
46.47
47.69
sigma
5.13
0.16
4.89
5.38
Pretty hard to know what to make of these estimates, aside from the fact that the confidence intervals are quite narrow, owing to there being 544 rows. The estimate for b (\(β\)) is hard to understand, because it refers to log-kg, not raw kg. It means that for every increase of 1 log-kg of weight, you expect a increase of 47 cm of height. But what’s a log-kg? You want to know what the model predicts on the natural scale of measurement. So now to plotting…
Begin with this plot: plt.plot(d2.weight, d2.height). Then use samples from the quadratic approximate posterior of the model in (a) to superimpose on the plot: (1) the predicted mean height as a function of weight, (2) the 97% interval for the mean, and (3) the 97% interval for predicted heights.
Begin by sampling from the naive posterior and computing the confidence intervals as per the examples in the book. Again, I’ll not use the convenience functions, but it’s fine if you did.
Now you have lists of numbers that can be plotted to produce the confidence curves. But how to translate back to the non-log scale of measurement on the horizontal axis? Just exponentiate the x-axis coordinates in lw_seq, and the job is done:
The model may have been linear, but plotted on the raw scale of measurement, it is clearly non-linear. Not only is the trend for the mean curved, but the variance around the mean is not constant, on this scale. Instead, the variance around the mean increases with weight. On the scale you fit the model on, the variance was assumed to be constant. But once you transform the measurement scale, it usually won’t be.
Notice also that the estimate for the mean is so precise that you can hardly even see the confidence interval for it. Don’t get too confident about such results, though. Remember, all inferences of the model are conditional on the model. Even estimated trends that do a terrible job of prediction can have tight confidence intervals, when the data set is large.
1.17 4H4
Plot the prior predictive distribution for the parabolic polynomial regression model in the notebook. You can modify the code that plots the linear regression prior predictive distribution. Can you modify the prior distributions of \(α\), \(β_1\), and \(β_2\) so that the prior predictions stay within the biologically reasonable outcome space? That is to say: Do not try to fit the data by hand. But do try to keep the curves consistent with what you know about height and weight, before seeing these exact data.
Modified priors and their justifications are written along side:
\(H_i \sim N(\mu_i,\sigma)\)
\(\mu_i = \alpha + \beta_1 (x_i - \overline{x}) - \beta_2(x_i - \overline{x})^2\) -> Note that it’s \(-\beta_2\). Positive prior on \(\beta\) ensures the paraboloid curves in the correct direction.
\(\alpha \sim N(150,30^2)\) -> Reduced mean height at mean weight, as we are looking at a much larger age range now, so we expect younger people to drag the mean height down at any given weight relative to the adult only data set
\(\beta_1 \sim \text{Lognormal}(0,1)\) -> Need a linear base line to perturb with a quadratic term
The seconds term puts a small deviation from linearity
\(\beta_2 \sim \text{Exponential}(0.05)\) -> This should be positive to ensure concavity, and small to ensure small only small perturbation from linearity in weights
By using an exponential prior on \(\beta_2\) this ensures the parabolic curve must be concave, thus no increasing growth in height with weight is possible and makes perfect sense on physical grounds. However, extreme and absurd values are still found near the extremes of the data. One could tighten the parameters to ensure this didn’t happen, but as a first approximation this isn’t too bad.
Clearly there is room for improvement here. However, because it’s not intuitive how exactly each parameter effects the parabolic curve, finding a good prior distribution is really hard! After much trial and error and playing with parabola calculators online, here is what I ended up with:
Return to cherry_blossoms and model the association between blossom date (doy) and March temperature (temp). Note that there are many missing values in both variables. You may consider a linear model, a polynomial, or a spline on temperature. How well does temperature trend predict the blossom trend?
d = pd.read_csv("data/cherry_blossoms.csv", sep=';')d2 = d.dropna(subset=["doy", "temp"]).reset_index(drop=True)d2['temp_s'] = utils.standardize(d2.temp, ddof=1)
We’ll try each type of model: linear, polynomial, and spline. For each, we’ll fit the model, and then visualize the predictions with the observed data.
By leveraging a matrix formulation (design matrix), we are setting up our StanQuap model to handle all four models described above.
m4_8 ='''data { int<lower=1> N; // Number of Observations int<lower=1> K; // Number of Coefficients (including intercept) vector[N] doy; // DoY Outcome matrix[N, K] X; // Regressors (March Temperature) int<lower=1> N_tilde; matrix[N_tilde, K] X_tilde;}parameters { vector[K] b; real<lower=0, upper=50> sigma;}transformed parameters { vector[N] mu; mu = X * b; vector[N_tilde] mu_tilde; mu_tilde = X_tilde * b;}model { doy ~ normal(mu, sigma); b[1] ~ normal(100, 10); b[2] ~ normal(0, 10); // Up to Cubic regression if (K > 2 && K < 5) { for (i in 3:K) { b[i] ~ normal(0, 1); } } // B-Spline if (K > 5) { for (i in 3:K) { b[i] ~ normal(0, 10); } } sigma ~ exponential(1);}generated quantities { vector[N_tilde] y_tilde; for (i in 1:N_tilde) { y_tilde[i] = normal_rng(mu_tilde[i], sigma); }}'''temp_seq = np.linspace(-2.2, 3.3, 100)def dmat(data, degree): poly_terms =" + ".join([f"I(temp_s**{i})"for i inrange(1, degree +1)]) formula = Formula(f"1 + {poly_terms}")return formula.get_model_matrix(data).to_numpy()def bspline_dmat(data, num_knots, degree=3): knot_list = np.quantile(data.temp_s, np.linspace(0, 1, num_knots))[1:-1].tolist()# Maintain first/intercept column bspline_formula = Formula(f'bs(temp_s, knots={knot_list}, degree={degree}, include_intercept=True)')return bspline_formula.get_model_matrix(data).to_numpy()def stan_data_dict(degree=2, num_knots=None): X = dmat(d2, degree) if num_knots isNoneelse bspline_dmat(d2, num_knots) temp_seq_df = pd.DataFrame({'temp_s': temp_seq}) X_tilde = dmat(temp_seq_df, degree) if num_knots isNoneelse bspline_dmat(temp_seq_df, num_knots)return {'N': X.shape[0],'K': X.shape[1],'doy': d2['doy'].tolist(),'X': X.tolist(),'N_tilde': len(temp_seq_df),'X_tilde': X_tilde.tolist()}title = ['Linear', 'Quadratic', 'Cubic', 'Spline']def plot_model(): model = utils.Stan('stan_models/m4_8', m4_8) _, axes = plt.subplots(2, 2, figsize=(9,4), sharex=True, sharey=True)for idx, ax inenumerate(axes.flatten()): data = stan_data_dict(degree=idx+1) if title[idx] !='Spline'else stan_data_dict(num_knots=30) post = model.laplace_sample(data=data, draws=10_000).stan_variables() mu_mean, mu_plo, mu_phi = utils.precis(post['mu_tilde']) _, sim_plo, sim_phi = utils.precis(post['y_tilde']) ax.plot(d2.temp_s, d2.doy, 'o', fillstyle='none', color='k', alpha=0.3) ax.plot(temp_seq, mu_mean, 'k') ax.fill_between(temp_seq, mu_plo, mu_phi, color='green', alpha=0.2) ax.fill_between(temp_seq, sim_plo, sim_phi, color='blue', alpha=0.2) ax.set(xlabel='March Temperature (Standardized)', ylabel='Day in Year') ax.set_title(title[idx]) plot_model()
Overall the predictions from each model are remarkably similar. Therefore, I would go with a linear model, as that is the simplest of the models.
1.19 4H6
Simulate the prior predictive distribution for the cherry blossom spline in the chapter. Adjust the prior on the weights and observe what happens. What do you think the prior on the weights is doing?
As a reminder, here is the cherry blossom spline model from the chapter:
We’ll also need to recreate the basis functions that that model uses and finally, we can generate data from the priors, and combine those parameters with the basis functions to get the prior predictive distributions.
d = pd.read_csv("data/cherry_blossoms.csv", sep=';')d2 = d.dropna(subset=["doy", "year"]).reset_index(drop=True)
m4_7_priors ='''data { int<lower=1> N; int<lower=1> K; matrix[N, K] X;}generated quantities { vector[K] b; b[1] = normal_rng(100, 10); for (i in 2:K) { b[i] = normal_rng(0, 10); } vector[N] mu_sim = X * b;}'''bspline_prior_predictive('m4_7_priors', 15, 3, title='Prior Predictive Distribution - Weight mu = 0, sigma = 10')
Now let’s tighten the prior on w to \(\text{Normal}(0,2)\), as we used for exercise 4M8. Now the lines are much less wiggly, which is consistent with what we found in the previous exercise, which used the observed data.
m4_7_priors1 ='''data { int<lower=1> N; int<lower=1> K; matrix[N, K] X;}generated quantities { vector[K] b; b[1] = normal_rng(100, 10); for (i in 2:K) { b[i] = normal_rng(0, 2); } vector[N] mu_sim = X * b;}'''bspline_prior_predictive('m4_7_priors1', 15, 3, title='Prior Predictive Distribution - Weight mu = 0, sigma = 2')
1.20 4H8
The cherry blossom spline in the chapter used an intercept α, but technically it doesn’t require one. The first basis functions could substitute for the intercept. Try refitting the cherry blossom spline without the intercept. What else about the model do you need to change to make this work?
This looks a lot like our original model, except the left hand side of the spline is pulled down. This is likely due to the prior on w. The prior is centered on 0, but that assumes an intercept is present (i.e., the curves of the spline average a deviation of 0 from the mean). However, without the intercept, the prior drags the line down to actual zero when the first basis function in non-zero.