{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Long-Term Correction with MCP\n\nThis example shows how to perform a Measure-Correlate-Predict (MCP)\nlong-term correction using :class:`~windkit.ltc.LinRegMCP` and\n:class:`~windkit.ltc.VarRatMCP`.\n\nThe workflow is:\n\n1. A long-term (LT) reference station provides many years of wind data.\n2. Short-term (ST) target on-site measurements that overlap the reference data (concurrent period).\n3. An MCP model is fitted on the concurrent period, mapping reference wind speeds to site wind speeds sector by sector.\n4. The fitted model is applied to the full long-term reference to produce a long-term corrected site wind climate.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Generate Synthetic Data\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\nimport matplotlib.pyplot as plt\nimport pandas as pd\nimport windkit as wk\nfrom windkit.ltc import LinRegMCP, VarRatMCP, calc_scores\nfrom windkit.spatial import create_point\n\nout_locs = create_point(500000, 6200000, 80, 32632)\n\nperiod_lt = pd.date_range(\"2004-01-01\", \"2009-01-01\", freq=\"1h\", inclusive=\"left\")\nperiod_st = pd.date_range(\"2008-01-01\", \"2009-01-01\", freq=\"1h\", inclusive=\"left\")\n\n# Generation of synthetic data for a correlated pair, with same direction bias.\ntgt_lt, ref_lt = wk.create_tswc_pair(\n    out_locs,\n    date_range=period_lt,\n    weibull_A=(6.0, 8.0),\n    weibull_k=(1.6, 2.2),\n    target_r2=0.8,\n    direction_bias=0,\n    speed_tau=14400.0,  # set the e-folding timescale for the wind speed\n)\nref_st, tgt_st = ref_lt.sel(time=period_st), tgt_lt.sel(time=period_st)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Concurrent Period Time Series\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "fig, (ax_ws, ax_wd) = plt.subplots(2, 1, figsize=(12, 6), sharex=True)\nfor ds, color, label in [(ref_st, \"C0\", \"Ref\"), (tgt_st, \"k\", \"Site\")]:\n    sl = ds.sel(time=\"2008-01\")\n    sl.wind_speed.plot.line(x=\"time\", ax=ax_ws, color=color, label=label)\n    sl.wind_direction.plot.line(x=\"time\", ax=ax_wd, color=color, label=label)\nax_ws.set(\n    title=\"January 2008 \u2014 concurrent period\", xlabel=\"\", ylabel=\"Wind speed (m/s)\"\n)\nax_wd.set(title=\"\", xlabel=\"Time\", ylabel=\"Wind direction (\u00b0)\")\nfor ax in (ax_ws, ax_wd):\n    ax.legend()\n    ax.grid(True, linestyle=\"--\", alpha=0.5)\nfig.tight_layout()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Wind Speed Distributions and Correlation\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "bins_ws = np.linspace(0.0, 30.0, 31)\n\nfig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))\nkw = dict(bins=bins_ws, density=True, alpha=0.5)\nax1.hist(\n    ref_lt.wind_speed.values.flatten(), **kw, color=\"C0\", label=\"Ref (5 yr long-term)\"\n)\nax1.hist(\n    tgt_st.wind_speed.values.flatten(), **kw, color=\"k\", label=\"Site (1 yr observed)\"\n)\nax1.set(\n    xlabel=\"Wind speed (m/s)\",\n    ylabel=\"Density\",\n    title=\"Wind speed distributions\",\n    xlim=(0, 20),\n    ylim=(0, 0.2),\n)\nax1.legend()\nax1.grid(True, linestyle=\"--\", alpha=0.5)\n\nws_max = max(ref_st.wind_speed.values.max(), tgt_st.wind_speed.values.max())\nax2.scatter(\n    ref_st.wind_speed.values.flatten(),\n    tgt_st.wind_speed.values.flatten(),\n    alpha=0.5,\n    color=\"C0\",\n    s=10,\n)\nax2.plot([0, ws_max], [0, ws_max], \"k--\", linewidth=1, label=\"1:1 line\")\nax2.set(\n    xlabel=\"Reference wind speed (m/s)\",\n    ylabel=\"Site wind speed (m/s)\",\n    title=\"Concurrent period correlation\",\n)\nax2.legend()\nax2.grid(True, linestyle=\"--\", alpha=0.5)\nfig.tight_layout()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Fit MCP Models\nBoth :class:`~windkit.ltc.LinRegMCP` (ordinary least squares) and\n:class:`~windkit.ltc.VarRatMCP` (variance ratio regression) fit one\nindependent regression model per wind direction sector.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "linreg = LinRegMCP(n_sectors=12)\nlinreg.fit(ref_st, tgt_st)\n\nvarrat = VarRatMCP(n_sectors=12)\nvarrat.fit(ref_st, tgt_st)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Predict Long-Term Wind Climate\nApply the fitted models to the full 5-year reference to produce a\nlong-term corrected time series at the site location.\n\n:meth:`~windkit.ltc.LinRegMCP.predict` returns the deterministic\nregression-line prediction (conditional mean).\n:meth:`~windkit.ltc.LinRegMCP.predict_with_noise` adds Gaussian noise\nsampled from the per-sector residual standard deviation, recovering\nrealistic variance in the predicted distribution.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "pred_lt_linreg = linreg.predict(ref_lt)\npred_lt_linreg_noisy = linreg.predict_with_noise(ref_lt, seed=42)\npred_lt_varrat = varrat.predict(ref_lt)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Evaluate Model Performance\nUse :func:`~windkit.ltc.calc_scores` to evaluate how well each model\nreconstructs the site wind speeds over the concurrent period.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "pred_st_linreg = linreg.predict(ref_st)\npred_st_linreg_noisy = linreg.predict_with_noise(ref_st, seed=42)\npred_st_varrat = varrat.predict(ref_st)\n\nscores_baseline = calc_scores(tgt_st, ref_st, name=\"Baseline\", period=\"Concurrent\")\nscores_linreg = calc_scores(tgt_st, pred_st_linreg, name=\"LinReg\", period=\"Concurrent\")\nscores_linreg_noisy = calc_scores(\n    tgt_st, pred_st_linreg_noisy, name=\"LinReg+noise\", period=\"Concurrent\"\n)\nscores_varrat = calc_scores(tgt_st, pred_st_varrat, name=\"VarRat\", period=\"Concurrent\")\n\nprint(\"BASELINE\\n\", scores_baseline.to_string(index=False), \"\\n\")\nprint(\"LINREG (deterministic)\\n\", scores_linreg.to_string(index=False), \"\\n\")\nprint(\"LINREG (with noise)\\n\", scores_linreg_noisy.to_string(index=False), \"\\n\")\nprint(\"VARRAT\\n\", scores_varrat.to_string(index=False), \"\\n\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Compare Long-Term Distributions\nThe deterministic LinReg prediction lies on the regression line and\ncompresses variance \u2014 the distribution is too narrow, clipping the\nhigh-wind tail.  Adding noise via\n:meth:`~windkit.ltc.LinRegMCP.predict_with_noise` recovers the spread\nby sampling from the per-sector residual distribution.\n:class:`~windkit.ltc.VarRatMCP` recovers variance by construction\ninstead: its per-sector slope is set to ``std(y) / std(x)``, so the\npredicted standard deviation matches the target directly without any\nstochastic sampling.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "ws_true = tgt_lt.wind_speed.values.flatten()\nws_ref = ref_lt.wind_speed.values.flatten()\nws_linreg = pred_lt_linreg.wind_speed.values.flatten()\nws_linreg_noisy = pred_lt_linreg_noisy.wind_speed.values.flatten()\nws_varrat = pred_lt_varrat.wind_speed.values.flatten()\n\nkw_true = dict(\n    bins=bins_ws, density=True, color=\"k\", alpha=0.5, label=\"Site true (5 yr)\"\n)\nfig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(16, 4), sharey=True)\n\nax1.hist(ws_true, **kw_true)\nax1.hist(\n    ws_ref,\n    bins=bins_ws,\n    density=True,\n    facecolor=\"none\",\n    edgecolor=\"C0\",\n    lw=1.0,\n    alpha=0.8,\n    label=\"Ref\",\n)\nax1.hist(ws_linreg, bins=bins_ws, density=True, color=\"C1\", alpha=0.5, label=\"LinReg\")\nax1.set(\n    xlabel=\"Wind speed (m/s)\",\n    ylabel=\"Density\",\n    title=\"LinReg (deterministic)\",\n    xlim=(0, 20),\n)\nax1.legend()\nax1.grid(True, linestyle=\"--\", alpha=0.5)\n\nax2.hist(ws_true, **kw_true)\nax2.hist(\n    ws_ref,\n    bins=bins_ws,\n    density=True,\n    facecolor=\"none\",\n    edgecolor=\"C0\",\n    lw=1.0,\n    alpha=0.8,\n    label=\"Ref\",\n)\nax2.hist(\n    ws_linreg_noisy,\n    bins=bins_ws,\n    density=True,\n    color=\"C1\",\n    alpha=0.5,\n    label=\"LinReg + noise\",\n)\nax2.set(xlabel=\"Wind speed (m/s)\", title=\"LinReg (with noise)\", xlim=(0, 20))\nax2.legend()\nax2.grid(True, linestyle=\"--\", alpha=0.5)\n\nax3.hist(ws_true, **kw_true)\nax3.hist(\n    ws_ref,\n    bins=bins_ws,\n    density=True,\n    facecolor=\"none\",\n    edgecolor=\"C0\",\n    lw=1.0,\n    alpha=0.8,\n    label=\"Ref\",\n)\nax3.hist(ws_varrat, bins=bins_ws, density=True, color=\"C2\", alpha=0.5, label=\"VarRat\")\nax3.set(xlabel=\"Wind speed (m/s)\", title=\"VarRat\", xlim=(0, 20))\nax3.legend()\nax3.grid(True, linestyle=\"--\", alpha=0.5)\n\nfig.tight_layout()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Variance Recovery\nThe standard deviation of the predicted wind speeds shows how\ndeterministic LinReg compresses variance, while ``predict_with_noise``\nand VarRat both recover the spread.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(f\"Site true std:           {np.std(ws_true):.3f} m/s\")\nprint(f\"LinReg (deterministic):  {np.std(ws_linreg):.3f} m/s\")\nprint(f\"LinReg (with noise):     {np.std(ws_linreg_noisy):.3f} m/s\")\nprint(f\"VarRat:                  {np.std(ws_varrat):.3f} m/s\")"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.14.3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}