CMV Automation

CMV Process for Large Time Series

When applying the field analysis methodology, there is often an extensive dataset available showing, e.g. time series data for a plant over the course of an entire year or longer. In order to apply the method, it is necessary to obtain reduce the CMVs to a subset for which the method can be applied, because simply calculating for every possible cloud motion vector pair is computationally intractable.

This notebook will demonstrate the process for downselecting a set of interesting CMVs according to the methodology discussed in the 2024 IEEE PVSC paper.

Initialization

First we need to import relevant packages.

[1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pvlib

from solarspatialtools import stats, spatial, cmv

The Data

For this example, we will work on four days of 10s resolution data from the HOPE Melpitz campaign, included in the sample data package. We’ll first read it in. Since the CMV routine is built to work on the clearsky index, we’ll calculate that while we’re at it. The plot shows a sample of the data for a single sensor over the four days.

[2]:
fn = 'data/hope_melpitz_10s.h5'
pos = pd.read_hdf(fn, mode="r", key="latlon")
pos_utm = spatial.latlon2utm(pos['lat'], pos['lon'])
ts_data = pd.read_hdf(fn, mode="r", key="data")

loc = pvlib.location.Location(np.mean(pos['lat']), np.mean(pos['lon']))
cs_ghi = loc.get_clearsky(ts_data.index, model='simplified_solis')['ghi']
kt = ts_data.divide(cs_ghi, axis=0).clip(0,2)

# Plot a single sensor's data
plt.plot(ts_data.index, ts_data[40])
plt.xticks(rotation=90)
plt.xlabel('Time')
plt.ylabel('Irradiance')
plt.show()
../_images/demos_automate_cmv_demo_3_0.png

Deciding which Periods to Calculate CMVs For

I typically calculate CMVs using a one hour period of data. While that’s viable for the whole four days here, it would rapidly become ornerous over e.g. an entire year, due to the computational intensity of the CMV method. One method to reduce the number of CMVs to calculate is to only calculate them for periods where the irradiance is changing rapidly, indicating the presence of clouds. Here we will use the Variability Score to quantify how variable each one hour period is in the data.

Note that the variability score calculation has a slightly weird form using lambda functions, because doing so can allow the calculation to be performed using the vectorized form of the code. Other techniques might be possible.

[9]:
avg_interval = '1h'

# Calc the VS for each 1 hour period
vs_all = ts_data.resample(avg_interval).apply(
                lambda x: stats.variability_score(x[ts_data.columns]))

fig, ax = plt.subplots()
ax2 = ax.twinx()
ax.plot(ts_data.index, ts_data[40])
plt.xlabel('Time')
plt.xticks(rotation=90)
ax.set_ylabel('Irradiance', color='blue')
ax.tick_params(axis='y', colors='blue')
ax2.plot(vs_all.index, vs_all[40], 'r')
ax2.set_ylabel('Variability Score', color='red')
ax2.tick_params(axis='y', colors='red')
plt.show()
C:\Users\jrana\AppData\Local\Temp\ipykernel_8116\1689254596.py:5: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  lambda x: stats.variability_score(x[ts_data.columns]))
../_images/demos_automate_cmv_demo_5_1.png

Once we have a metric of variability, we can downselect to retain only the N most variable periods. Here we’ll use 20 just as an example.

[4]:
n_var = 20
vs = vs_all.median(axis=1).sort_values(ascending=False)
vs = vs.iloc[0:n_var]
print(vs)
2013-09-08 11:00:00+00:00    36.827512
2013-09-10 10:00:00+00:00    17.346362
2013-09-09 10:00:00+00:00    16.861423
2013-09-08 10:00:00+00:00    14.032438
2013-09-10 08:00:00+00:00    11.659271
2013-09-10 12:00:00+00:00     7.277602
2013-09-08 07:00:00+00:00     4.998409
2013-09-10 14:00:00+00:00     3.878509
2013-09-08 06:00:00+00:00     3.846806
2013-09-08 12:00:00+00:00     3.722167
2013-09-09 12:00:00+00:00     3.016234
2013-09-09 16:00:00+00:00     2.515403
2013-09-10 16:00:00+00:00     2.369758
2013-09-11 12:00:00+00:00     2.250612
2013-09-09 11:00:00+00:00     2.169986
2013-09-11 08:00:00+00:00     1.778225
2013-09-09 13:00:00+00:00     1.750230
2013-09-09 09:00:00+00:00     1.719327
2013-09-10 13:00:00+00:00     1.264782
2013-09-08 09:00:00+00:00     1.185312
dtype: float64

Calculating the CMVs

Now that we have a set of periods during which to calculate the CMV, we can loop over them to do the calculations. We also want to record some of the useful statistical and quality control information about the quality of the CMVs, because we’d later like to be able to determine which of these are of high quailty.

We will assess the CMVs based on the following values: - ngood - the number of point pairs that passed the Jamaly QC routines - r_corr - the correlation coefficient of separation vs. delay for the good point pairs (see cmv_demo.ipynb).
- flag - the overall flag from the CMV QC process
[5]:
cmvs = pd.DataFrame(columns=["cld_spd", "cld_dir_rad", "df_p95", "ngood", "r_corr", "stderr_corr", "error_index", "flag"])
for date in vs.index:
    # Select the subset of data
    hour = pd.date_range(date, date + pd.to_timedelta('1h'), freq='10s')
    kt_hour = kt.loc[hour]

    hourlymax = ts_data.loc[hour].max().quantile(0.95)

    # Compute the CMV using the Jamaly method
    cld_spd, cld_dir, dat = cmv.compute_cmv(kt_hour, pos_utm, method='jamaly', options={'minvelocity': 1})

    cmvs.loc[date] = [cld_spd, cld_dir, hourlymax, dat.method_data['ngood'], dat.method_data['r_corr'], dat.method_data['stderr_corr'], np.abs(dat.method_data["error_index"]), dat.flag.name]
pd.options.display.width = 800
print(cmvs)
                             cld_spd  cld_dir_rad       df_p95  ngood    r_corr  stderr_corr  error_index  flag
2013-09-08 11:00:00+00:00  17.253624     1.015742  1036.323712    306  0.990169     0.147371     0.102980  GOOD
2013-09-10 10:00:00+00:00   9.323374     0.446288   967.214252    624  0.970793     0.090337     0.128893  GOOD
2013-09-09 10:00:00+00:00  12.315270     5.789374  1105.664569    467  0.951945     0.197357     0.175704  GOOD
2013-09-08 10:00:00+00:00  15.790708     1.022334  1070.301172    781  0.872326     0.190603     0.202564  GOOD
2013-09-10 08:00:00+00:00  13.385675     0.736173   752.929056    581  0.989232     0.081722     0.082928  GOOD
2013-09-10 12:00:00+00:00  27.468773     0.515227   839.723816    412  0.978562     0.275288     0.122317  GOOD
2013-09-08 07:00:00+00:00  18.609486     0.871618   642.030344    797  0.992215     0.084133     0.081098  GOOD
2013-09-10 14:00:00+00:00  13.931439     0.698200   657.489758    756  0.943971     0.168528     0.195472  GOOD
2013-09-08 06:00:00+00:00  21.260689     0.833676   351.352144    680  0.993707     0.088246     0.086116  GOOD
2013-09-08 12:00:00+00:00  21.164760     0.558943   916.882816     90  0.873551     1.707730     0.241034  GOOD
2013-09-09 12:00:00+00:00   8.103066     5.511727   896.326129    623  0.966734     0.078114     0.110935  GOOD
2013-09-09 16:00:00+00:00  17.431592     0.077976   215.279455    491  0.807696     0.364491     0.137972  GOOD
2013-09-10 16:00:00+00:00  24.613327     0.378690   294.552469    578  0.960366     0.265352     0.146015  GOOD
2013-09-11 12:00:00+00:00  13.008833     0.057238   447.995486    772  0.983205     0.081499     0.100318  GOOD
2013-09-09 11:00:00+00:00  11.170620     0.140915   981.699747    933  0.919513     0.143933     0.240121  GOOD
2013-09-11 08:00:00+00:00  15.618439     1.117748   398.662215    760  0.825436     0.287277     0.173498  GOOD
2013-09-09 13:00:00+00:00   6.956777     5.610811   801.185779    860  0.879255     0.119534     0.306667  GOOD
2013-09-09 09:00:00+00:00  14.288631     5.987103  1088.490857     75  0.823863     1.168389     0.201482  GOOD
2013-09-10 13:00:00+00:00  31.678545     1.045541   675.928391    456  0.804433     0.710315     0.161430  GOOD
2013-09-08 09:00:00+00:00  18.401612     1.539226  1044.880005    689  0.970706     0.161717     0.092783  GOOD

Downselect to High Quality CMVs

We can now downselect to only the highest quality CMVs. The limits shown here are just for reference based off this dataset. They probably need to be tuned to each specific dataset to appropriately limit the CMVs. The printout shows the ones that failed the QC.

[6]:
ngood_min = 200
rval_min = 0.85
bad_inds = []
for row in cmvs.itertuples():
    if row.ngood < ngood_min:
        bad_inds.append(row.Index)
    elif row.r_corr < rval_min:
        bad_inds.append(row.Index)
    elif row.flag != 'GOOD':
        bad_inds.append(row.Index)
print(cmvs.loc[bad_inds])

cmvs = cmvs.drop(index=bad_inds)
                             cld_spd  cld_dir_rad       df_p95  ngood    r_corr  stderr_corr  error_index  flag
2013-09-08 12:00:00+00:00  21.164760     0.558943   916.882816     90  0.873551     1.707730     0.241034  GOOD
2013-09-09 16:00:00+00:00  17.431592     0.077976   215.279455    491  0.807696     0.364491     0.137972  GOOD
2013-09-11 08:00:00+00:00  15.618439     1.117748   398.662215    760  0.825436     0.287277     0.173498  GOOD
2013-09-09 09:00:00+00:00  14.288631     5.987103  1088.490857     75  0.823863     1.168389     0.201482  GOOD
2013-09-10 13:00:00+00:00  31.678545     1.045541   675.928391    456  0.804433     0.710315     0.161430  GOOD

Decide which CMVs make a representative set for Field Analysis

The field analysis requires vectors that are roughly perpendicular. So we will try to downselect to a set of CMVs that will give us a representative scatter across all directions. Since parallel and anti-parallel vectors are both disadvantageous here, we will rotate them all by 180 degrees. The function cmv.optimum_subset is capable of performing this optimization. For a real dataset, we might want to choose to keep around 10, but here we’ll use 5 just to show the downselection process.

[7]:
nfinal = 5
# Compute the x and y components of the CMVs for optimum_subset
vx, vy = spatial.pol2rect(cmvs.cld_spd, cmvs.cld_dir_rad)
indices = cmv.optimum_subset(vx, vy, n=nfinal)
print(cmvs.iloc[indices])
                             cld_spd  cld_dir_rad       df_p95  ngood    r_corr  stderr_corr  error_index  flag
2013-09-10 16:00:00+00:00  24.613327     0.378690   294.552469    578  0.960366     0.265352     0.146015  GOOD
2013-09-08 07:00:00+00:00  18.609486     0.871618   642.030344    797  0.992215     0.084133     0.081098  GOOD
2013-09-08 09:00:00+00:00  18.401612     1.539226  1044.880005    689  0.970706     0.161717     0.092783  GOOD
2013-09-09 12:00:00+00:00   8.103066     5.511727   896.326129    623  0.966734     0.078114     0.110935  GOOD
2013-09-09 10:00:00+00:00  12.315270     5.789374  1105.664569    467  0.951945     0.197357     0.175704  GOOD

Visualizing the result

Finally, we have a set of CMVs that are of high quality and represent a broad range of directions so that we are likely to have a good sampling of nearly perpendicular vectors that can be of use for the field analysis methodology. We’ll plot the combination of vectors that passed QC and those that were selected for field analysis just to visualize that process.

[8]:
# Plot the vectors visually
plt.figure()
for i, (dx, dy) in enumerate(zip(vx, vy)):
    mylabel = 'All CMVs' if i == 0 else '_nolegend_'
    plt.arrow(0, 0, dx, dy, head_width=1, head_length=1, fc='k', label=mylabel)
for i, (dx, dy) in enumerate(zip(vx.iloc[indices], vy.iloc[indices])):
    mylabel = 'Selected CMVs' if i == 0 else '_nolegend_'
    plt.arrow(0, 0, dx, dy, head_width=1, head_length=1, fc='r', ec='r', label=mylabel)
plt.xlabel('Eastward Velocity (m/s)')
plt.ylabel('Northward Velocity (m/s)')
plt.axis('equal')
plt.legend()
plt.tight_layout()
plt.show()
../_images/demos_automate_cmv_demo_15_0.png