Cloud Motion Vector Calculations

The Cloud Motion Vector (CMV) package contained in solarspatialtools.cmv is a package that implements cloud motion vector calculation methodologies from the literature. These methods are based on analysis of a distributed field of sensors.

This demo utilizes sample data from a plant to demonstrate the application of the method.

[1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from solarspatialtools import cmv, spatial

Read the Data

The sample data file is a compressed H5 file that contains all the info used to perform the analysis.

The Plant Layout

The layout of the plant is placed in the variable pos_utm. Its definition is contained in the field latlon, which produces a DataFrame of the combiner centroids in a UTM-like coordinate system. It contains combiner IDs as the index, and columns E and N that indicate the mean position of the combiner footprint (i.e. the centroid). The combiners are named following the pattern CMB-<INVERTER#>-<COMBINER#>.

The Time Series Data

The method relies on time series data from a plant. Two examples are stored in the data file with keys data_a and data_b. These each represent one hour of data for each combiner in the plant with a 10s sampling resolution. The DataFrames are indexed by a time that is artificially offset to begin at a time of 00:00:00 on an arbitrary day. The columns of the DataFrames have keys that match the index of pos_utm. The two time periods are known to have a high variability and thus be well suited to the CMV calculation.

[2]:

datafile = "data/sample_plant_2.h5"
pos_utm = pd.read_hdf(datafile, mode="r", key="utm")
ts_data = pd.read_hdf(datafile, mode="r", key="data_a")

ts_data

[2]:

	CMB-001-1	CMB-001-2	CMB-001-3	CMB-001-4	CMB-001-5	CMB-001-6	CMB-001-7	CMB-002-1	CMB-002-2	CMB-002-3	...	CMB-047-7	CMB-047-8	CMB-048-1	CMB-048-2	CMB-048-3	CMB-048-4	CMB-048-5	CMB-048-6	CMB-048-7	CMB-048-8
2023-01-01 00:00:00	93.799402	93.985920	93.622077	94.318063	93.390276	93.459779	93.753268	95.253145	95.510679	95.718582	...	92.871630	92.798612	91.074627	91.385004	91.554207	91.528461	91.404796	91.747153	91.461591	91.606280
2023-01-01 00:00:10	94.508405	94.819152	95.028923	95.105311	93.855248	94.410892	94.540710	95.633790	95.760610	95.699990	...	92.753680	92.873880	91.292149	91.850400	91.644421	91.739311	91.842573	91.653509	91.475838	91.425386
2023-01-01 00:00:20	94.767218	95.245977	95.438032	95.826806	94.000549	94.737096	94.925867	95.603580	95.745574	95.265605	...	92.933072	92.732030	91.251834	91.799432	91.926831	91.726960	91.657963	91.725119	91.453577	91.490994
2023-01-01 00:00:30	94.880949	95.406670	95.139340	95.771861	94.032838	94.730620	95.055108	95.470650	95.914626	96.075220	...	92.942349	92.877739	91.443098	92.072665	91.937813	91.629913	91.534880	91.854972	91.431316	91.431947
2023-01-01 00:00:40	94.837813	95.538455	95.515880	95.826806	94.407401	94.698243	95.155677	95.337730	95.736634	95.525902	...	92.886374	92.751329	91.496539	92.014166	91.867211	91.532872	91.664974	91.688921	91.409054	91.577226
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2023-01-01 00:59:20	14.373001	13.225903	14.791491	13.385478	13.085343	13.466051	14.118774	13.721180	13.306969	8.306490	...	14.580523	15.408295	14.735548	15.268503	15.645326	17.442542	17.913292	17.870185	16.114405	16.653684
2023-01-01 00:59:30	14.583977	13.338135	14.856895	13.349449	13.317424	13.456741	14.205222	13.811406	13.327289	8.690166	...	14.611648	15.615766	14.689607	14.798921	14.540775	16.176576	18.011441	18.704340	17.892674	19.351115
2023-01-01 00:59:40	14.610643	13.486927	14.896615	13.516509	13.312177	13.673675	14.273695	13.934259	13.347608	8.382548	...	14.247965	15.191175	14.563032	14.489767	14.585489	15.712535	15.359870	15.863494	18.463466	23.291351
2023-01-01 00:59:50	14.594957	13.737747	15.056860	13.734342	13.508587	13.867665	14.499656	13.958829	13.495051	8.701196	...	14.179527	14.860188	14.610850	14.564967	14.809852	15.814871	15.672231	16.217617	15.594370	17.694980
2023-01-01 01:00:00	14.933424	13.887485	15.247237	13.952175	13.770243	14.039324	14.641018	14.113163	13.674277	9.230496	...	14.203536	14.929559	14.533342	14.613296	14.461280	15.638823	15.624369	16.330934	15.905045	17.639993

361 rows × 382 columns

Computing the CMV

The CMV calculation is performed by the compute_cmv function. The two methods used are the method of Gagne et al. and of Jamaly and Kleissl. Both methods involve computing the relative delays experienced throughout a field of sensor measurement points, as computed by the lagging cross correlation between the clearsky index time series of the sensors. These delays are induced by cloud motion over the plant, with downwind sensors experiencing a response to the time signal that is later than that experienced by upwind sensors. The differences between the two methods lie in the ways that they perform the analysis of the delay data in computing the CMV. Some differences also exist in the implementation of data quality control.

Gagne et al. Method

The method described by Gagne relies on a least squares regression between the spatial separations of the points and their relative delays to find the x and y components of the CMV vector. Quality control is used over the entire plant to select the sensors with the strongest mutual cross correlation, while still retaining a sufficient number of sensors for analysis. The full method is described in the paper:

1. Gagné, N. Ninad, J. Adeyemo, D. Turcotte, and S. Wong, “Directional Solar Variability Analysis,” in 2018 IEEE Electrical Power and Energy Conference (EPEC) (2018) pp. 1–6, iSSN: 2381-2842. https://www.researchgate.net/publication/330877949_Directional_Solar_Variability_Analysis

Jamaly and Kleissl Method

The method of Jamaly and Kleissl determines the CMV by finding the direction that minimizes the variance in the velocities computed using the delays and positions of the sensors. The magnitude of the vector is determined to be the median of these individual sensor velocities. A variety of statistical quality controls is applied both on a pairwise basis for individual pairs of sensors, and on an aggregate basis.

1. Jamaly and J. Kleissl, “Robust cloud motion estimation by spatio-temporal correlation analysis of irradiance data,” Solar Energy, vol. 159, pp. 306–317, Jan. 2018. https://www.sciencedirect.com/science/article/pii/S0038092X17309556

Implementation Details

Inputs and Reference Points

Both functions require the time series and positions within the field. The routine will read the site identifiers from the index of the positions DataFrame and will generate all possible combinations of pairings. These pairings will be used to perform analysis on columns of the timeseries DataFrame, so all index values in positions must be present.

The optional reference_id parameter can contin a single reference identifier, or a list thereof, and can be used to limit the number of points used for the analysis. That is, rather than seeking all possible combinations of points, only pairwise combinations containing the members of reference_id will be considered. This can be particularly useful for time-sensitive applications.

Data Conditioning

The published methods both assume that timeseries contains values of the clearsky index. In principle, however, any signals can be used subject to the caveat that oddities may be experienced due to mismatch with the value ranges expected by the QC. In the case of the example data from the plant combiners being demonstrated here, no clearsky index is available and no practical means exists to calculate one (i.e. what is the “clearsky” irradiance for a combiner?). Normalizing values to have a range similar to clearsky index is possible by computing for example the 95th quantile of the time series. The value of corr_scaling specified would also be likely to impact the ability of the QC to identify good data, as several QC criteria rely on the absolute value of the correlation. The default expected values of correlation correspond to the correltion coefficient scaling produced by 'coeff'. Several customization options for the individual method QC are available via the options parameter, and are described in the docstring of the function. Notably, setting minvelocity: 1 can be used by the jamaly method to limit unrealistically high values of the lag.

Outputs

The function returns the speed and direction (in radians clockwise from east) of the CMV, as well as a WindspeedData object containing intermediate data from the calculation. Some interesting values include the list of pairings, individual correlation, lag and separation values for those pairings and the flags for each pair indicating the quality of the data. Definitions of the various flags are found in solarspatialtools.cmv.Flag. The field method_data also includes some specific data outputs produced by the steps of the individual methods, or as part of their QC approaches.

[3]:

hourlymax = np.mean(ts_data.quantile(0.95))
kt = ts_data / hourlymax

cld_spd_gag, cld_dir_gag, dat_gag = cmv.compute_cmv(kt, pos_utm,
                                                    reference_id=None,
                                                    method='gagne')
cld_spd_jam, cld_dir_jam, dat_jam = cmv.compute_cmv(kt, pos_utm,
                                                    reference_id=None,
                                                    method='jamaly')

print("Method     Speed  Angle:rad  Angle:°   N_good")
print(f"Gagne   {cld_spd_gag:8.2f} {cld_dir_gag:10.2f} {np.rad2deg(cld_dir_gag):8.2f} {sum(dat_gag.pair_flag == cmv.Flag.GOOD):8,}")
print(f"Jamaly  {cld_spd_jam:8.2f} {cld_dir_jam:10.2f} {np.rad2deg(cld_dir_jam):8.2f} {sum(dat_jam.pair_flag == cmv.Flag.GOOD):8,}")

Method     Speed  Angle:rad  Angle:°   N_good
Gagne      11.47       0.20    11.20    1,861
Jamaly      9.52       0.62    35.50   22,383

Inspecting the Results

Some instructive discussion can be motivated by the detailed results.

Visualizing the Velocity Calculation

For both methods, the velocity is basically computed based on the aggregate \(\Delta x / \Delta t\) across the entire plant. In the figure below, we show the plot of \(\Delta x\) vs \(\Delta t\) for all point pairs in the plant. The best fit line, whose slope indicates the velocity, is shown in red. The colorization indicates the value of the peak correlation, from which the lag was extracted. Higher correlation values indicate a better relationship between that particular pair of points.

The repeated display of the figure shows magenta highlight on the points that were flagged passing the method’s QC. These GOOD points were the only ones that were used in computing the best fit, which is also evident from the large group of points with unrealistically high lag on the right hand side of the figure. Points with a delay value of exactly zero were always excluded from the fit. It’s interesting to note that some degree of digitization exists withiin the results due to the fact that only integer time steps are calculated when identifying the lag with maximum cross correlation.

[4]:

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(12, 4))
ax0.set_title('Result Data for Jamaly Method')
s = ax0.scatter(dat_jam.pair_lag, dat_jam.pair_dists, c=dat_jam.corr_lag, vmin=0, vmax=1, s=1)
ax0.plot([-150, 200], cld_spd_jam * np.array([-150, 200]), 'r--', linewidth=2)  # Best Fit
ax0.set_ylim([-1500, 1000])
ax0.set_xlim([-150, 900])
fig.colorbar(s).set_label("Peak Correlation (-)")
ax0.set_xlabel('Time Lag Between Signals (s)')
ax0.set_ylabel('Windward Separation Distance (m)')
ax0.legend(['Point Correlation', 'Speed Fit'])

ax1.set_title('Highlighted GOOD Points for Jamaly Method')
s = ax1.scatter(dat_jam.pair_lag, dat_jam.pair_dists, c=dat_jam.corr_lag, vmin=0, vmax=1, s=1)
ax1.plot([-150, 200], cld_spd_jam * np.array([-150, 200]), 'r--', linewidth=2)  # Best Fit
ax1.plot(dat_jam.pair_lag[dat_jam.pair_flag == cmv.Flag.GOOD],
         dat_jam.pair_dists[dat_jam.pair_flag == cmv.Flag.GOOD], 'mo', markersize=1)
ax1.set_ylim([-1500, 1000])
ax1.set_xlim([-150, 900])
fig.colorbar(s).set_label("Peak Correlation (-)")
ax1.set_xlabel('Time Lag Between Signals (s)')
ax1.set_ylabel('Windward Separation Distance (m)')
ax1.legend(['Point Correlation', 'GOOD Points', 'Speed Fit']);

Visualizing the Lag Across the Field

A second interesting plot is that of the delay across the field. In order to visualize this case, we need to select a reference point from which we will compare all individual lag calculations within the plant. In this case, a centrally located combiner was chosen. Some initial calculation steps are required to extract lags involving the reference from the list of all possible sensor pairs. The arrow overplotted shows the computed CMV direction. The colorization indicates the lag relative to the reference. Positive values indicate that the point is upwind of the reference, while negative values indicate that the point is downwind. The outlier points with very high values of lag are present in the southeast corner of the plant.

[5]:

# Extract only the lags associated with the reference. Correct the lag's sign depending on which was the first point in the pair.
ref = 'CMB-022-5'
points = []
lags = []
for pair, lag in zip(dat_jam.allpairs, dat_jam.pair_lag):
    if ref in pair:
        point = pair[1] if ref == pair[0] else pair[0]
        lag_i = lag if ref == pair[0] else -lag
        points.append(point)
        lags.append(lag_i)

# Insert data back into the DataFrame for plotting
pos_utm['lag'] = np.zeros_like(pos_utm['E'])
pos_utm.loc[points, 'lag'] = lags
pos_utm.loc[ref, 'lag'] = 0


plt.scatter(pos_utm['E'], pos_utm['N'], c=-pos_utm['lag'], cmap='viridis')
vscale = 100
velvec = np.array(spatial.unit(spatial.pol2rect(cld_spd_jam, cld_dir_jam))) * vscale
plt.arrow(pos_utm['E'][ref], pos_utm['N'][ref], velvec[0], velvec[1], length_includes_head=True, width=7, head_width=20, color='red')
plt.clim(-60, 60)
plt.colorbar()
plt.xlabel('E')
plt.ylabel('N')
plt.title(f'Lag Relative to Ref (s)')
axes = plt.gca()
axes.xaxis.set_ticklabels([])
axes.yaxis.set_ticklabels([]);