Cloud Motion Vector Calculations

The Cloud Motion Vector (CMV) package contained in solarspatialtools.cmv is a package that implements cloud motion vector calculation methodologies from the literature. These methods are based on analysis of a distributed field of sensors.

This demo utilizes sample data from a plant to demonstrate the application of the method.

[1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from solarspatialtools import cmv, spatial

Read the Data

The sample data file is a compressed H5 file that contains all the info used to perform the analysis.

The Plant Layout

The layout of the plant is placed in the variable pos_utm. Its definition is contained in the field latlon, which produces a DataFrame of the combiner centroids in a UTM-like coordinate system. It contains combiner IDs as the index, and columns E and N that indicate the mean position of the combiner footprint (i.e. the centroid). The combiners are named following the pattern CMB-<INVERTER#>-<COMBINER#>.

The Time Series Data

The method relies on time series data from a plant. Two examples are stored in the data file with keys data_a and data_b. These each represent one hour of data for each combiner in the plant with a 10s sampling resolution. The DataFrames are indexed by a time that is artificially offset to begin at a time of 00:00:00 on an arbitrary day. The columns of the DataFrames have keys that match the index of pos_utm. The two time periods are known to have a high variability and thus be well suited to the CMV calculation.

[2]:
datafile = "data/sample_plant_2.h5"
pos_utm = pd.read_hdf(datafile, mode="r", key="utm")
ts_data = pd.read_hdf(datafile, mode="r", key="data_a")

ts_data
[2]:
CMB-001-1 CMB-001-2 CMB-001-3 CMB-001-4 CMB-001-5 CMB-001-6 CMB-001-7 CMB-002-1 CMB-002-2 CMB-002-3 ... CMB-047-7 CMB-047-8 CMB-048-1 CMB-048-2 CMB-048-3 CMB-048-4 CMB-048-5 CMB-048-6 CMB-048-7 CMB-048-8
2023-01-01 00:00:00 93.799402 93.985920 93.622077 94.318063 93.390276 93.459779 93.753268 95.253145 95.510679 95.718582 ... 92.871630 92.798612 91.074627 91.385004 91.554207 91.528461 91.404796 91.747153 91.461591 91.606280
2023-01-01 00:00:10 94.508405 94.819152 95.028923 95.105311 93.855248 94.410892 94.540710 95.633790 95.760610 95.699990 ... 92.753680 92.873880 91.292149 91.850400 91.644421 91.739311 91.842573 91.653509 91.475838 91.425386
2023-01-01 00:00:20 94.767218 95.245977 95.438032 95.826806 94.000549 94.737096 94.925867 95.603580 95.745574 95.265605 ... 92.933072 92.732030 91.251834 91.799432 91.926831 91.726960 91.657963 91.725119 91.453577 91.490994
2023-01-01 00:00:30 94.880949 95.406670 95.139340 95.771861 94.032838 94.730620 95.055108 95.470650 95.914626 96.075220 ... 92.942349 92.877739 91.443098 92.072665 91.937813 91.629913 91.534880 91.854972 91.431316 91.431947
2023-01-01 00:00:40 94.837813 95.538455 95.515880 95.826806 94.407401 94.698243 95.155677 95.337730 95.736634 95.525902 ... 92.886374 92.751329 91.496539 92.014166 91.867211 91.532872 91.664974 91.688921 91.409054 91.577226
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2023-01-01 00:59:20 14.373001 13.225903 14.791491 13.385478 13.085343 13.466051 14.118774 13.721180 13.306969 8.306490 ... 14.580523 15.408295 14.735548 15.268503 15.645326 17.442542 17.913292 17.870185 16.114405 16.653684
2023-01-01 00:59:30 14.583977 13.338135 14.856895 13.349449 13.317424 13.456741 14.205222 13.811406 13.327289 8.690166 ... 14.611648 15.615766 14.689607 14.798921 14.540775 16.176576 18.011441 18.704340 17.892674 19.351115
2023-01-01 00:59:40 14.610643 13.486927 14.896615 13.516509 13.312177 13.673675 14.273695 13.934259 13.347608 8.382548 ... 14.247965 15.191175 14.563032 14.489767 14.585489 15.712535 15.359870 15.863494 18.463466 23.291351
2023-01-01 00:59:50 14.594957 13.737747 15.056860 13.734342 13.508587 13.867665 14.499656 13.958829 13.495051 8.701196 ... 14.179527 14.860188 14.610850 14.564967 14.809852 15.814871 15.672231 16.217617 15.594370 17.694980
2023-01-01 01:00:00 14.933424 13.887485 15.247237 13.952175 13.770243 14.039324 14.641018 14.113163 13.674277 9.230496 ... 14.203536 14.929559 14.533342 14.613296 14.461280 15.638823 15.624369 16.330934 15.905045 17.639993

361 rows × 382 columns

Computing the CMV

The CMV calculation is performed by the compute_cmv function. The two methods used are the method of Gagne et al. and of Jamaly and Kleissl. Both methods involve computing the relative delays experienced throughout a field of sensor measurement points, as computed by the lagging cross correlation between the clearsky index time series of the sensors. These delays are induced by cloud motion over the plant, with downwind sensors experiencing a response to the time signal that is later than that experienced by upwind sensors. The differences between the two methods lie in the ways that they perform the analysis of the delay data in computing the CMV. Some differences also exist in the implementation of data quality control.

Gagne et al. Method

The method described by Gagne relies on a least squares regression between the spatial separations of the points and their relative delays to find the x and y components of the CMV vector. Quality control is used over the entire plant to select the sensors with the strongest mutual cross correlation, while still retaining a sufficient number of sensors for analysis. The full method is described in the paper:

Jamaly and Kleissl Method

The method of Jamaly and Kleissl determines the CMV by finding the direction that minimizes the variance in the velocities computed using the delays and positions of the sensors. The magnitude of the vector is determined to be the median of these individual sensor velocities. A variety of statistical quality controls is applied both on a pairwise basis for individual pairs of sensors, and on an aggregate basis.

Implementation Details

Inputs and Reference Points

Both functions require the time series and positions within the field. The routine will read the site identifiers from the index of the positions DataFrame and will generate all possible combinations of pairings. These pairings will be used to perform analysis on columns of the timeseries DataFrame, so all index values in positions must be present.

The optional reference_id parameter can contin a single reference identifier, or a list thereof, and can be used to limit the number of points used for the analysis. That is, rather than seeking all possible combinations of points, only pairwise combinations containing the members of reference_id will be considered. This can be particularly useful for time-sensitive applications.

Data Conditioning

The published methods both assume that timeseries contains values of the clearsky index. In principle, however, any signals can be used subject to the caveat that oddities may be experienced due to mismatch with the value ranges expected by the QC. In the case of the example data from the plant combiners being demonstrated here, no clearsky index is available and no practical means exists to calculate one (i.e. what is the “clearsky” irradiance for a combiner?). Normalizing values to have a range similar to clearsky index is possible by computing for example the 95th quantile of the time series. The value of corr_scaling specified would also be likely to impact the ability of the QC to identify good data, as several QC criteria rely on the absolute value of the correlation. The default expected values of correlation correspond to the correltion coefficient scaling produced by 'coeff'. Several customization options for the individual method QC are available via the options parameter, and are described in the docstring of the function. Notably, setting minvelocity: 1 can be used by the jamaly method to limit unrealistically high values of the lag.

Outputs

The function returns the speed and direction (in radians clockwise from east) of the CMV, as well as a WindspeedData object containing intermediate data from the calculation. Some interesting values include the list of pairings, individual correlation, lag and separation values for those pairings and the flags for each pair indicating the quality of the data. Definitions of the various flags are found in solarspatialtools.cmv.Flag. The field method_data also includes some specific data outputs produced by the steps of the individual methods, or as part of their QC approaches.

[3]:
hourlymax = np.mean(ts_data.quantile(0.95))
kt = ts_data / hourlymax

cld_spd_gag, cld_dir_gag, dat_gag = cmv.compute_cmv(kt, pos_utm,
                                                    reference_id=None,
                                                    method='gagne')
cld_spd_jam, cld_dir_jam, dat_jam = cmv.compute_cmv(kt, pos_utm,
                                                    reference_id=None,
                                                    method='jamaly')

print("Method     Speed  Angle:rad  Angle:°   N_good")
print(f"Gagne   {cld_spd_gag:8.2f} {cld_dir_gag:10.2f} {np.rad2deg(cld_dir_gag):8.2f} {sum(dat_gag.pair_flag == cmv.Flag.GOOD):8,}")
print(f"Jamaly  {cld_spd_jam:8.2f} {cld_dir_jam:10.2f} {np.rad2deg(cld_dir_jam):8.2f} {sum(dat_jam.pair_flag == cmv.Flag.GOOD):8,}")
Method     Speed  Angle:rad  Angle:°   N_good
Gagne      11.47       0.20    11.20    1,861
Jamaly      9.52       0.62    35.50   22,383

Inspecting the Results

Some instructive discussion can be motivated by the detailed results.

Visualizing the Velocity Calculation

For both methods, the velocity is basically computed based on the aggregate \(\Delta x / \Delta t\) across the entire plant. In the figure below, we show the plot of \(\Delta x\) vs \(\Delta t\) for all point pairs in the plant. The best fit line, whose slope indicates the velocity, is shown in red. The colorization indicates the value of the peak correlation, from which the lag was extracted. Higher correlation values indicate a better relationship between that particular pair of points.

The repeated display of the figure shows magenta highlight on the points that were flagged passing the method’s QC. These GOOD points were the only ones that were used in computing the best fit, which is also evident from the large group of points with unrealistically high lag on the right hand side of the figure. Points with a delay value of exactly zero were always excluded from the fit. It’s interesting to note that some degree of digitization exists withiin the results due to the fact that only integer time steps are calculated when identifying the lag with maximum cross correlation.

[4]:
fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(12, 4))
ax0.set_title('Result Data for Jamaly Method')
s = ax0.scatter(dat_jam.pair_lag, dat_jam.pair_dists, c=dat_jam.corr_lag, vmin=0, vmax=1, s=1)
ax0.plot([-150, 200], cld_spd_jam * np.array([-150, 200]), 'r--', linewidth=2)  # Best Fit
ax0.set_ylim([-1500, 1000])
ax0.set_xlim([-150, 900])
fig.colorbar(s).set_label("Peak Correlation (-)")
ax0.set_xlabel('Time Lag Between Signals (s)')
ax0.set_ylabel('Windward Separation Distance (m)')
ax0.legend(['Point Correlation', 'Speed Fit'])

ax1.set_title('Highlighted GOOD Points for Jamaly Method')
s = ax1.scatter(dat_jam.pair_lag, dat_jam.pair_dists, c=dat_jam.corr_lag, vmin=0, vmax=1, s=1)
ax1.plot([-150, 200], cld_spd_jam * np.array([-150, 200]), 'r--', linewidth=2)  # Best Fit
ax1.plot(dat_jam.pair_lag[dat_jam.pair_flag == cmv.Flag.GOOD],
         dat_jam.pair_dists[dat_jam.pair_flag == cmv.Flag.GOOD], 'mo', markersize=1)
ax1.set_ylim([-1500, 1000])
ax1.set_xlim([-150, 900])
fig.colorbar(s).set_label("Peak Correlation (-)")
ax1.set_xlabel('Time Lag Between Signals (s)')
ax1.set_ylabel('Windward Separation Distance (m)')
ax1.legend(['Point Correlation', 'GOOD Points', 'Speed Fit']);
../_images/demos_cmv_demo_7_0.png

Visualizing the Lag Across the Field

A second interesting plot is that of the delay across the field. In order to visualize this case, we need to select a reference point from which we will compare all individual lag calculations within the plant. In this case, a centrally located combiner was chosen. Some initial calculation steps are required to extract lags involving the reference from the list of all possible sensor pairs. The arrow overplotted shows the computed CMV direction. The colorization indicates the lag relative to the reference. Positive values indicate that the point is upwind of the reference, while negative values indicate that the point is downwind. The outlier points with very high values of lag are present in the southeast corner of the plant.

[5]:
# Extract only the lags associated with the reference. Correct the lag's sign depending on which was the first point in the pair.
ref = 'CMB-022-5'
points = []
lags = []
for pair, lag in zip(dat_jam.allpairs, dat_jam.pair_lag):
    if ref in pair:
        point = pair[1] if ref == pair[0] else pair[0]
        lag_i = lag if ref == pair[0] else -lag
        points.append(point)
        lags.append(lag_i)

# Insert data back into the DataFrame for plotting
pos_utm['lag'] = np.zeros_like(pos_utm['E'])
pos_utm.loc[points, 'lag'] = lags
pos_utm.loc[ref, 'lag'] = 0


plt.scatter(pos_utm['E'], pos_utm['N'], c=-pos_utm['lag'], cmap='viridis')
vscale = 100
velvec = np.array(spatial.unit(spatial.pol2rect(cld_spd_jam, cld_dir_jam))) * vscale
plt.arrow(pos_utm['E'][ref], pos_utm['N'][ref], velvec[0], velvec[1], length_includes_head=True, width=7, head_width=20, color='red')
plt.clim(-60, 60)
plt.colorbar()
plt.xlabel('E')
plt.ylabel('N')
plt.title(f'Lag Relative to Ref (s)')
axes = plt.gca()
axes.xaxis.set_ticklabels([])
axes.yaxis.set_ticklabels([]);
../_images/demos_cmv_demo_9_0.png