Using Log Analytics (and a special guest) to forecast electricity generation

This is the final post of a six post series which will provide details on a solution I have been working on which will gather information from a variety of sources to forecast how much electricity will be generated by a residential solar panel system. This example shows how OMS can be used to gather data from multiple sources, how to visualize this data in Power BI and how to use a variety of data sources in OMS to forecast likely future results. This series includes the following posts:

  1. Gathering weather data into Microsoft OMS (the script details are available here)
  2. Using Power BI to visualize weather data from Microsoft OMS
  3. How to add weather forecasting information into Microsoft OMS
  4. How to add SolarCity electricity generation information into Microsoft OMS
  5. How to use Power BI to cross multiple custom log types
  6. How to bring this all together and forecast electricity generation

In this blog post we get to see the best part of this solution – how we can combine weather information, weather forecasting and SolarCity electricity generation to do something that has not been done prior to this blog post. We can now forecast electricity generation based on historical electricity generation and weather patterns! And there is a huge bonus for this approach from my colleague Lee Berg and this should be a memorable blog post and wrap-up for this series!

Bringing it all together – the new query language for Log Analytics!

The key to making this functionality possible is the release of the new query language for Log Analytics. For simplicity in this blog post we will reference this throughout as “KQL”. With this new query language we can perform significantly complex queries like we will need to take information from these various sources and to use that data to project information in the future.

Key lessons learned working with KQL:

  1. Map out the vision for how the query will work: Develop pseudocode or a simple flow chart explaining what information will be in the query and how it will be gathered.
    1. For this approach, the pseudocode follows these major steps:
      1. Identify the maximum energy generated per hour to identify the capacity of the solar panels
      2. Identify the impact of the weather as a percentage of the maximum energy which can be generated
      3. Identify the weather forecast for the next day
      4. Use the weather forecast and the weather impact percentage to provide an estimate for solar electricity generation
  2. Start small: Develop small queries which can be used to provide pieces of the requirement. Once you have the query working you can use the “Let” capability to assign that to a variable to use throughout the query.

Getting started by generating the maximum energy bell curve:

Looking at the SolarCity data we can see that there is a bell curve associated with normal electrical production. This will be the base concept that we are building our OMS queries from.

To move things forward we need to know what the maximum energy generation is for this particular solar installation for each hour of the day. To do this we define “MaxEnergyBeWeatherHour” which identifies the maximum energy gathered throughout the history available.

// Calculate the maximum energy generated for each hour using data from the past year. Used to calculate relative energy for each weather condition.

let MaxEnergyByWeatherHour = SolarCity_CL

| where Timestamp_t > ago(365d)

| summarize TotalEnergyInterval=max(EnergyInIntervalkWh_d) by State_s, City_s, bin(Timestamp_t,1tick)

| summarize TotalEnergyHour=sum(TotalEnergyInterval) by City_s,TimeDateHour=bin(Timestamp_t,1hour)

| extend SampleHour=toint(substring(tostring(TimeDateHour()),11,2))

| where TotalEnergyHour > 0

| summarize MaxEnergyHour=max(TotalEnergyHour) by SampleHour

| project SampleHour, Energytype="Maximum", MaxEnergyHour;

Next we define “EnergyByWeatherHour” which gathers the maximum amount of energy generated based on each available weather condition. We combine these two with a union and the graph is shown at the end of the query.

let EnergyByWeatherHour = SolarCity_CL

| where Timestamp_t > ago(365d)

| summarize TotalEnergyInterval=max(EnergyInIntervalkWh_d) by State_s, City_s, bin(Timestamp_t,1tick)

| summarize TotalEnergyHour=sum(TotalEnergyInterval) by City_s,TimeDateHour=bin(Timestamp_t,1hour)

| extend SampleHour=toint(substring(tostring(TimeDateHour()),11,2))

| join kind=leftouter (

OpenWeather_CL

| where TimeGenerated > ago(365d)

| summarize max(Description_s) by City_s, TimeDateHour=bin(TimeGenerated, 1hour)

) on City_s, TimeDateHour

| project City=City_s,TimeDateHour,TotalEnergyHour, Weather=max_Description_s, SampleHour

| where TotalEnergyHour > 0 and Weather != ""

| summarize MaxEnergyHour=max(TotalEnergyHour) by SampleHour, Weather

| project SampleHour, Energytype=Weather, MaxEnergyHour;

MaxEnergyByWeatherHour | union EnergyByWeatherHour

| sort by SampleHour asc

| render timechart by MaxEnergyHour

Note how the curve above for Maximum closely mirrors the SolarCity bell curve shown earlier in the blog post.

Calculating weather impact percent:

This query takes what we built up top to the next logical level. The first query contains the maximum energy from the graph below defined as “MaxEnergyByWeatherHour”.

// Calculate the maximum energy generated for each hour using data from the past year. Used to calculate relative energy for each weather condition.

let MaxEnergyByWeatherHour = SolarCity_CL

| where Timestamp_t > ago(365d)

| summarize TotalEnergyInterval=max(EnergyInIntervalkWh_d) by State_s, City_s, bin(Timestamp_t,1tick)

| summarize TotalEnergyHour=sum(TotalEnergyInterval) by City_s,TimeDateHour=bin(Timestamp_t,1hour)

| extend SampleHour=toint(substring(tostring(TimeDateHour()),11,2))

| where TotalEnergyHour > 0

| summarize MaxEnergyHour=max(TotalEnergyHour) by SampleHour

| project SampleHour, Energytype="Maximum", MaxEnergyHour;

The second query contains the various weather conditions defined as “EnergyByWeatherHour”. We then define “WeatherImpact” based on a join of the “MaxEnergyByWeatherHour” and “EnergyByWeatherHour” (joining these lets us do math between the maximum energy and the energy for the various weather conditions). “WeatherImpact” defines how much each weather condition on average impacts electricity generation.

// Find the % impact of each weather condition by comparing the hourly maximum under each weather condition

let EnergyByWeatherHour = SolarCity_CL

| where Timestamp_t > ago(365d)

| summarize TotalEnergyInterval=max(EnergyInIntervalkWh_d) by State_s, City_s, bin(Timestamp_t,1tick)

| summarize TotalEnergyHour=sum(TotalEnergyInterval) by City_s,TimeDateHour=bin(Timestamp_t,1hour)

| extend SampleHour=toint(substring(tostring(TimeDateHour()),11,2))

| join kind=leftouter (

OpenWeather_CL

| where TimeGenerated > ago(365d)

| summarize max(Description_s) by City_s, TimeDateHour=bin(TimeGenerated, 1hour)

) on City_s, TimeDateHour

| project City=City_s,TimeDateHour,TotalEnergyHour, Weather=max_Description_s, SampleHour

| where TotalEnergyHour > 0 and Weather != ""

| summarize EnergyHour=max(TotalEnergyHour) by SampleHour, Weather

| project SampleHour, WeatherCondition=Weather, EnergyHour;

let WeatherImpact = EnergyByWeatherHour | join MaxEnergyByWeatherHour on SampleHour

| where strlen(WeatherCondition) > 0

| extend ImpactFactor = EnergyHour/MaxEnergyHour

| summarize ImpactFactorByWeather=avg(ImpactFactor) by WeatherCondition;

WeatherImpact

| sort by ImpactFactorByWeather


This can also be seen more easily in a graph format:

Not surprisingly, we see that the best weather condition for solar electricity generation is “clear sky” and “scattered clouds” (at 1) and the worst was “moderate rain mist” (at .24). Now that we have the maximum electricity generation, the impact of various weather conditions and the weather forecast we can develop our estimated electricity generation for the next day!

Gathering the relevant weather forecast information:

Next we need to gather information about what the weather forecast looks like for the next day. The challenge is that the weather data which is provided only provides a forecast for every three hours of the day as we can see with the next query:

let minForecastHour = toscalar(OpenWeatherForecast_CL | summarize min(ForecastTimeDate_t));

let maxForecastHour = toscalar(OpenWeatherForecast_CL | summarize max(ForecastTimeDate_t));

OpenWeatherForecast_CL

| join kind=rightouter (range ForecastDateHour from bin(minForecastHour,1h) to maxForecastHour step 1h) on

$left.ForecastTimeDate_t == $right.ForecastDateHour

| project ForecastDateHour, ForecastWeather_s

Here are the results:


Unfortunately we are missing records for 2/3 of the relevant hours. To fill these hours out with relevant information we’re going to need to be a little creative. The script below creates three different blocks of forecast information each with the results from the first hour’s weather forecast. When these are combined with a union, we get what we need for the next days’ forecast.

let ForecastBlock0 = OpenWeatherForecast_CL

| where ForecastTimeDate_t > startofday(now(+1day))

| sort by ForecastTimeDate_t asc

| project ForecastTimeDate_t, ForecastDescription_s

| extend SampleHour=datepart("Hour", ForecastTimeDate_t)+0;

let ForecastBlock1 = OpenWeatherForecast_CL

| where ForecastTimeDate_t > startofday(now(+1day))

| sort by ForecastTimeDate_t asc

| project ForecastTimeDate_t, ForecastDescription_s

| extend SampleHour=datepart("Hour", ForecastTimeDate_t)+1;

let ForecastBlock2 = OpenWeatherForecast_CL

| where ForecastTimeDate_t > startofday(now(+1day))

| sort by ForecastTimeDate_t asc

| project ForecastTimeDate_t, ForecastDescription_s

| extend SampleHour=datepart("Hour", ForecastTimeDate_t)+2;

ForecastBlock0 | union ForecastBlock1 | union ForecastBlock2

| sort by SampleHour asc;


Building the electricity generation forecast:

This approach calculates the maximum energy per hour, calculates the % impact of each weather condition, assembles the weather forecast and then performs the calculation (if the weather condition does not have a %Impact a default value of 1 is provided).

// Calculate the maximum energy generated for each hour using data from the past year. Used to calculate relative energy for each weather condition.

let MaxEnergyByWeatherHour = SolarCity_CL

| where Timestamp_t > ago(700d)

| summarize TotalEnergyInterval=max(EnergyInIntervalkWh_d) by State_s, City_s, bin(Timestamp_t,1tick)

| summarize TotalEnergyHour=sum(TotalEnergyInterval) by City_s,TimeDateHour=bin(Timestamp_t,1hour)

| extend SampleHour=toint(substring(tostring(TimeDateHour()),11,2))

| where TotalEnergyHour > 0

| summarize MaxEnergyHour=max(TotalEnergyHour) by SampleHour

| project SampleHour, Energytype="Maximum", MaxEnergyHour;

// Find the % impact of each weather condition by comparing the hourly maximum under each weather condition

let EnergyByWeatherHour = SolarCity_CL

| where Timestamp_t > ago(700d)

| summarize TotalEnergyInterval=max(EnergyInIntervalkWh_d) by State_s, City_s, bin(Timestamp_t,1tick)

| summarize TotalEnergyHour=sum(TotalEnergyInterval) by City_s,TimeDateHour=bin(Timestamp_t,1hour)

| extend SampleHour=toint(substring(tostring(TimeDateHour()),11,2))

| join kind=leftouter (

OpenWeather_CL

| where TimeGenerated > ago(700d)

| summarize max(Description_s) by City_s, TimeDateHour=bin(TimeGenerated, 1hour)

) on City_s, TimeDateHour

| project City=City_s,TimeDateHour,TotalEnergyHour, Weather=max_Description_s, SampleHour

| where TotalEnergyHour > 0 and Weather != ""

| summarize EnergyHour=max(TotalEnergyHour) by SampleHour, Weather

| project SampleHour, WeatherCondition=Weather, EnergyHour;

let WeatherImpact = EnergyByWeatherHour | join MaxEnergyByWeatherHour on SampleHour

| where strlen(WeatherCondition) > 0

| extend ImpactFactor = EnergyHour/MaxEnergyHour

| summarize ImpactFactorByWeather=avg(ImpactFactor) by WeatherCondition

| sort by ImpactFactorByWeather;

let ForecastBlock0 = OpenWeatherForecast_CL

| where ForecastTimeDate_t > startofday(now(+1day))

| sort by ForecastTimeDate_t asc

| project ForecastTimeDate_t, ForecastDescription_s

| extend SampleHour=toint(datepart("Hour", ForecastTimeDate_t)+0);

let ForecastBlock1 = OpenWeatherForecast_CL

| where ForecastTimeDate_t > startofday(now(+1day))

| sort by ForecastTimeDate_t asc

| project ForecastTimeDate_t, ForecastDescription_s

| extend SampleHour=toint(datepart("Hour", ForecastTimeDate_t)+1);

let ForecastBlock2 = OpenWeatherForecast_CL

| where ForecastTimeDate_t > startofday(now(+1day))

| sort by ForecastTimeDate_t asc

| project ForecastTimeDate_t, ForecastDescription_s

| extend SampleHour=toint(datepart("Hour", ForecastTimeDate_t)+2);

let ForecastBlock = ForecastBlock0 | union ForecastBlock1 | union ForecastBlock2

| sort by SampleHour asc;

//WeatherImpact

//MaxEnergyByWeatherHour

ForecastBlock

| join MaxEnergyByWeatherHour on SampleHour

| join kind=leftouter WeatherImpact on $left.ForecastDescription_s == $right.WeatherCondition

| sort by SampleHour1 asc

| extend ImpactFactorByWeather2=iif(isnull(ImpactFactorByWeather),toreal(1),toreal(ImpactFactorByWeather))

| project SampleHour1, ForecastDescription_s, MaxEnergyHour, ImpactFactorByWeather2, PredictedEnergyHour=MaxEnergyHour*ImpactFactorByWeather2

| summarize ElectricalGenerationEstimate=sum(PredictedEnergyHour)

Here’s the same data in graph form:


The final value is the summarization of the PredictedEnergyHour for the day:

Fifty total lines of code to forecast electricity generation based on historical electrical generation combined with weather forecasting! Not bad!

Using this query in the OMS console (or Azure portal dashboard):

Or in one line to paste this in and use it in the OMS console:

let MaxEnergyByWeatherHour = SolarCity_CL | where Timestamp_t > ago(700d) | summarize TotalEnergyInterval=max(EnergyInIntervalkWh_d) by State_s, City_s, bin(Timestamp_t,1tick) | summarize TotalEnergyHour=sum(TotalEnergyInterval) by City_s,TimeDateHour=bin(Timestamp_t,1hour) | extend SampleHour=toint(substring(tostring(TimeDateHour()),11,2))| where TotalEnergyHour > 0 | summarize MaxEnergyHour=max(TotalEnergyHour) by SampleHour | project SampleHour, Energytype="Maximum", MaxEnergyHour; let EnergyByWeatherHour = SolarCity_CL | where Timestamp_t > ago(700d) | summarize TotalEnergyInterval=max(EnergyInIntervalkWh_d) by State_s, City_s, bin(Timestamp_t,1tick) | summarize TotalEnergyHour=sum(TotalEnergyInterval) by City_s,TimeDateHour=bin(Timestamp_t,1hour) | extend SampleHour=toint(substring(tostring(TimeDateHour()),11,2)) | join kind=leftouter ( OpenWeather_CL | where TimeGenerated > ago(700d) | summarize max(Description_s) by City_s, TimeDateHour=bin(TimeGenerated, 1hour) ) on City_s, TimeDateHour | project City=City_s,TimeDateHour,TotalEnergyHour, Weather=max_Description_s, SampleHour | where TotalEnergyHour > 0 and Weather != "" | summarize EnergyHour=max(TotalEnergyHour) by SampleHour, Weather | project SampleHour, WeatherCondition=Weather, EnergyHour; let WeatherImpact = EnergyByWeatherHour | join MaxEnergyByWeatherHour on SampleHour | where strlen(WeatherCondition) > 0 | extend ImpactFactor = EnergyHour/MaxEnergyHour | summarize ImpactFactorByWeather=avg(ImpactFactor) by WeatherCondition | sort by ImpactFactorByWeather; let ForecastBlock0 = OpenWeatherForecast_CL | where ForecastTimeDate_t > startofday(now(+1day)) | sort by ForecastTimeDate_t asc | project ForecastTimeDate_t, ForecastDescription_s | extend SampleHour=toint(datepart("Hour", ForecastTimeDate_t)+0); let ForecastBlock1 = OpenWeatherForecast_CL | where ForecastTimeDate_t > startofday(now(+1day)) | sort by ForecastTimeDate_t asc | project ForecastTimeDate_t, ForecastDescription_s | extend SampleHour=toint(datepart("Hour", ForecastTimeDate_t)+1); let ForecastBlock2 = OpenWeatherForecast_CL | where ForecastTimeDate_t > startofday(now(+1day)) | sort by ForecastTimeDate_t asc | project ForecastTimeDate_t, ForecastDescription_s | extend SampleHour=toint(datepart("Hour", ForecastTimeDate_t)+2); let ForecastBlock = ForecastBlock0 | union ForecastBlock1 | union ForecastBlock2 | sort by SampleHour asc; ForecastBlock | join MaxEnergyByWeatherHour on SampleHour | join kind=leftouter WeatherImpact on $left.ForecastDescription_s == $right.WeatherCondition | sort by SampleHour1 asc | extend ImpactFactorByWeather2=iif(isnull(ImpactFactorByWeather),toreal(1),toreal(ImpactFactorByWeather)) | project SampleHour1, ForecastDescription_s, MaxEnergyHour, ImpactFactorByWeather2, PredictedEnergyHour=MaxEnergyHour*ImpactFactorByWeather2 | summarize ElectricalGenerationEstimate=sum(PredictedEnergyHour)

Tip: When transferring complex queries like the one above from the Azure Log Analytics advanced editor to the OMS console make all of the lines into a single line of text, remove all comments. Then paste the information into word and replace any sections with lots of spaces with a single space.

Using the above single line query we can calculate the forecasted value (the graphic below was created using the View Designer in Log Analytics).


And here’s the first days actual energy generation result of 53.2 versus the estimate of 52.7, not bad (at this point it appears to be within a 1% margin of error based on this small subset of forecasting versus actual electricity generated).

Or we can even add this to a dashboard in the Azure Portal!

 

 

Best KQL tricks learned during this blog post:

The following are the top 5 tricks that I learned with KQL from writing this blog post:

#1: Picking which hour of the day: changing a time variable to a sting to gather specific characters which indicate the hour of the day and then converting that to an int so it can be used effectively.

| extend SampleHour=toint(substring(tostring(TIMEVARIABLE_t()),11,2))

or by using a DatePart:

datepart("Hour", TIMEVARIABLE_t)

#2: When transferring your queries from the Azure Log Analytics (advanced query) page to be used in the OMS console, it needs to be one line with no comments included.

#3: When using project you can define what the field is called just by specifying the name such as in this example where Energytype is the name of the field given using the value of Weather:

| project SampleHour, Energytype=Weather, MaxEnergyHour;

#4: Build on block of code at a time and when it’s completed use a “let” to create a variable which contains the results. If you want to test the variable, just put the variable name after it and it will run the variable for you. Example:

let WeatherImpact = EnergyByWeatherHour | join MaxEnergyByWeatherHour on SampleHour

| where strlen(WeatherCondition) > 0

| extend ImpactFactor = EnergyHour/MaxEnergyHour

| summarize ImpactFactorByWeather=avg(ImpactFactor) by WeatherCondition

| sort by ImpactFactorByWeather;

WeatherImpact

#5: The best way to really learn KQL is to dig in and do it with an example like this. I learned an incredible amount about the power of KQL and tricks to use it from this blog post series – I hope it helped you as well to see some creative approaches to use KQL.

 

Let’s kick this up a notch!

Now that we have a functional query to predict electricity generation, and stats indicating how much electricity is generated on a daily basis let’s take this up a notch. Let’s integrate our SolarCity queries with Alexa! The infamous IoT master Lee Berg put together an Azure function which does data retrieval and injection for OMS. He blogged about it here: http://leealanberg.com/2017/02/AzureFunctionsOMS.

Lee worked with an Echo show to put together a video showing how it could be used to show you how much data was generated in the last 48 hours! Check it out here.

Summary: By using a combination of weather history, weather forecasting, SolarCity electricity generation data and KQL we can not only showcase the power of Log Analytics but we can also show how we can leverage various types of data to provide information which cannot be found anywhere else. If you layer Alexa on top of that you now provide a new ability to quickly know what has happened and what should happen based on that historical data.

I owe a HUGE debt of thanks to two people for this blog post:

  • Brian Wren: Without Brian the KQL query you saw for forecasting would have been way beyond my capabilities. Brian provides incredible documentation and was willing to be a second perspective on the development of the solution. I could not have accomplished this task without him so thank you MP Author! J
  • Lee Berg: Lee’s Alexa integration is awesome and shows how we can take data from Log Analytics and use it in ways I bet that nobody had imagined.

 

Next steps: This approach to forecasting has room for improvement and/or additional blog post topics.

1) It doesn’t take into consideration electricity generation is impacted by the time of year.

2) This approach requires a significant history of data to identify trends.

3) The current approach does not look back over history to see how the accurate the forecast was compared to the actual electrical generation.

4) The current approach requires checking a portal – it may be useful to send out a notification of electrical generation and forecasted electrical generation.

2 Comments

  1. Ziv Caspi [MSFT] October 8, 2017
  2. Cameron Fuller October 13, 2017

Leave a Reply