Top 10 tips to maximize the benefits of your Log Analytics workspace by minimizing non-required data
Do you want to maximize the benefits you can get from your Log Analytics workspace? Start by limiting the data you upload to what you really need.
I recently had a subscription where I needed to rein in data usage to the workspace quickly, but I needed to do so in a way where most of the functionality of Log Analytics in OMS was still available. This post discusses the top 10 approaches to take to cut back the amount of data uploaded to a Log Analytics workspace while maintaining as much functionality as possible:
- Find out what’s using most of your data in your Log Analytics workspace
- Determine your hourly data addition rate
- Identifying computers which are not needed in Log Analytics
- Increase the intervals on performance counters
- Change security & auditing levels
- Remove solutions that you can live without (at least temporarily)
- Exclude large numbers of security events
- Exclude specific systems from network monitoring
- Tweaking configurations of solutions in System Center Operations Manager
- Tuning takes time
1) Find out what’s using most of your data in your Log Analytics workspace
Log Analytics includes a built-in set of views which show the usage for Log Analytics. These are available from the left side using the three bars icon highlighted below.
To tune usage we care about the first three views: (Data volume over time, Data volume by solution, Data not associated with a computer). The first screenshot shows the workspace prior to tuning which had two systems providing more than 5 GB of data each and the LogManagement solution generating more than 34 GB of data!
After tuning this workspace, the data levels are looking much more in alignment with the size we had wanted for this particular demo environment.
Using these views we can easily identify the total amount of data, the computers which are contributing the most data, the solutions which are contributing the most data, and where data is coming in which is not associated with a computer.
You can also use Wei’s “Free Tier Data Consumption Tracker” to see how much of your quota is being used and what is using the most data (to tweak his solution for workspaces with more than 500 MB see this blog post).
Before I started tuning, the workspace I was working on was at 110%+ of its quota with heavy focus on Security, LogManagement and WireData.
After tuning this workspace, the % utilization has dropped significantly (less than 50%).
This solution makes it even easier to see what is using the most space by solution, data type and more.
From the dashboards above we can see that the largest amount of data being added in this particular workspace is related to LogManagement (20%) and Security (15.3%). By data type, we have lots of data from the SecurityEvent Type (14.7%) and Perf (11.9%). From these various dashboards we can see what areas to focus on which will help us to minimize the amount of data being sent.
2) Determine your hourly data addition rate
Next we want to determine what our maximum records per hour should be for the size of the workspace which we are working towards. We can use a “search *” and set the time to the last hour to determine how many records are being written per hour. Initially we were seeing 72K records written per hour. After initial tuning we had this down to 47K records written per hour as shown below (using the previous query language).
After more tuning of this environment, we are down to 33K records written per hour.
For the free tier our maximum should be just under 40K records per hour to keep it under the 500 MB per day cap. The math on this should scale linearly so that:
Daily cap / records per hour:
500 MB: 40K
1 GB: 80K
10 GB: 800K
Knowing what number of records per hour you need provides a simple target to aim for and helps you to track where you are in your data tuning process.
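The scaling above can be sketched in a few lines of Python. This is only a rough illustration: it assumes that average record size stays similar as the workspace grows, it treats 1 GB as 1000 MB to match the numbers above, and the function names are made up for this example.

```python
# Reference point from this post: roughly 40K records/hour fits the 500 MB/day free tier.
REFERENCE_CAP_MB = 500
REFERENCE_RECORDS_PER_HOUR = 40_000


def target_records_per_hour(daily_cap_mb: float) -> int:
    """Scale the reference rate linearly to another daily cap (assumes similar record sizes)."""
    return round(REFERENCE_RECORDS_PER_HOUR * daily_cap_mb / REFERENCE_CAP_MB)


def percent_of_target(observed_per_hour: int, daily_cap_mb: float) -> float:
    """Express an observed hourly record count as a percentage of the target for that cap."""
    return 100.0 * observed_per_hour / target_records_per_hour(daily_cap_mb)


if __name__ == "__main__":
    # Reproduce the cap / records-per-hour pairs listed above.
    for cap_mb in (500, 1000, 10000):
        print(cap_mb, "MB/day ->", target_records_per_hour(cap_mb), "records/hour")

    # Hourly counts seen in this workspace before, during and after tuning.
    for observed in (72_000, 47_000, 33_000):
        pct = percent_of_target(observed, 500)
        print(observed, "records/hour is", round(pct, 1), "% of the free-tier target")
```

Anything over 100% of the target means the workspace is on track to exceed its daily cap if that rate holds all day.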
3) Identifying computers which are not needed in Log Analytics
The built-in usage views and Wei’s solution provide a quick way to see what computers are providing data into OMS (see the “Find out what’s using most of your data in your Log Analytics workspace” section of this blog post for example graphics). If there are computers which you do not want sending data to Log Analytics, start by removing them from OMS, either as directly attached agents or by removing them from SCOM integration. As an example, if you have workstations reporting to OMS and you don’t want workstations reporting to OMS, start by removing them. If you have specific servers which you don’t want in OMS, remove them from OMS.
4) Increase the intervals on performance counters
If we use a “search *” query we can see what types are the most common in the workspace. In the query below the highest was for “Perf” by far when compared with other types.
In most workspaces, performance counters represent a significant number of the records which are gathered into Log Analytics. To check which types of performance counters these are, we can use a query like this: search * | where ( Type == "Perf" )
This type of query will provide a list of the most common objectnames and counternames.
When we were initially tuning, the following were the heavy perf counter collections. This led us to remove the “Capacity & Performance” solution but to re-investigate it later.
NOTE: For additional background, we did check to see if we could change these counters directly in the Log Analytics data settings or through making changes to the rules via System Center Operations Manager. The XML configuration for the rule is shown below. We were not able to change these through an override (only enable or disable). The “Capacity & Performance” solution is currently in preview so I expect that this will change before the production release.
After removing the “Capacity & Performance” solution we had a different set of counters which were generating the most data. Logical disk represented the most data being collected in Perf.
These performance counters were being collected relatively frequently and could be changed easily in settings / data / Windows Performance Counters. For our environment we increased the sample interval to 1500 seconds (25 minutes) for the disk counters and the rest of the counters were increased to 300 seconds (5 minutes).
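To get a feel for how much a longer sample interval helps, here is a small Python sketch. It assumes one record per counter instance per sample, and it uses a 60 second starting interval purely as an illustrative baseline (the 60 second figure comes from the “Capacity & Performance” counters mentioned elsewhere in this post, not from the disk counters themselves). The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_seconds: int) -> float:
    """Records written per day for one counter instance at the given sample interval."""
    return SECONDS_PER_DAY / interval_seconds


def reduction_percent(old_interval: int, new_interval: int) -> float:
    """Percentage drop in records when moving from old_interval to new_interval."""
    return 100.0 * (new_interval - old_interval) / new_interval


if __name__ == "__main__":
    baseline = 60  # assumed starting interval in seconds, for illustration only
    for new_interval in (300, 1500):
        print(
            f"{baseline}s -> {new_interval}s:",
            f"{samples_per_day(baseline):.0f} -> {samples_per_day(new_interval):.1f} samples/day,",
            f"{reduction_percent(baseline, new_interval):.0f}% fewer records",
        )
```

The savings multiply by the number of counters and instances collected, which is why disk counters on servers with many volumes are a good first target.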
5) Change security & auditing levels
A great place to minimize the amount of data which is being gathered into Log Analytics is in the Security and Audit solution under security settings. By default this solution will collect all Windows Security and AppLocker event logs. This can be decreased by changing it to either Common, Minimal or None (not recommended).
When tuning the amount of data, start with Common and then decrease to Minimal only if it’s required.
6) Remove solutions that you can live without (at least temporarily)
If there are solutions in your workspace which you do not need and they are using significant amounts of data start with removing these. For our tuning we ended up removing the “Capacity & Performance” solution (due to the 60 second performance counters), the “Wire Data” solution and the “Security & Auditing” solution (temporarily).
7) Exclude large numbers of security events
One of the largest changes we needed to make was to exclude some service accounts from logging their security events. For our environment, three service accounts represented almost 200K records, which is about 5 hours’ worth of our daily budget at 40K records per hour.
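The “hours of budget” framing is a handy way to judge whether a noisy source is worth excluding. A minimal Python sketch of that arithmetic follows; it assumes the 200K figure is a daily count and uses the 40K records per hour target from earlier in this post. The function names are made up for this example.

```python
HOURS_PER_DAY = 24


def hours_of_budget(records: int, target_per_hour: int) -> float:
    """How many hours of the hourly record budget a given record count consumes."""
    return records / target_per_hour


def share_of_daily_budget(records: int, target_per_hour: int) -> float:
    """The same record count as a percentage of a full day's record budget."""
    return 100.0 * records / (target_per_hour * HOURS_PER_DAY)


if __name__ == "__main__":
    service_account_records = 200_000  # approximate total for the three service accounts
    target = 40_000  # free-tier target in records per hour

    print("Hours of budget:", hours_of_budget(service_account_records, target))
    print("Share of daily budget (%):", round(share_of_daily_budget(service_account_records, target), 1))
```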
To work with these I recommend this approach: http://blogs.catapultsystems.com/stompkins/archive/2016/08/16/filter-which-security-events-scom-sends-to-oms/. You can also potentially remove the server where these service accounts are from the Security & Auditing solution to cut down the amount of data: http://blogs.catapultsystems.com/cfuller/archive/2015/11/09/targeting-oms-solutions-to-specific-systems-in-operations-manager/
8) Exclude specific systems from network monitoring
In our environment there was one server (the main Hyper-V host) which was writing the most data.
To change this we opened the solution and removed the system from being checked as “Use for Monitoring”:
This approach is easy to apply directly in the Log Analytics portal.
9) Tweaking configurations of solutions in System Center Operations Manager
Sometimes you have to go into SCOM and make some tweaks to get the data levels to where you need them. For our environment we ended up having to create a group to exclude one node (our Configuration Manager server) from collecting its IIS logs due to the amount of volume. To do this we used an approach similar to this blog post (http://blogs.catapultsystems.com/cfuller/archive/2015/11/09/targeting-oms-solutions-to-specific-systems-in-operations-manager/) and created a group called: Exclude_OMS_IISLogs
For this group, we applied an override to the “IIS Log Collection Rule” rule to disable it (set enable to false). This effectively excluded this particular system from having its IIS logs collected.
TIP: The easy way to find what rule is relevant is to go to the administration pane in SCOM, overrides and search on the group name used to activate OMS functionality (Microsoft System Center Advisor Monitoring Server Group).
10) Tuning takes time
Watching these types of changes takes time. You make a change and then watch carefully over time how it impacts the % of your quota:
Also continue to watch the number of records which are written hourly to your OMS workspace.
Some tweaks take a few hours to really have an impact.
Or even check in the next morning:
Summary: Hopefully this blog post has given you some ideas and approaches which will help you to tune the amount of data which you have in your OMS workspace. If you have your own tuning tips for Log Analytics data post a comment here!