Earlier today, Camila Martins joined the latest episode of Unsung Heroes of the Cloud. She did an amazing job explaining how to manage Azure diagnostic settings at scale:
Her explanation was clear enough that I want to try out what she showed for myself. The goal of this blog post is to explore how to automatically turn on diagnostic settings using Azure Policy. Specifically, I want to turn this on automatically for network security group (NSG) flow logs.
For an introduction to Azure Policy, please refer to this earlier post.
Let’s have a look.
What are diagnostic settings?
The embedded video explains what diagnostic settings are. In summary, they are resource-specific logs in Azure that are not stored by default. As a customer, you have to enable them per service.
By capturing diagnostic logs, you get more resource-specific information. For Azure Storage, for example, that is an overview of the transactions happening in your storage account. By default, Azure Monitor only captures high-level information such as the total transaction count, without item-level details.
In the case of NSG flow logs, the flow logs generate detailed information about source IP and port, destination IP and port, the protocol, whether the traffic was allowed/denied, and also a counter of packets. This can be very valuable information for security analysis as well as for troubleshooting (I’ve used NSG flow logs more than once to verify if traffic is reaching its intended location).
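To make those fields concrete, here is a minimal Python sketch that parses a single flow tuple. It assumes the comma-separated version 2 tuple layout (timestamp, source IP, destination IP, source port, destination port, protocol, direction, decision, flow state, then packet and byte counters per direction). The sample tuple, the `FlowTuple` class and `parse_flow_tuple` are illustrative names, not part of any Azure SDK.

```python
from dataclasses import dataclass
from typing import Optional

# Single-letter codes used in the flow tuple (assumed layout, see lead-in).
PROTOCOLS = {"T": "TCP", "U": "UDP"}
DIRECTIONS = {"I": "inbound", "O": "outbound"}
DECISIONS = {"A": "allowed", "D": "denied"}


@dataclass
class FlowTuple:
    timestamp: int
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    direction: str
    decision: str
    packets_src_to_dst: Optional[int]
    packets_dst_to_src: Optional[int]


def _optional_int(value: str) -> Optional[int]:
    # Counters can be empty (for example when a flow has only just begun).
    return int(value) if value else None


def parse_flow_tuple(raw: str) -> FlowTuple:
    parts = raw.strip().split(",")
    if len(parts) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(parts)}")
    # Pad shorter tuples so the counter positions always exist.
    parts += [""] * (13 - len(parts))
    return FlowTuple(
        timestamp=int(parts[0]),
        src_ip=parts[1],
        dst_ip=parts[2],
        src_port=int(parts[3]),
        dst_port=int(parts[4]),
        protocol=PROTOCOLS.get(parts[5], parts[5]),
        direction=DIRECTIONS.get(parts[6], parts[6]),
        decision=DECISIONS.get(parts[7], parts[7]),
        packets_src_to_dst=_optional_int(parts[9]),
        packets_dst_to_src=_optional_int(parts[11]),
    )


# Made-up sample tuple, for illustration only.
sample = "1542110437,10.0.0.4,13.67.143.118,44931,443,T,O,A,E,25,4409,2346,7568"
flow = parse_flow_tuple(sample)
print(flow.src_ip, flow.dst_ip, flow.dst_port, flow.protocol, flow.decision)
```

Filtering a list of parsed tuples on `decision == "denied"` is the kind of quick check that helps when verifying whether traffic reaches its destination.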
Let’s have a look at how to automatically enable them.
How to do it automatically
To automatically enable diagnostic settings, you can use Azure Policy. Azure Policy has a “deployIfNotExists” effect, which can deploy the flow log whenever a new NSG is created that doesn’t have flow logs enabled.
There is a built-in policy definition you can use for this, called “Deploy a flow log resource with target network security group”. Let’s have a look at what setting this up looks like.
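To give an idea of how a deployIfNotExists rule is put together, here is a simplified sketch of the general shape of such a policy rule, written as a Python dictionary (real policy definitions are JSON). This is not the actual built-in definition: the existence condition is a simplified assumption, the role definition ID and the deployment template are placeholders, and `would_deploy` is a hypothetical toy function that only mimics the decision.

```python
import json

# Simplified, assumed shape of a deployIfNotExists rule. NOT the built-in definition.
policy_rule = {
    "if": {
        # Which resources the policy looks at.
        "field": "type",
        "equals": "Microsoft.Network/networkSecurityGroups",
    },
    "then": {
        "effect": "deployIfNotExists",
        "details": {
            # The related resource that should exist.
            "type": "Microsoft.Network/networkWatchers/flowLogs",
            # Assumed, simplified condition for "the flow log is already there".
            "existenceCondition": {
                "field": "Microsoft.Network/networkWatchers/flowLogs/enabled",
                "equals": "true",
            },
            # Placeholder: roles the policy's managed identity needs to deploy.
            "roleDefinitionIds": ["<role-definition-id>"],
            # Placeholder: the ARM template that creates the flow log.
            "deployment": {"properties": {"mode": "incremental", "template": {}}},
        },
    },
}


def would_deploy(resource_type: str, flow_log_enabled: bool) -> bool:
    """Toy model: deploy only when the 'if' matches and the existence condition fails."""
    matches_if = resource_type.lower() == policy_rule["if"]["equals"].lower()
    return matches_if and not flow_log_enabled


print(json.dumps(policy_rule, indent=2))
print(would_deploy("Microsoft.Network/networkSecurityGroups", flow_log_enabled=False))
```

The point of the sketch is the split between the "if" block (which resources are evaluated) and the "details" block (what has to exist, and what gets deployed when it doesn't).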
Setting it up
To start, open the policy definitions blade in the Azure portal. There you can look for the policies containing the “flow log” string.
Select the “Deploy a flow log resource with target network security group” policy and click the assign button. Next, you’ll have to configure the policy. First, provide it a scope (which in my case will be my full subscription).
Next, you’ll need to provide the region, a storage account resource ID in that region, and the network watcher in that region. To get the storage account ID, open your storage account and go to endpoints to find your resource ID. For network watcher, open the network watcher service in the portal, and copy-paste the name of the resource in the right region. The resource group name should be NetworkWatcherRG. Now provide all those inputs for the policy creation:
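If you’d rather prepare these inputs in a script than copy them from the portal, the sketch below builds a storage account resource ID and bundles the inputs into a parameters dictionary. The resource ID layout is the standard Azure format, but the parameter names (`nsgRegion`, `storageId`, `networkWatcherRG`, `networkWatcherName`) are my assumption of what the policy expects, so check them against the definition before using this. The subscription ID, resource group and account names are made up.

```python
import re

# Loose check of the standard storage account resource ID layout.
STORAGE_ID_PATTERN = re.compile(
    r"^/subscriptions/[^/]+/resourceGroups/[^/]+"
    r"/providers/Microsoft\.Storage/storageAccounts/[a-z0-9]{3,24}$",
    re.IGNORECASE,
)


def storage_account_id(subscription_id: str, resource_group: str, name: str) -> str:
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{name}"
    )


def build_assignment_parameters(
    region: str,
    storage_id: str,
    network_watcher_name: str,
    network_watcher_rg: str = "NetworkWatcherRG",
) -> dict:
    if not STORAGE_ID_PATTERN.match(storage_id):
        raise ValueError(f"not a storage account resource ID: {storage_id!r}")
    # Parameter names below are assumptions, not confirmed against the definition.
    return {
        "nsgRegion": {"value": region},
        "storageId": {"value": storage_id},
        "networkWatcherRG": {"value": network_watcher_rg},
        "networkWatcherName": {"value": network_watcher_name},
    }


# Hypothetical values, for illustration only.
storage_id = storage_account_id(
    "00000000-0000-0000-0000-000000000000", "rg-logs", "nsgflowlogswus2"
)
params = build_assignment_parameters("westus2", storage_id, "NetworkWatcher_westus2")
print(params)
```

Validating the resource ID up front catches the most likely copy-paste mistake before the policy assignment is created.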
Next, you can optionally create a remediation task. A remediation task gives you the option to “fix” all existing non-compliant resources directly. I’m enabling it here.
Finally, you could specify a non-compliance message in the final step and then create the policy. I didn’t specify a non-compliance message and immediately created the policy.
Now that the policy has been created, let’s try it out!
Trying it out
To try it out, I created a new NSG in West US 2 in the portal:
It took a couple of seconds for the NSG to be created. When I checked immediately after creation, the flow log hadn’t been created yet:
It then took a good 10 minutes for the flow log to appear.
I found the timing to be a bit weird, but checking the activity log showed that it went through fine. The deployIfNotExists action indeed started the moment the NSG was created; it just took a good 10 minutes to deploy.
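If you want to script this wait instead of refreshing the portal, a generic polling loop is enough. The sketch below is not tied to any Azure SDK: `wait_until` is a hypothetical helper, the `check` callable is something you would implement yourself (for example, by querying whether the flow log exists), and the 15-minute default timeout is simply chosen to cover the roughly 10 minutes seen here.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout_s: float = 900.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call check() repeatedly until it returns True or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Stand-in check that succeeds on the third attempt, for illustration only.
attempts = {"count": 0}


def fake_flow_log_exists() -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 3


found = wait_until(fake_flow_log_exists, interval_s=0.0)
print(found, attempts["count"])
```

The `sleep` and `clock` parameters are only there so the loop can be exercised without actually waiting.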
But anyway: that’s how you can automatically create NSG flow logs for newly created NSGs.
In this post, we explored how to automatically set up NSG flow logs using Azure Policy. We used the portal to do this for a single region, but as you can guess, this can easily be extended to multiple regions (and multiple services) using automation tools. If you’re interested in this, I would recommend checking out this video on Enterprise-Scale Landing Zones DevOps that covers this pretty well using GitHub Actions.