No DNS resolution after successful deployment of vCAv 3.5.1 appliance.
To prepare for a migration from a vCD 9.5 platform to a vCD 9.7 platform, I needed to deploy a vCAv 3.5.1 appliance on the source site. During the initial setup of the vCAv appliance, I ran into an issue when configuring the lookup service. The error message was: Could not find SSL/X509 certificate from “https://vcenter-fqdn:7444/lookupservice/sdk”. Because of that, I was unable to finish the configuration wizard. In this article, I will show you the steps I took to troubleshoot the problem and how I eventually fixed it.
The first thing to check is the IP configuration of the vCAv appliance, because the appliance needs a valid IP configuration to communicate with the vCenter and vCloud instances. To do so, open the configuration page of the vCAv appliance and verify the configured IP settings. In my case, the IP configuration was correct.
The next step is to check the connectivity between the vCAv appliance and the vCenter server by sending a simple ping request to the vCenter server from the vCAv appliance. A ping to the IP address of the vCenter server works as expected. The same ping test with the FQDN instead of the IP address results in an error.
I received the following error: “vcenter-fqdn: Temporary failure in name resolution”. The error message tells us that it might be a DNS issue.
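The two checks above separate plain connectivity from name resolution. A minimal sketch of the same comparison follows. It assumes `192.0.2.10` stands in for the vCenter IP address and `vcenter-fqdn` for its name (both placeholders), and `check_reachability` is a hypothetical helper. It uses `getent hosts` for the name check, which goes through the system resolver in the same way ping does:

```shell
# Placeholders (assumptions, not values from the appliance):
#   192.0.2.10   = IP address of the vCenter server
#   vcenter-fqdn = fully qualified name of the vCenter server
check_reachability() {
    ip="$1"
    name="$2"
    # Step 1: plain connectivity, no DNS involved.
    if ping -c 3 -W 2 "$ip" >/dev/null 2>&1; then
        echo "ip: reachable"
    else
        echo "ip: unreachable"
    fi
    # Step 2: name resolution through the system resolver.
    if getent hosts "$name" >/dev/null 2>&1; then
        echo "name: resolves"
    else
        echo "name: does not resolve"
    fi
}

# Example call on the appliance:
#   check_reachability 192.0.2.10 vcenter-fqdn
```

If the IP address is reachable while the name does not resolve, the network path is fine and the problem is most likely in DNS.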
Let’s look at the /etc/resolv.conf symbolic link to verify the nameserver and search values. The nameserver is set to a local caching resolver that forwards all requests to the configured DNS server(s). The search domain value was set correctly during the deployment of the vCAv 3.5.1 appliance with the OVA file. The resolv.conf file is managed by systemd-resolved (the comment at the top of the file refers to man:systemd-resolved(8)), so do not edit the file.
To output the resolv.conf file, use the following command:
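A minimal sketch of such a command, wrapped in a hypothetical helper named `show_resolver_file`. It prints where the symbolic link points and then the file contents. The path defaults to /etc/resolv.conf:

```shell
show_resolver_file() {
    file="${1:-/etc/resolv.conf}"
    # Print the link target if the path is a symbolic link.
    if [ -L "$file" ]; then
        echo "link target: $(readlink "$file")"
    fi
    # Print the file itself, including any comment header.
    cat "$file"
}

# Example call on the appliance:
#   show_resolver_file /etc/resolv.conf
```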
Let’s check the /etc/systemd/resolved.conf file to see if its settings match the settings we saw earlier on the vCAv configuration page. Use the following command to do so:
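A resolved.conf file typically contains many commented-out defaults, so it can help to print only the active settings. A small sketch, with `active_settings` as a hypothetical helper name:

```shell
active_settings() {
    # Drop blank lines and comment lines; keep section headers and set values.
    grep -Ev '^[[:space:]]*(#|$)' "${1:-/etc/systemd/resolved.conf}"
}

# Example call on the appliance:
#   active_settings /etc/systemd/resolved.conf
```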
Run the following resolvectl command to output the DNS servers that have been configured in the /etc/systemd/resolved.conf file.
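One command that reports this is `resolvectl status`, which lists the DNS servers and search domains that systemd-resolved is using, both globally and per network link. A guarded sketch follows. The wrapper name `show_dns_servers` is hypothetical, and the wrapper also copes with systems where resolvectl is missing:

```shell
show_dns_servers() {
    if command -v resolvectl >/dev/null 2>&1; then
        # Lists DNS servers and search domains known to systemd-resolved.
        resolvectl status || echo "resolvectl could not query systemd-resolved"
    else
        echo "resolvectl is not installed"
    fi
}

# Example call on the appliance:
#   show_dns_servers
```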
Let’s use another tool, nslookup, to perform some DNS resolutions. The first test performs a DNS resolution using the local caching DNS resolver.
The second test sets the server to our configured DNS server to perform the DNS resolution.
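The two tests can be sketched as follows. `vcenter-fqdn` and `192.0.2.53` are placeholders for the vCenter name and a DNS server configured during deployment, and the helper names are hypothetical. Without a server argument, nslookup asks the nameserver listed in /etc/resolv.conf (here the local caching resolver). With a server argument, it queries that server directly:

```shell
lookup_default() {
    # Query whatever nameserver /etc/resolv.conf lists.
    nslookup -timeout=2 "$1"
}

lookup_with_server() {
    # Query the DNS server given as the second argument directly.
    nslookup -timeout=2 "$1" "$2"
}

# Example calls on the appliance (placeholder values):
#   lookup_default vcenter-fqdn
#   lookup_with_server vcenter-fqdn 192.0.2.53
```

A result where only the second call succeeds would point at the local resolver rather than at the DNS server itself.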
Old Workaround – (see the new workaround)
Note: do not use this workaround anymore! Please follow the new workaround.
To fix the DNS resolution issue within the vCAv appliance, we need to run a few simple commands through an SSH or console session.
First of all, we need to delete the resolv.conf symbolic link that was created automatically during the deployment of the vCAv appliance with the OVA file. Run both commands from the /etc directory, because the link targets are relative paths. The first line deletes the current symbolic link, which points to ../run/systemd/resolve/stub-resolv.conf. The second line creates a new resolv.conf symbolic link pointing to the ../run/systemd/resolve/resolv.conf file.
rm -i resolv.conf
ln -sf ../run/systemd/resolve/resolv.conf resolv.conf
We can now perform the same ping test that was failing before.
The ../run/systemd/resolve/stub-resolv.conf file has 127.0.0.53 as its configured nameserver. This uses the local caching DNS resolver to forward the DNS queries.
The ../run/systemd/resolve/resolv.conf file has the correct DNS servers (the DNS servers configured during the OVA deployment) as nameserver(s). This uses the configured DNS servers directly instead of forwarding requests through the local caching DNS resolver.
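To see the effect of switching the link target without touching an appliance, the following sketch recreates both files in a scratch directory. The directory layout mirrors the paths above, while `192.0.2.53` is a placeholder for a real DNS server and the file contents are reduced to a single line each:

```shell
# Build a throwaway copy of the relevant directory layout.
demo=$(mktemp -d)
mkdir -p "$demo/run/systemd/resolve" "$demo/etc"
printf 'nameserver 127.0.0.53\n' > "$demo/run/systemd/resolve/stub-resolv.conf"
printf 'nameserver 192.0.2.53\n' > "$demo/run/systemd/resolve/resolv.conf"

cd "$demo/etc"

# Default state: the link points at the stub file (local caching resolver).
ln -sf ../run/systemd/resolve/stub-resolv.conf resolv.conf
before=$(cat resolv.conf)

# After the swap: the link points at the file with the real DNS servers.
ln -sf ../run/systemd/resolve/resolv.conf resolv.conf
after=$(cat resolv.conf)

echo "before: $before"
echo "after:  $after"
```

Reading resolv.conf before and after the swap returns the stub resolver address first and the placeholder DNS server second, which is the whole effect of the old workaround.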
After fixing the DNS resolution issue, I was finally able to configure the lookup service.
Workaround – New
The new workaround provides a cleaner way to resolve the issue and ensures you don’t encounter any upgrade issues.
There is currently a bug in the vCAv 3.5.1 appliance where the appliance VAMI doesn’t set the domain search path correctly.
By default, systemd-resolved doesn’t pass on local domains to the DNS server. The correct way to fix this is to add the word “local” and the required domain names to the search domain paths in the resolved.conf file.
Note that today it’s generally recommended to avoid defining “.local” in a DNS server, as RFC 6762 reserves this domain for exclusive Multicast DNS use.
Open the /etc/systemd/resolved.conf file with Vim and uncomment/update the Domains value so that it contains the word “local” and your domain search paths.
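As an alternative to editing by hand, the same change can be sketched with sed. This assumes the file already contains a `Domains=` line (commented or not), as the stock systemd file does. `set_search_domains` is a hypothetical helper and `example.local` is a placeholder for your own search domain:

```shell
set_search_domains() {
    # $1: path to the resolved.conf file, $2: space-separated search domains.
    # Replaces the existing Domains= line, removing a leading "#" if present.
    sed -i -E "s|^#?Domains=.*|Domains=$2|" "$1"
}

# Example call on the appliance (placeholder domain):
#   set_search_domains /etc/systemd/resolved.conf "local example.local"
```

Making a backup copy of the file before changing it in place is a sensible precaution.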
In my case, I needed to undo the symlink change and point /etc/resolv.conf back to the ../run/systemd/resolve/stub-resolv.conf file. Run both commands from the /etc directory. You can skip this task if you didn’t perform the previous workaround.
rm -i resolv.conf
ln -sf ../run/systemd/resolve/stub-resolv.conf resolv.conf
Restart the systemd-networkd and systemd-resolved services to apply the changes. Restarting the services results in approximately 1 to 2 seconds of downtime.
systemctl restart systemd-networkd systemd-resolved
It is now time to test DNS resolution after performing the changes.
You are most likely wondering why I didn’t redeploy the appliance before troubleshooting. I had already done that, but it didn’t resolve the issue.
There is also a VMware KB article online called “DNS doesn’t function properly on fresh installations of vCloud Availability 3.5”, which describes an issue that can cause the error Could not find SSL/X509 certificate from “https://vcenter-fqdn:7444/lookupservice/sdk”, but the provided workaround didn’t work for me.
If you have any questions about vCAv or about this article, please do not hesitate to contact me.