Channel: SQL Server Cluster Archives - SQL Authority with Pinal Dave

SQL SERVER – Patch Failure – Unable to Retrieve the Cluster Service

A few of my clients are still running older versions of SQL Server, and they have their own reasons for not moving to a newer version. In this blog, we will look at the fix for an error that one such client contacted me about. The client was trying to apply a patch to a SQL Server 2008 R2 clustered instance running on Windows Server 2012 and was getting the error: Unable to retrieve the Cluster Service – Not clustered or the cluster service is up and online.

First, I asked the client to share Detail.txt so that I could see the exact error. Here is the relevant information. I have added bullet points for clarity.

  • Slp: Unable to retrieve the Cluster Service
  • Slp: System.InvalidOperationException: The given key was not present in the dictionary. ---> System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary.
  • ThrowHelper.ThrowKeyNotFoundException()
  • Collections.Generic.SortedList`2.get_Item(TKey key)
  • SqlServer.Chainer.Infrastructure.ServiceContainer.GetService(Type serviceType)
  • --- End of inner exception stack trace ---
  • SqlServer.Chainer.Infrastructure.ServiceContainer.GetService(Type serviceType)
  • SqlServer.Chainer.Infrastructure.ServiceContainer.GetService[T]()
  • SqlServer.Chainer.Infrastructure.ServiceContainer.get_Cluster()
  • SqlServer.Configuration.SetupExtension.RunDiscoveryAction.ExecuteAction(String actionId)

I had found one cause of this error earlier and shared it in this blog: SQL SERVER – Patch Rule Failure: Not Clustered or the Cluster Service is Up and Online

That blog applies when the cluster itself is in bad shape. In the current situation, however, my client’s cluster was working fine. It was only the SQL Server patch setup that could not talk to the cluster.

WORKAROUND/SOLUTION

I knew that Microsoft had articles about this issue, but my searches for the error I saw in Detail.txt were not returning any hits.

Finally, we found that the “Failover Cluster Automation Server” feature was not installed. It is not installed by default when Windows failover clustering is set up, but the SQL Server 2008 R2 patch needs it. Here is the PowerShell command to add the feature.

Add-WindowsFeature -Name RSAT-Clustering-AutomationServer

Later, I found a KB article that describes the solution.

Here is the snippet from the article.

To resolve this issue, do one of the following:

  • Enable the Failover Cluster Automation Server feature on each node by using Server Manager. In Server Manager, expand Remote Server Administration Tools, expand Feature Administration Tools, expand Failover Clustering Tools, and then click to select Failover Cluster Automation Server.

Here is the PowerShell screenshot taken while installing it. I used the command below to view the feature.

Get-WindowsFeature RSAT-Cluster*

[Image: patch-clus-01]
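
Because the KB asks for the feature on each node, the check and the install can be looped over the cluster nodes. The following is a minimal sketch, not the exact commands used in this case. It assumes an elevated PowerShell session on one cluster node, that the FailoverClusters module is available, and that remote management of the other nodes is permitted.

```powershell
# Sketch only: assumes an elevated session on a cluster node and
# permission to manage the other nodes remotely.
Import-Module FailoverClusters

$featureName = 'RSAT-Clustering-AutomationServer'

foreach ($node in Get-ClusterNode) {
    # Query the feature state on this node.
    $feature = Get-WindowsFeature -Name $featureName -ComputerName $node.Name

    if ($feature.Installed) {
        Write-Output "$($node.Name): $featureName is already installed."
    }
    else {
        # On Windows Server 2012, Add-WindowsFeature is an alias of Install-WindowsFeature.
        Install-WindowsFeature -Name $featureName -ComputerName $node.Name
    }
}
```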

I hope this blog helps you find the solution to this patch failure. Please comment and let me know.

Reference: Pinal Dave (https://blog.sqlauthority.com)

First appeared on SQL SERVER – Patch Failure – Unable to Retrieve the Cluster Service


SQL SERVER – Availability Group Seen in SSMS but missing in Cluster Manager. What’s Wrong?

One of my existing clients sent me an interesting question. In this blog, I will share a situation where the client had created an availability group that was visible in SSMS, but no corresponding role was visible in Failover Cluster Manager.

Here is a picture that shows what we were seeing.

[Image: ssms-clus-type-missing-01]

Generally, when an availability group is created, an AG resource is created in a Windows cluster.

WORKAROUND/SOLUTION

The question asked by my client was perfectly valid. It is possible to run into this situation by following the steps below.

  1. Enable the Always On availability groups feature without creating a Windows cluster.
    [Image: ssms-clus-type-missing-02]
  2. Create the availability group and choose NONE as the cluster type, which is offered as an option on the screen below.
    [Image: ssms-clus-type-missing-03]
  3. Now, enable the Windows failover clustering feature and create the Windows cluster.

At this point, if you open Failover Cluster Manager, you will see no availability group role, because the availability group was created with cluster type NONE.

So, if you ever see such a situation, check the Always On availability group dashboard and look at the cluster type, which is highlighted below.

[Image: ssms-clus-type-missing-04]
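
The cluster type shown on the dashboard can also be read with a query. This is a minimal sketch, assuming SQL Server 2017 or later (where sys.availability_groups exposes cluster_type_desc) and an installed SqlServer PowerShell module. The instance name NODE1 is a placeholder, not a name from the client’s environment.

```powershell
# Sketch only: 'NODE1' is a placeholder instance name.
Import-Module SqlServer

$query = @"
SELECT name, cluster_type_desc
FROM sys.availability_groups;
"@

# cluster_type_desc is WSFC, EXTERNAL, or NONE.
# An availability group reporting NONE has no role in Failover Cluster Manager.
Invoke-Sqlcmd -ServerInstance 'NODE1' -Query $query
```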

This feature is called read-scale, and you can read more about it here: Use read-scale with Always On availability groups

Hope you found this information useful. Please comment and let me know.

SQL SERVER – FIX: The specified instance of SQL Server is hosted by a system that is not a Windows Server Failover Cluster(WSFC) Node.

Sometimes, if the sequence of steps is not followed correctly, we might see some weird errors. In this blog, we will learn how to fix an Always On error raised while adding a new replica: “The specified instance of SQL Server is hosted by a system that is not a Windows Server Failover Cluster(WSFC) node.”

This happened with one of my existing clients, for whom I had configured an Always On availability group. They had some issues with the nodes and ended up rebuilding the cluster.

Later, they were able to redeploy and bring the cluster back. Then, when they tried to add Node2 as a replica from Node1 via SQL Server Management Studio, an error was displayed.

[Image: ao-node-add-err-01]

Here is the text of the error message:

The specified instance of SQL Server is hosted by a system that is not a Windows Server Failover Cluster(WSFC) node. (Microsoft.SqlServer.Management.HadrTasks)
 ------------------------------
Program Location:
 at Microsoft.SqlServer.Management.Hadr.SelectReplicasController.VerifyConnection(UIConnectionInfo ci, IServerType serverType)
   at Microsoft.SqlServer.Management.UI.ConnectionDlg.Connector.ConnectionThreadUser()

From the message, it is clear that something is wrong in the communication between SQL Server and the cluster. I asked them to run the queries below on all nodes that are part of the Windows cluster.

SELECT *
FROM sys.dm_hadr_cluster_members
GO
SELECT *
FROM sys.dm_hadr_cluster

When we ran the query on the node which was giving an error, there were no rows.

WORKAROUND/SOLUTION

We disabled Always On Availability Group on the “bad” node using SQL Server Configuration Manager. Once done, we restarted the SQL Service. Then we enabled it again followed by another SQL Service restart. After finishing this, we were able to get information about the cluster and nodes via the query which I mentioned earlier.
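
The same disable/enable cycle can be scripted instead of using SQL Server Configuration Manager. This is a sketch under assumptions: the SqlServer PowerShell module is installed, the session has administrative rights on the node, and NODE2 is a placeholder for the affected instance. Both cmdlets restart the SQL Server service unless -NoServiceRestart is specified, so this belongs in a maintenance window.

```powershell
# Sketch only: 'NODE2' is a placeholder for the affected instance.
Import-Module SqlServer

$instance = 'NODE2'

# Each cmdlet restarts the SQL Server service; -Force suppresses the confirmation prompt.
Disable-SqlAlwaysOn -ServerInstance $instance -Force
Enable-SqlAlwaysOn  -ServerInstance $instance -Force

# Re-check that SQL Server can now see the cluster and its members.
Invoke-Sqlcmd -ServerInstance $instance -Query 'SELECT cluster_name, quorum_state_desc FROM sys.dm_hadr_cluster;'
Invoke-Sqlcmd -ServerInstance $instance -Query 'SELECT member_name, member_state_desc FROM sys.dm_hadr_cluster_members;'
```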

Have you seen such an issue earlier? What were the steps to break it?

SQL SERVER – Drop failed for Availability Group – Failed to Destroy the Windows Server Failover Clustering Group Corresponding to Availability Group

[Image: droperror]

While working with one of my clients to recover from a disaster, I encountered an error. In this blog, we will talk about the error “Drop failed for Availability Group”. If you want any such assistance, you can hire me for a quick consultation using my On Demand offering.

THE SITUATION

When I got the call from my client, they were in a situation where the availability group was not coming online. After an investigation, we concluded that their Windows cluster was having issues and was unable to start. The solution in such a disaster is to start the Windows cluster in force quorum mode.

Once we had started the cluster in ForceQuorum mode (also called fix quorum), we tried deleting the AG and received the error below.

Failed to destroy the Windows Server Failover Clustering group corresponding to availability group ‘AG’.  The operation encountered SQL Server error 41000 and has been terminated.  Refer to the SQL Server error log for details about this SQL Server error and corrective actions.
An error occurred while removing availability group ‘AG’. The DROP AVAILABILITY GROUP command removed the availability group configuration from the local metadata. However, the attempt to remove this configuration from the Windows Server Failover Clustering (WSFC) cluster failed because the Always On Availability Groups manager is not online (SQL Server error: 41081). To remove the availability group configuration from the WSFC cluster, re-enter the command. (Microsoft SQL Server, Error: 41081)

In the above error message, we see another error number, 41000, for which the text is as follows: “Failed to obtain the local Windows Server Failover Clustering (WSFC) handle (Error code %d).  If this is a WSFC availability group, the WSFC service may not be running or may not be accessible in its current state.  Otherwise, contact your primary support provider.  For information about this error code, see “System Error Codes” in the Windows Development documentation.”

WHAT SHOULD YOU DO?

Well, no action is needed. The message raised is not a dangerous error message. It clearly says that the availability group information has been removed from SQL Server but might still exist in the failover cluster. So, you can click OK on the message and check the availability group in SSMS. It should be gone from “Always On High Availability” > “Availability Groups”. At this point, the databases would be in a restoring state, and you need to bring them online using one of the commands below.

ALTER DATABASE <DBName> SET ONLINE

OR

RESTORE DATABASE <DBName> WITH RECOVERY

Once things are back to normal, you need to configure the availability group again.
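
Before running either command, it can help to confirm which state each database is actually in. The following is a minimal sketch that only reads sys.databases. It assumes the SqlServer PowerShell module is installed, and NODE1 is a placeholder instance name.

```powershell
# Sketch only: 'NODE1' is a placeholder instance name.
Import-Module SqlServer

$query = @"
SELECT name, state_desc
FROM sys.databases
WHERE state_desc <> 'ONLINE';
"@

# Lists every database that is not currently online, with its state.
Invoke-Sqlcmd -ServerInstance 'NODE1' -Query $query
```

Databases reported as RESTORING are the ones to bring online with RESTORE DATABASE ... WITH RECOVERY.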

SQL SERVER – Always On Listener Failure – Provisioning Computer Object Failed With Error 5

In the past, I have written a few blogs explaining issues that cause listener creation to fail in the cluster. In this blog, we will see another cause of listener creation failure.

[Image: failure-800x263]

As I explained there, whenever there are any issues related to cluster resources, we should always look at the cluster log. If you are not sure how to generate cluster logs, read my earlier blog on the same topic: SQL SERVER – Steps to Generate Windows Cluster Log?

The names used in this example are:

  • Listener Name – ProdListener
  • Windows Cluster Name – WinCluster

Here are the messages from the cluster log:

[RES] Network Name : AccountAD: PopulateNetnameADState – DCFlags 1073745937 LdapReferralAllowed 0
[RES] Network Name: [NNLIB] FindSuitableDCNew – objectName ProdListener, username – WinCluster$, firstChoiceDCName – \\DC01.domain.com
[RES] Network Name: [NNLIB] Found first choice DC that has the object ProdListener DCName – \\DC01.domain.com.

[RES] Network Name: [NNLIB] GetOUForVCOCreation – OU name of CNO is OU=IN,DC=Domain,DC=Com
[RES] Network Name: [NNLIB] GetOUForVCOCreation – CNO SID is S-1-5-21-2987268983-2468375096-37832846-293957
[RES] Network Name : AccountAD: Object ProdListener already exists, using DC: \\DC01.domain.com

[RES] Network Name: [NNLIB] Changing Password for object ProdListener on DC: \\DC01.domain.com result: 5
[RES] Network Name: [NN] IdentityLocal End Impersonating
[RES] Network Name : AccountAD: Provisioning computer object failed with error 5
[RES] Network Name : AccountAD: OnInitializeEnd: 5
[RES] Network Name : Configuration: InitializeReplyCreation of NetName (type Singleton), result: 5, IsCanceled: false

I have trimmed the cluster logs and kept only interesting messages. In my client’s case, they already had the Listener (ProdListener) provisioned in the domain controller. The most relevant error is “Changing Password for object ProdListener on DC: \\DC01.domain.com result: 5”. This essentially means that the Cluster is trying to change the password of the object and failing with access denied. The key part here is that this operation is done by “username – WinCluster$”. This is not a domain user but a cluster computer object, also known as CNO.
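
To find lines like these in a fresh log, the cluster log can be generated and filtered from PowerShell. This is a sketch under assumptions: the FailoverClusters module is available, C:\Temp is a placeholder for a writable folder, and the failure was reproduced within the last 15 minutes.

```powershell
# Sketch only: the folder and the 15-minute window are placeholders.
Import-Module FailoverClusters

$destination = 'C:\Temp'

# Writes one <node>_cluster.log file per node, covering the last 15 minutes, in local time.
Get-ClusterLog -Destination $destination -TimeSpan 15 -UseLocalTime

# Keep only the lines that mention the listener and report a failure or a non-zero result.
Get-ChildItem -Path $destination -Filter '*_cluster.log' |
    Select-String -Pattern 'ProdListener' |
    Where-Object { $_.Line -match 'failed with error|result: [1-9]' }
```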

The “Access is denied” error can have many causes, any of which can make the network name or client access point resource fail to come online in the cluster. In my experience, one of the most common causes is that the domain administrator has not given the CNO the “Read All Properties” and “Create Computer Objects” permissions. You might also see “Access is denied” in the event log.

WORKAROUND/SOLUTION – Listener Failure

Based on the cluster log snippet, it was clear that a domain admin needed to give the CNO access to the VCO. The VCO is a virtual computer object, which in an Always On scenario is the listener. The CNO is the cluster name object, which is the computer object for the Windows cluster network name in Active Directory.

  • Open the Active Directory Users and Computers snap-in (dsa.msc) on a domain controller.
  • In the menu, go to View and check Advanced Features. (Otherwise, we would not see the options explained in the next steps.)
  • Right-click the OU/container where we want the VCO (ProdListener in the above example) to be created and click “New” -> “Computer”.
  • Provide a name for the object (this will be the listener name for the AG) and click “OK”.
  • Right-click the VCO which we just created and select “Properties”. Click the Security tab and then click “Add”.
  • Enter the CNO (make sure to select the “Computers” option in the “Object Types” window) and click “OK”. The CNO is the cluster name object, that is, the name of the Windows cluster, NOT the listener or FCI name.
  • Give the CNO “Full Control” over the VCO.

After this, we should be able to bring the listener online in the cluster manager.
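
To confirm that the permission change took effect, the access control entries on the VCO can be listed from PowerShell. This is only a sketch. It assumes the ActiveDirectory module (part of RSAT) is installed and reuses the example names ProdListener and WinCluster from above.

```powershell
# Sketch only: object names are the example names used in this blog.
Import-Module ActiveDirectory

# Locate the listener's computer object (the VCO).
$vco = Get-ADComputer -Identity 'ProdListener'

# Read its ACL through the AD: drive provided by the ActiveDirectory module.
$acl = Get-Acl -Path ("AD:\" + $vco.DistinguishedName)

# Show only the entries granted to the cluster name object (the CNO).
$acl.Access |
    Where-Object { $_.IdentityReference -like '*\WinCluster$' } |
    Select-Object IdentityReference, ActiveDirectoryRights, AccessControlType
```

If the CNO was given “Full Control”, an Allow entry with GenericAll rights should be listed for it.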

SQL SERVER – Always On Listener Creation Failure – Enabling Object ProdListener Failed With Error 5

In the past, I have written a few blogs explaining issues that cause listener creation to fail in a cluster. In this blog, we will see another cause of listener creation failure.

[Image: listenercreation-800x252]

Whenever I get issues while creating a listener in SQL Server, I look at Failover Cluster Manager to figure out which resource is failing, and then I generate a cluster log. If you are not sure how to generate cluster logs, read my earlier blog on the same topic: SQL SERVER – Steps to Generate Windows Cluster Log?

Here is the setup of my lab. This should help you correlate it with the cluster log from your environment.

  • Listener Name – ProdListener
  • Windows Cluster Name – WinCluster

As I mentioned, my listener, which is a client access point in the cluster manager, was failing. In the lines below from the cluster log, I have removed the timestamps to avoid clutter. Here is the trimmed version of the cluster log from the node where the network name resource was not coming online.

INFO [RES] Network Name : AccountAD: Initializing Name: ProdListener, NetbiosName: ProdListener, Type: Singleton, Created: false
INFO [RES] Network Name : AccountAD: PopulateNetnameADState – DCFlags 1073745937 LdapReferralAllowed 0
INFO [RES] Network Name: [NNLIB] FindSuitableDCNew – objectName ProdListener, username – WinCluster$, firstChoiceDCName – \\domain.com
INFO [RES] Network Name: [NNLIB] Found first choice DC that has the object ProdListener DCName – \\domain.com.
INFO [RES] Network Name: [NNLIB] GetOUForVCOCreation – OU name of CNO is OU=IN,DC=Domain,DC=Com
INFO [RES] Network Name: [NNLIB] GetOUForVCOCreation – CNO SID is S-1-5-21-2987268983-2468375096-37832846-293957
INFO [RES] Network Name : AccountAD: Object ProdListener already exists, using DC: \\domain.com
INFO [RES] Network Name: [NNLIB] Trying to enable disabled object ProdListener on DC: \\domain.com (Flags 0000000000001022)
WARN [RES] Network Name : AccountAD: Enabling object ProdListener failed with error 5

Entries in the cluster log are marked INFO, WARN, or ERR, which is similar to the Information, Warning, and Error levels we see in the Windows event logs. The most relevant error is “Network Name <ProdListener>: AccountAD: Enabling object ProdListener failed with error 5”. This essentially means that the cluster is trying to enable the object and failing with an access denied error. The key part here is that this operation is done by “username – WinCluster$”. This is not a domain user but the cluster computer object, also known as the CNO.

The “Access is denied” error can have many causes, any of which can make the network name or client access point resource fail to come online in the cluster. In my experience, one of the most common causes is that the domain administrator has not given the CNO the “Read All Properties” and “Create Computer Objects” permissions. You might also see “Access is denied” in the event log.

WORKAROUND/SOLUTION – Listener Creation

Based on the cluster log snippet, it was clear that a domain admin needed to give the CNO access to the VCO. The VCO is a virtual computer object, which in an Always On scenario is the listener. The CNO is the cluster name object, which is the computer object for the Windows cluster network name in Active Directory.

  • Open the Active Directory Users and Computers snap-in (dsa.msc) on a domain controller.
  • In the menu, go to View and check Advanced Features. (Otherwise, we would not see the options explained in the next steps.)
  • Right-click the OU/container where we want the VCO (ProdListener in the above example) to be created and click “New” -> “Computer”.
  • Provide a name for the object (this will be the listener name for the AG) and click “OK”.
  • Right-click the VCO which we just created and select “Properties”. Click the Security tab and then click “Add”.
  • Enter the CNO (make sure to select the “Computers” option in the “Object Types” window) and click “OK”. The CNO is the cluster name object, that is, the name of the Windows cluster, NOT the listener or FCI name.
  • Give the CNO “Full Control” over the VCO.

After this, we should be able to bring the VCO online in the cluster manager. Listener creation also worked, and it was a happy ending.

SQL SERVER – Always On Listener Not Coming Online – Failed to Create New NBT Interface, Status 1450

During a recent Comprehensive Database Performance Health Check consulting engagement with a client, one of the DBAs was busy with an Always On deployment. He asked me if I could help him with some pointers. Since I was able to fix the problem and learned something along the way, in this blog I will share what I learned about fixing the error “Failed to create new NBT interface, status 1450” in the cluster log while creating the “IP Address” resource.

[Image: NBTinterface-800x461]

The situation was that the DBA was creating an Always On availability group and a listener at the same time. Watching the cluster manager while he went through the wizard and pressed the “Finish” button, we found that it was the IP Address resource that was going into a failed state. Hence, I asked him to create only the availability group first and add the listener later.

Once the availability group was created, we tried creating a client access point in the cluster manager and, as expected, the IP Address resource failed to come online. We generated cluster logs to find the exact error message (see SQL SERVER – Steps to Generate Windows Cluster Log?). The relevant portion of the log is pasted below. I have removed some details to make it easy to read.

ERR [RES] IP Address <Cluster IP Address>: Unable to create new NBT interface, status 1450
ERR [RES] IP Address <Cluster IP Address>: Failed to create new NBT interface, status 1450. Deleting stale NBT device from cluster database
INFO [RES] <Unknown Resource>: Successfully deleted NBT interface information from database, status 0.
ERR [RHS] Online for resource Cluster IP Address failed.
WARN [RCM] HandleMonitorReply: ONLINERESOURCE for ‘Cluster IP Address’, gen(96) result 5018/0.
INFO [RCM] Res Cluster IP Address: OnlinePending -> ProcessingFailure( StateUnknown )
INFO [RCM] TransitionToState(Cluster IP Address) OnlinePending–>ProcessingFailure.

WORKAROUND/SOLUTION – NBT Interface

The first two lines show the actual error, which is what I mentioned in the blog title. Whenever I see the NBT interface in an error message, I think of NetBIOS over TCP/IP (NetBT). Here is a screenshot taken from the internet which shows the setting. This is the properties window of the IP Address resource in the cluster.

[Image: tcp-netbt-01]

To solve this problem, we disabled “Enable NetBIOS for this address”. After that, we were able to bring the IP address online, and the issue was resolved.
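
The same checkbox is exposed as the EnableNetBIOS parameter of the IP Address resource, so it can be inspected and changed from PowerShell. This is a sketch, assuming the FailoverClusters module is available. The resource name below is the one from the log excerpt, and the listener’s IP Address resource in your cluster will have a different name.

```powershell
# Sketch only: replace the resource name with your listener's IP Address resource.
Import-Module FailoverClusters

$resourceName = 'Cluster IP Address'

# Show the current value (1 = enabled, 0 = disabled).
Get-ClusterResource -Name $resourceName | Get-ClusterParameter -Name EnableNetBIOS

# Disable NetBIOS for this address.
Get-ClusterResource -Name $resourceName | Set-ClusterParameter -Name EnableNetBIOS -Value 0
```

The new value takes effect the next time the resource is taken offline and brought back online.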

SQL SERVER – LogonUserExEx Fails for User & GetToken – Logging on as the CNO Failed With Error 1326

The SQL Server Always On feature is so closely tied to the cluster that sometimes you need to look at various logs to fix a failure. In this blog, I will share what I learned about fixing the errors “LogonUserExEx fails for user” and “GetToken – Logging on as the CNO failed with error 1326”.

[Image: GetToken-800x376]

Let me explain the situation and the steps I took to fix the error. The issue was that the listener network name resource was not coming online in the cluster manager. When I checked the properties of the resource, I found the messages below.

I have highlighted these messages on the properties window.

[Image: clu-ag-err-01]

DNS Status: The handle is invalid.

Kerberos: The user name or password is incorrect.

I asked my client to generate a cluster log. Here is the article which you can follow to generate the logs. SQL SERVER – Steps to Generate Windows Cluster Log?

In the cluster log, the relevant lines are below.

INFO [RES] Network Name: [NNLIB] PopulateKerbKDCLookupCache – DC flags 0
INFO [RES] Network Name: [NNLIB] LsaCallAuthenticationPackage success with a request of size 96, result size 0 (status: 0, subStatus: 0)
INFO [RES] Network Name: [NNLIB] Priming local KDC cache to \\dc.domain.com for domain label ad
INFO [RES] Network Name: [NNLIB] LsaCallAuthenticationPackage success with a request of size 82, result size 0 (status: 0, subStatus: 0)
WARN [RES] Network Name: [NNLIB] LogonUserExEx fails for user SQLAUTHCLU01$: 1326 (useSecondaryPassword: 0)
WARN [RES] Network Name: [NNLIB] LogonUserExEx fails for user SQLAUTHCLU01$: 1326 (useSecondaryPassword: 1)
INFO [RES] Network Name: [NNLIB] Logon failed for user SQLAUTHCLU01$ (Error 1326), DC \\dc.domain.com, domain domain.com
ERR [RES] Network Name: [NN] GetToken – Logging on as the CNO failed with error 1326
ERR [RES] Network Name : Dns: Obtaining token threw exception, error 6
ERR [RES] Network Name : Dns: Failed DNS registration with error 6 for Name: SQLAUTHCLU01 (Type: Singleton)

We can see two errors in the above log, 1326 and 6. Their meanings match what was shown in the UI: 1326 corresponds to “The user name or password is incorrect” and 6 to “The handle is invalid”.

[Image: clu-ag-err-02]
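
The numeric codes in the cluster log are Windows system error codes, so they can be translated into text on any Windows machine. This is a small sketch using the .NET Win32Exception class. The exact message wording depends on the Windows version and language.

```powershell
# Translate the Windows system error codes seen in the cluster log into text.
foreach ($code in 1326, 6) {
    $message = (New-Object System.ComponentModel.Win32Exception($code)).Message
    '{0,5}  {1}' -f $code, $message
}
```

The built-in command net helpmsg 1326 gives the same kind of translation from a command prompt.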

WORKAROUND/SOLUTION – GetToken

I searched various places on the internet and found that this can be fixed by repairing the computer object. Below is the screenshot for that.

[Image: clu-ag-err-03]

After that, I also asked them to restart the nodes and then the listener came online.

Have you seen such errors in cluster logs?


SQL SERVER – Always On Listener Failure – Provisioning Computer Object Failed With Error 5

$
0
0

In the past, I have written a few blogs where I explained a few issues about the failure of listener creation in the cluster. In this blog, we would see another cause of listener creation failure.

SQL SERVER - Always On Listener Failure - Provisioning Computer Object Failed With Error 5 failure-800x263

I explained that whenever there are any issues related to the cluster resources, we should always look at the cluster log. If you are not sure how to generate cluster logs, read my earlier blog on the same topic. SQL SERVER – Steps to Generate Windows Cluster Log?

  • Listener Name – ProdListener
  • Windows Cluster Name – WinCluster

Here were the messages in the cluster log

[RES] Network Name : AccountAD: PopulateNetnameADState – DCFlags 1073745937 LdapReferralAllowed 0
[RES] Network Name: [NNLIB] FindSuitableDCNew – objectName ProdListener, username – WinCluster$, firstChoiceDCName – \\DC01.domain.com
[RES] Network Name: [NNLIB] Found first choice DC that has the object ProdListener DCName – \\DC01.domain.com.

[RES] Network Name: [NNLIB] GetOUForVCOCreation – OU name of CNO is OU=IN,DC=Domain,DC=Com
[RES] Network Name: [NNLIB] GetOUForVCOCreation – CNO SID is S-1-5-21-2987268983-2468375096-37832846-293957
[RES] Network Name : AccountAD: Object ProdListener already exists, using DC: \\DC01.domain.com

[RES] Network Name: [NNLIB] Changing Password for object ProdListener on DC: \\DC01.domain.com result: 5
[RES] Network Name: [NN] IdentityLocal End Impersonating
[RES] Network Name : AccountAD: Provisioning computer object failed with error 5
[RES] Network Name : AccountAD: OnInitializeEnd: 5
[RES] Network Name : Configuration: InitializeReplyCreation of NetName (type Singleton), result: 5, IsCanceled: false

I have trimmed the cluster logs and kept only interesting messages. In my client’s case, they already had the Listener (ProdListener) provisioned in the domain controller. The most relevant error is “Changing Password for object ProdListener on DC: \\DC01.domain.com result: 5”. This essentially means that the Cluster is trying to change the password of the object and failing with access denied. The key part here is that this operation is done by “username – WinCluster$”. This is not a domain user but a cluster computer object, also known as CNO.

“Access is denied” error can be caused due to many other reasons which can cause network name or client access point resource in cluster creation to fail. Based on my experience one of the most common causes would be where the Domain Administrator does not allow the CNO “Read All Properties” and “Create Computer Objects” permissions. You might see “Access is denied” in the event log.

WORKAROUND/SOLUTION – Listener Failure

Based on the cluster log snipper, it was clear that a domain admin needs to provide access on VCO to CNO. VCO is a virtual computer object which is a listener in Always On scenario. CNO is a cluster name object which is a computer object for Windows Cluster network name in active directory.

  • Open the Active Directory Users and Computers Snap-in (dsa.msc) on a domain controller.
  • In Menu > View -> check Advanced Features. (Otherwise, we would not see option explained in next steps)
  • Right-click the OU/Container where we want the VCO (ProdListener in above example) to be created and click “New” -> “Computer”
  • Provide a name for the object (This will be Listener Name for AG) and click “OK”.
  • Right-click on the VCO which we just created and select “Properties”. Click the Security tab and then click “Add”:
  • Enter the CNO (Make sure to select the “Computers” option in the “Object Types” window) and click “OK”. The CNO is the Cluster Name Object. This is the Windows Cluster name, NOT the listener or FCI name.
  • Give CNO “Full Control” over the VCO.

After this, we should be able to bring listeners online in the cluster manager.

Reference: Pinal Dave (https://blog.sqlauthority.com)

First appeared on SQL SERVER – Always On Listener Failure – Provisioning Computer Object Failed With Error 5

SQL SERVER – Always On Listener Creation Failure – Enabling Object ProdListener Failed With Error 5

In the past, I have written a few blogs explaining different causes of listener creation failure in a cluster. In this blog, we would see another cause of listener creation failure.


Whenever I get issues while creating a listener in SQL Server, I look at the failover cluster manager to figure out the resource which is failing and then I generate a cluster log. If you are not sure how to generate cluster logs, read my earlier blog on the same topic. SQL SERVER – Steps to Generate Windows Cluster Log?

Here is the setup of my lab. This would help you in correlating the cluster log with your environment.

  • Listener Name – ProdListener
  • Windows Cluster Name – WinCluster

As I mentioned, my listener, which is a client access point in the cluster manager, was failing. In the lines below from the cluster log, I have removed the timestamps to avoid clutter. Here is the trimmed version of the cluster log from the node where the network name resource was not coming online.

INFO [RES] Network Name : AccountAD: Initializing Name: ProdListener, NetbiosName: ProdListener, Type: Singleton, Created: false
INFO [RES] Network Name : AccountAD: PopulateNetnameADState – DCFlags 1073745937 LdapReferralAllowed 0
INFO [RES] Network Name: [NNLIB] FindSuitableDCNew – objectName ProdListener, username – WinCluster$, firstChoiceDCName – \\domain.com
INFO [RES] Network Name: [NNLIB] Found first choice DC that has the object ProdListener DCName – \\domain.com.
INFO [RES] Network Name: [NNLIB] GetOUForVCOCreation – OU name of CNO is OU=IN,DC=Domain,DC=Com
INFO [RES] Network Name: [NNLIB] GetOUForVCOCreation – CNO SID is S-1-5-21-2987268983-2468375096-37832846-293957
INFO [RES] Network Name : AccountAD: Object ProdListener already exists, using DC: \\domain.com
INFO [RES] Network Name: [NNLIB] Trying to enable disabled object ProdListener on DC: \\domain.com (Flags 0000000000001022)
WARN [RES] Network Name : AccountAD: Enabling object ProdListener failed with error 5

In the cluster log above, each line starts with a level such as INFO or WARN (ERR is the third one), which is similar to the Information, Warning and Error levels we see in the Windows event logs. The most relevant message is “Network Name <ProdListener>: AccountAD: Enabling object ProdListener failed with error 5”. This essentially means that the cluster is trying to enable the object and failing with an access denied error. The key part here is that this operation is done by “username – WinCluster$”. This is not a domain user but a cluster computer object, also known as the CNO.
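To make the two observations above concrete (which lines are warnings or errors, and which account performs the operation), here is a small Python sketch. It is only an illustration under my own assumptions: the helper names `problems` and `acting_account` are hypothetical, and the sample lines are copied from the trimmed log above.

```python
import re

# Lines copied from the trimmed cluster log above (timestamps already removed).
LOG_LINES = [
    r"INFO [RES] Network Name: [NNLIB] FindSuitableDCNew – objectName ProdListener, username – WinCluster$, firstChoiceDCName – \\domain.com",
    r"INFO [RES] Network Name : AccountAD: Object ProdListener already exists, using DC: \\domain.com",
    r"INFO [RES] Network Name: [NNLIB] Trying to enable disabled object ProdListener on DC: \\domain.com (Flags 0000000000001022)",
    "WARN [RES] Network Name : AccountAD: Enabling object ProdListener failed with error 5",
]

# Assumption: the account appears as "username – <name>$"; computer accounts end in "$".
USERNAME = re.compile(r"username\W+([\w-]+\$)")


def problems(lines):
    """Keep only the lines logged at WARN or ERR level."""
    return [line for line in lines if line.startswith(("WARN ", "ERR "))]


def acting_account(lines):
    """Return the first account name found in the log, or None."""
    for line in lines:
        match = USERNAME.search(line)
        if match:
            return match.group(1)
    return None


print("Account:", acting_account(LOG_LINES))
for line in problems(LOG_LINES):
    print(line)
```

A name ending in “$” is a computer account, which is why the permission has to be granted to the CNO and not to a domain user.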

An “Access is denied” error can have many causes, any of which can prevent the network name or client access point resource from being created in the cluster. In my experience, one of the most common causes is that the Domain Administrator has not granted the CNO the “Read All Properties” and “Create Computer Objects” permissions. You might also see “Access is denied” in the event log.

WORKAROUND/SOLUTION – Listener Creation

Based on the cluster log snippet, it was clear that a domain admin needs to provide access on the VCO to the CNO. The VCO is a virtual computer object, which is the listener in an Always On scenario. The CNO is a cluster name object, which is the computer object for the Windows Cluster network name in Active Directory.

  • Open the Active Directory Users and Computers Snap-in (dsa.msc) on a domain controller.
  • In the menu, go to View and check Advanced Features. (Otherwise, we would not see the options explained in the next steps.)
  • Right-click the OU/Container where we want the VCO (ProdListener in above example) to be created and click “New” -> “Computer”
  • Provide a name for the object (This will be Listener Name for AG) and click “OK”.
  • Right-click on the VCO which we just created and select “Properties”. Click the Security tab and then click “Add”:
  • Enter the CNO (Make sure to select the “Computers” option in the “Object Types” window) and click “OK”. The CNO is the Cluster Name Object. This is the Windows Cluster name, NOT the listener or FCI name.
  • Give CNO “Full Control” over the VCO.

After this, we should be able to bring the VCO online in the cluster manager. Listener creation also worked and it was a happy ending.

Reference: Pinal Dave (https://blog.sqlauthority.com)

First appeared on SQL SERVER – Always On Listener Creation Failure – Enabling Object ProdListener Failed With Error 5

SQL SERVER – Always On Listener Not Coming Online – Failed to Create New NBT Interface, Status 1450

During a recent Comprehensive Database Performance Health Check consulting engagement with my client, one of the DBAs was busy with an Always On deployment. He asked me if I could help him with some pointers. Since I was able to fix it and learned something, in this blog I would share my learning about fixing the error “Failed to create new NBT interface, status 1450” in the cluster log while creating the “IP Address” resource.


The situation was that the DBA was creating an Always On availability group and a listener at the same time. Watching the cluster manager while he ran the wizard and pressed the “Finish” button, we found that it was the IP Address resource that was going to a failed state. Hence, I asked him to create only the availability group first and add the listener later.

Once the availability group was created, we tried creating a Client Access Point in the cluster manager and, as expected, the IP address resource failed to come online. We generated cluster logs to find the exact error message (see SQL SERVER – Steps to Generate Windows Cluster Log?). The relevant portion of the log is pasted below. I have removed some details to make it easy to read.

ERR [RES] IP Address <Cluster IP Address>: Unable to create new NBT interface, status 1450
ERR [RES] IP Address <Cluster IP Address>: Failed to create new NBT interface, status 1450. Deleting stale NBT device from cluster database
INFO [RES] <Unknown Resource>: Successfully deleted NBT interface information from database, status 0.
ERR [RHS] Online for resource Cluster IP Address failed.
WARN [RCM] HandleMonitorReply: ONLINERESOURCE for ‘Cluster IP Address’, gen(96) result 5018/0.
INFO [RCM] Res Cluster IP Address: OnlinePending -> ProcessingFailure( StateUnknown )
INFO [RCM] TransitionToState(Cluster IP Address) OnlinePending–>ProcessingFailure.

WORKAROUND/SOLUTION – NBT Interface

Lines 1 and 2 show the actual error, which is what I mentioned in the blog title. Whenever I see the NBT interface in an error message, I think of NetBIOS over TCP/IP (NetBT). Here is a screenshot taken from the internet which shows the setting. This is the properties window of the IP address resource in the cluster.

SQL SERVER - Always On Listener Not Coming Online - Failed to Create New NBT Interface, Status 1450 tcp-netbt-01

To solve this problem, we disabled “Enable NetBIOS for this address” and after that, we were able to bring IP address online and the issue was resolved.

Reference: Pinal Dave (https://blog.sqlauthority.com)

First appeared on SQL SERVER – Always On Listener Not Coming Online – Failed to Create New NBT Interface, Status 1450

SQL SERVER – LogonUserExEx Fails for User & GetToken – Logging on as the CNO Failed With Error 1326

The SQL Server Always On feature is so closely tied to the cluster that sometimes you need to look at various logs to fix a failure. In this blog, I would share my learning about fixing the error LogonUserExEx fails for user & GetToken – Logging on as the CNO failed with error 1326.


Let me explain the situation and the steps I took to fix the error. The issue was that the listener network name resource was not coming online in the cluster manager. When I checked the properties of the resource, I found the below messages.

On the properties window, I have highlighted these messages.

SQL SERVER - LogonUserExEx Fails for User & GetToken - Logging on as the CNO Failed With Error 1326 clu-ag-err-01

DNS Status: The handle is invalid.

Kerberos: The user name or password is incorrect.

I asked my client to generate a cluster log. Here is the article which you can follow to generate the logs. SQL SERVER – Steps to Generate Windows Cluster Log?

In the cluster log, the relevant lines are below.

INFO [RES] Network Name: [NNLIB] PopulateKerbKDCLookupCache – DC flags 0
INFO [RES] Network Name: [NNLIB] LsaCallAuthenticationPackage success with a request of size 96, result size 0 (status: 0, subStatus: 0)
INFO [RES] Network Name: [NNLIB] Priming local KDC cache to \\dc.domain.com for domain label ad
INFO [RES] Network Name: [NNLIB] LsaCallAuthenticationPackage success with a request of size 82, result size 0 (status: 0, subStatus: 0)
WARN [RES] Network Name: [NNLIB] LogonUserExEx fails for user SQLAUTHCLU01$: 1326 (useSecondaryPassword: 0)
WARN [RES] Network Name: [NNLIB] LogonUserExEx fails for user SQLAUTHCLU01$: 1326 (useSecondaryPassword: 1)
INFO [RES] Network Name: [NNLIB] Logon failed for user SQLAUTHCLU01$ (Error 1326), DC \\dc.domain.com, domain domain.com
ERR [RES] Network Name: [NN] GetToken – Logging on as the CNO failed with error 1326
ERR [RES] Network Name : Dns: Obtaining token threw exception, error 6
ERR [RES] Network Name : Dns: Failed DNS registration with error 6 for Name: SQLAUTHCLU01 (Type: Singleton)

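We can see two errors in the log above, 1326 and 6. Their meanings match what was shown in the UI: 1326 is “The user name or password is incorrect” (the Kerberos status) and 6 is “The handle is invalid” (the DNS status).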
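If you want to translate such numeric codes yourself, the sketch below shows one way to do it in Python. It is only an illustration: `describe` is a hypothetical helper name, the fallback table holds just the two codes from this log, and on Windows it relies on `ctypes.FormatError`, which returns the system message in the machine's display language.

```python
import ctypes

# Fallback table with the standard Win32 message texts for the two codes in this log.
KNOWN_CODES = {
    6: "The handle is invalid.",
    1326: "The user name or password is incorrect.",
}


def describe(code):
    """Return a readable message for a Win32 error code."""
    # ctypes.FormatError exists only on Windows, so check before calling it.
    if hasattr(ctypes, "FormatError"):
        return ctypes.FormatError(code).strip()
    return KNOWN_CODES.get(code, f"Unknown error {code}")


for code in (1326, 6):
    print(code, "->", describe(code))
```

On a Windows machine, `net helpmsg 1326` at the command prompt gives the same text without any script.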

SQL SERVER - LogonUserExEx Fails for User & GetToken - Logging on as the CNO Failed With Error 1326 clu-ag-err-02

WORKAROUND/SOLUTION – GetToken

I searched at various places on the internet and found that this can be fixed by repairing the computer object. Below is the screenshot for that.

SQL SERVER - LogonUserExEx Fails for User & GetToken - Logging on as the CNO Failed With Error 1326 clu-ag-err-03

After that, I also asked them to restart the nodes and then the listener came online.

Have you seen such errors in cluster logs?

Reference: Pinal Dave (https://blog.sqlauthority.com)

First appeared on SQL SERVER – LogonUserExEx Fails for User & GetToken – Logging on as the CNO Failed With Error 1326
