We have two Exchange 2013 mailbox servers running in an active/active setup, with 50% of the mail databases on each server.
This morning the databases on server 2 failed over to server 1. The event logs reported these errors:
File share witness resource 'File Share Witness '(\\witness server\domain.com\DAG\domain.com)' failed to arbitrate for the file share '(\\witness server\domain.com\DAG\domain.com)'. Please ensure
that file share '(\\witness server\domain.com\DAG\domain.com)' exists and is accessible by the cluster
The witness does not report any issues; the share it mentions exists and is online, and the SAN has no errors recorded. Other errors are:
Cluster resource 'File Share Witness '(\\witness server\domain.com\DAG\domain.com)' of type 'File Share Witness' in clustered role 'Cluster Group' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster
Manager or the Get-ClusterResource Windows PowerShell cmdlet.
The cluster Resource Hosting Subsystem (RHS) process was terminated and will be restarted. This is typically associated with cluster health detection and recovery of a resource. Refer to the System event log to determine which resource and resource DLL is
causing the issue
The initiator could not send an iSCSI PDU. Error status is given in the dump data.
Connection to the target was lost. The initiator will attempt to retry the connection.
and
The IO operation at logical block address 0x634ae8 for Disk 18 (PDO name: \Device\0000004a) was retried.
There are no errors on the SAN, none of the disks have failed, the witness has no problems, and Exchange is reporting that the passive copies of the databases are healthy with no CQL (copy queue length). I had the network guys analyze the network logs for the connections that all
the servers involved and the SAN are on. There were no entries to indicate issues, and peak traffic was less than 1% of available bandwidth at the time of the issue.
Both Exchange servers and the witness are VMs (VMware), and they are configured to always run on different hosts. There was no unusual traffic to or from the mail servers, and the witness is only used for this and KMS, so there is no significant overhead on it either.
Has anybody seen this before, or does anyone have any idea what could have caused it?