
vCenter disconnects troubleshooting

Synopsis:

1. VMware might not support more than 100 connections per vmk port.
2. It is better to put vMotion on its own vmk NIC and VLAN instead of sharing the management port (make sure the vmks are in different subnets).
3. The firewall in ESXi might not be able to handle the traffic.
4. Check ethtool for any network drops (checking for duplicate IPs always helps).
5. The NFS datastore has more than 20,000 files (ls -lR | wc -l).
6. ESXi resources (CPU/memory; there might be other performance issues).
7. Disable HA/DRS to see if the disconnects disappear (related to datastores with a lot of files).
8. vpxa/vpxd crashes.

Details: 

1. How to check the number of connections per vmk?

Quick script:


~ # esxcli network ip connection list | sort -k 4| awk '{print $4}'|cut -d ":" -f 1| uniq -c
 1 ------------------
14 0.0.0.0
394 10.X.Y.150   <<<< all were stuck at FIN_WAIT_2
40 127.0.0.1
7 192.168.5.1
1 Send
To see the full connection list sorted by local address (including the State column):
~ # esxcli network ip connection list | sort -k 4
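The one-liner above counts connections per local address but not per state, so a FIN_WAIT_2 pile-up still has to be spotted by eye. Here is a minimal sketch that groups by both. It assumes the column layout implied by the output above (local address in field 4, state in field 6 on TCP rows), and count_conn_states is a made-up helper name, not a VMware tool:

```shell
# Count TCP connections per local IP and state.
# Reads `esxcli network ip connection list` output on stdin.
# Assumption: field 4 is "ip:port" and field 6 is the TCP state.
count_conn_states() {
    awk '$1 ~ /^tcp/ {
             ip = $4
             sub(/:[0-9]+$/, "", ip)      # strip the trailing :port
             count[ip " " $6]++
         }
         END { for (k in count) print count[k], k }' | sort -rn
}
```

On the host it would be used as: esxcli network ip connection list | count_conn_states. UDP rows are skipped on purpose because they carry no state column.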

2. Separate the vMotion and management vmknic/VLAN.

3. Firewall settings:
~ # esxcli network firewall get
   Default Action: PASS
   Enabled: false
   Loaded: true (it should be loaded for HA to work)

4. These commands show any errors or packet drops caused by the switch, cable, or port:

     ethtool -S vmnic0 | grep error | grep -v :\ 0
     ethtool -S vmnic0 | grep rx_no_buffer_count | grep -v :\ 0
     ethtool -S vmnic0 | grep drop

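The three filters can be folded into one pass so that every error, drop, and buffer counter with a non-zero value shows up at once. This is a rough sketch, assuming the usual "name: value" layout of ethtool -S output; nonzero_counters is a hypothetical helper name:

```shell
# Print only error/drop/no_buffer counters whose value is not zero.
# Reads `ethtool -S <nic>` output on stdin.
nonzero_counters() {
    awk -F': *' '$1 ~ /error|drop|no_buffer/ && $2 + 0 != 0 {
                     name = $1
                     gsub(/^[ \t]+/, "", name)   # trim ethtool indentation
                     print name ": " $2
                 }'
}
```

Usage on the host would be: ethtool -S vmnic0 | nonzero_counters, repeated for each vmnic. Empty output means none of the matched counters has incremented.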
5. Duplicate IP address: while pinging the ESXi host, run arp from a different node and check whether the MAC address for the host's IP changes. Alternatively, ssh to the ESXi host and see if the session keeps disconnecting.

 esxcli network ip neighbor list    (on ESXi; see if any MAC entries are flapping)
 arp -a                             (on Linux)
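Flapping is easier to catch by saving the neighbor table twice, a short while apart, and comparing the two snapshots. The sketch below assumes the neighbor IP is in the first column and the MAC in the second; flapping_macs is a hypothetical helper name:

```shell
# Compare two saved `esxcli network ip neighbor list` outputs and
# print every IP whose MAC address changed between them.
# $1: first snapshot file, $2: second snapshot file
flapping_macs() {
    awk 'NR == FNR { if ($2 ~ /:/) first[$1] = $2; next }
         $2 ~ /:/ && ($1 in first) && first[$1] != $2 {
             print $1, first[$1], "->", $2
         }' "$1" "$2"
}
```

For example: esxcli network ip neighbor list > /tmp/n1; sleep 30; esxcli network ip neighbor list > /tmp/n2; flapping_macs /tmp/n1 /tmp/n2. The file names and the 30 second gap are arbitrary choices. An IP that shows two different MACs is a strong hint of a duplicate address.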



6. Run esxcfg-vmknic -l and make sure the vmk IPs have no duplicates or conflicts, and that no two vmks are in the same subnet.
7. Finally, review vpxa.log, vmkernel.log (for APD events and vpxa crashes), and hostd.log on ESXi, as well as vpxd.log on vCenter.

8. If the host agents are hung, the fix is to restart the services (services.sh restart).

PR 831801: The default value of the FIN_WAIT_2 timer was erroneously set to TCPTV_KEEPINTVL * TCP_KEEPCNT = 75 * 0x400. Because of this discrepancy, sockets in the FIN_WAIT_2 state persist for much longer than intended, and if many such sockets accumulate they can impact new socket creation.

10. Disable the firewall on ESXi (use exactly these settings so HA will work):
 # esxcli network firewall get
   Default Action: DROP or PASS
   Enabled: false
   Loaded: true

12. Increase the vpxa thread count:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2009217

13. If an NFS datastore has more than 20,000 files, vpxa could hang.
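Note that ls -lR | wc -l also counts directory headers, "total" lines, and blank lines, so it overstates the file count. A small sketch that counts regular files only and compares the result against the 20,000 figure; count_files and check_datastore are hypothetical helper names, and the datastore path is whatever the volume is mounted as under /vmfs/volumes:

```shell
# Count regular files under a directory (recursively).
# $1: datastore directory, e.g. /vmfs/volumes/<datastore-name>
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Warn when the file count exceeds a threshold.
# $1: directory, $2: threshold (defaults to the 20,000 figure above)
check_datastore() {
    n=$(count_files "$1")
    limit=${2:-20000}
    if [ "$n" -gt "$limit" ]; then
        echo "WARN: $1 has $n files (limit $limit)"
    else
        echo "OK: $1 has $n files"
    fi
}
```

Example: check_datastore /vmfs/volumes/<datastore-name>. On a large NFS volume the find itself can take a while, so it is best run outside peak hours.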

14. If there is memory contention, make sure ESXi has 2 GB of memory reserved.
 
