
My ASP.NET and SQL Server 2012 application is running on Windows Server 2008 R2. Suddenly, the internet connection on the server stopped working and my application started throwing:

An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full

Running netstat shows that PID 0 has a lot of ports open. Netstat reports:

Process Id = 0, State = TIME_WAIT have 130,053 ports open
Process Id = 38840, State = CLOSE_WAIT have 5 ports open
Process Id = Any, State = LISTENING have 30 ports open
Process Id = Any, State = ESTABLISHED have 10 ports open

Stats on 22 Dec 2015:

CLOSE_WAIT  5   
ESTABLISHED 146
TIME_WAIT   646750
LAST_ACK    1
LISTENING   30
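Per-state tallies like the ones above can be reproduced with a short script. Below is a minimal Python sketch. It assumes the counts came from Windows `netstat -ano` (the question doesn't say which command was used) and that its output was saved from cmd.exe to a text file. The script groups TCP rows by PID and state; the file and script names are hypothetical.

```python
import sys
from collections import Counter


def tally_states(netstat_text: str) -> Counter:
    """Count TCP rows in `netstat -ano` output, keyed by (PID, state)."""
    counts: Counter = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # TCP rows have five columns: Proto, Local, Foreign, State, PID.
        # UDP rows have no State column, and header lines don't start with "TCP".
        if len(fields) == 5 and fields[0] == "TCP":
            _proto, _local, _foreign, state, pid = fields
            counts[(pid, state)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage from cmd.exe:
    #   netstat -ano > netstat.txt
    #   python tally_states.py netstat.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for (pid, state), n in tally_states(f.read()).most_common():
            print(f"PID {pid:>6}  {state:<12} {n}")
```

Sorting with `most_common()` puts the dominant (PID, state) pair, here PID 0 in TIME_WAIT, at the top.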
  • Questions: (1) How much RAM? (2) Does a reboot fix it? (3) Are you running a torrent downloader? (4) What are the results of [sfc /scannow](http://www.sevenforums.com/tutorials/1538-sfc-scannow-command-system-file-checker.html)? (5) Have you done antivirus full scans (in addition to your antivirus I recommend Malwarebytes Anti-Malware)? – harrymc Jan 22 '15 at 15:10
  • (6) Check if the value of [TcpTimedWaitDelay](https://technet.microsoft.com/en-us/library/cc938217.aspx) needs reducing. (7) Do you have any product installed that receives or initiates lots of TCP connection such as a Web server? – harrymc Jan 22 '15 at 15:25
  • @harrymc 1) RAM is 8 GB, but nearly 5 GB is consumed by the database. 2) I think so, but this is a production site, so I can't reboot it. 3) No 4) I am running Windows Server 5) No (I am using Rackspace servers, which I assume maintain this) – Imran Qadir Baksh - Baloch Jan 22 '15 at 19:22
  • @harrymc 6) I have set this to 30 sec. 7) Yes, I am using ASP.NET here, which means the IIS web server. Also, some cron jobs on the server initiate HTTP (i.e. TCP) connections. – Imran Qadir Baksh - Baloch Jan 22 '15 at 19:24
  • More: (1) If you have set TcpTimedWaitDelay and rebooted, did this help? (2) Is the web server heavily loaded with requests (post some statistics)? (3) Is it serving localhost or network requests? (4) Could you post the Machine.config file or at least its connectionManagement part (see [this article](http://stackoverflow.com/questions/7849884/what-is-limiting-the-of-simultaneous-connections-my-asp-net-application-can-ma))? – harrymc Jan 22 '15 at 20:42
  • 1) No, I haven't rebooted. 2) Yes, nearly 10-50 req/s. 3) No. 4) It's the default; I've never changed it. – Imran Qadir Baksh - Baloch Jan 23 '15 at 17:20
  • 1) Boot required. 2) That explains it - TcpTimedWaitDelay may reduce the problem. 4) Waiting. – harrymc Jan 23 '15 at 17:26
  • @harrymc, Thanks, but what is the risk of restarting the server machine? – Imran Qadir Baksh - Baloch Jan 23 '15 at 18:47
  • In my company we have changed TcpTimedWaitDelay and rebooted more than one server with no ill effects. If you are really worried, see [Bare Metal Restore](http://blogs.technet.com/b/askcore/archive/2011/05/12/bare-metal-restore.aspx) to reduce downtime (never used or needed it since these servers were VMs). – harrymc Jan 23 '15 at 19:49
  • One more question: Which browser are your users using, since it seems as if it doesn't efficiently close its connections to your web server. – harrymc Jan 24 '15 at 10:55
  • Anything, from mobile devices to computers. – Imran Qadir Baksh - Baloch Jan 24 '15 at 11:17
  • What `netstat` command gave you that report? – Saeed Neamati Apr 13 '16 at 06:29

2 Answers


You are running a web server that is accessed by browsers from multiple mobile devices.

Due to the way TCP/IP works, connections cannot be closed immediately. Packets may arrive out of order or be retransmitted after the connection has been closed. CLOSE_WAIT indicates that the remote endpoint (the other side of the connection) has closed the connection. TIME_WAIT indicates that the local endpoint (this side) has closed the connection. The connection is kept around so that any delayed packets can be matched to it and handled appropriately. TIME_WAIT connections are removed when they time out, by default after four minutes.

Nevertheless, your TIME_WAIT count of 646750 is extremely excessive. It means that 646750 connections were closed in the last 4 minutes, which is about 2694 per second! Evidently, either some of these mobile devices are badly malfunctioning and are bombarding your server with connections that are not being properly closed from the client side, or you are serving an enormous number of clients (which makes no sense for a single server).
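The 2694-per-second figure follows from treating TIME_WAIT as a queue in steady state. Every closed connection stays there for the full delay, so the count equals the close rate times the delay (Little's law). Below is a minimal Python sketch of that arithmetic. It assumes the close rate was steady over the four-minute window and that every entry used the default 240-second delay.

```python
# Steady state (Little's law): entries in TIME_WAIT = close rate x TIME_WAIT delay,
# so the close rate is the observed TIME_WAIT count divided by the delay.
def closes_per_second(time_wait_count: int, delay_seconds: int = 240) -> float:
    """Estimate how many connections per second are being closed."""
    return time_wait_count / delay_seconds


rate = closes_per_second(646_750)  # TIME_WAIT count from the question, default 240 s delay
print(f"{rate:.1f} closes per second")  # 646750 / 240 = 2694.79...
```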

If you cannot identify and fix the mobile devices or applications that are causing the problem, then, since you don't control the client side, you can only alleviate it on the server side.

One parameter that can improve this congestion is TcpTimedWaitDelay, described as:

Determines the time that must elapse before TCP can release a closed connection and reuse its resources. This interval between closure and release is known as the TIME_WAIT state or 2MSL state. During this time, the connection can be reopened at much less cost to the client and server than establishing a new connection.

Reducing the value of this entry allows TCP to release closed connections faster, providing more resources for new connections. However, if the value is too low, TCP might release connection resources before the connection is complete, requiring the server to use additional resources to reestablish the connection.

TcpTimedWaitDelay can be set with regedit under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters as a REG_DWORD value (create it if it is not present) holding the number of seconds to wait. The default is 240 seconds (4 minutes), and a reboot is required for a change to take effect.
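As an alternative to clicking through regedit, the same value can be written with the built-in `reg add` command. Below is a minimal Python sketch that only builds and prints that command line and does not run it. The 30-300 second bounds are an assumption taken from the TechNet page linked in the comments, so check them against the documentation for your Windows version.

```python
KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUE = "TcpTimedWaitDelay"
# Assumption: 30-300 s is the range documented on the TechNet page linked in
# the comments; confirm it for your Windows version before relying on it.
MIN_DELAY, MAX_DELAY = 30, 300


def reg_add_command(seconds: int) -> list:
    """Build a `reg add` command that stores TcpTimedWaitDelay as a REG_DWORD (seconds)."""
    if not MIN_DELAY <= seconds <= MAX_DELAY:
        raise ValueError(f"{VALUE} should be between {MIN_DELAY} and {MAX_DELAY} seconds")
    return ["reg", "add", KEY, "/v", VALUE, "/t", "REG_DWORD", "/d", str(seconds), "/f"]


if __name__ == "__main__":
    # Only prints the command; run it yourself from an elevated prompt,
    # then reboot for the change to take effect.
    print(" ".join(reg_add_command(30)))
```

Printing the command instead of executing it keeps the registry change a deliberate manual step on a production server.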

For example, with 2694 connections closing per second, changing the value to 30 seconds means that only about 80820 connections will be waiting in TIME_WAIT at any moment. That number is still enormous, but the change will reduce the connection resources in use.
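The same rate-times-delay relation, applied the other way, projects the steady-state TIME_WAIT count for a few candidate TcpTimedWaitDelay values. Below is a Python sketch. It assumes the close rate stays constant at the 2694 per second estimated above.

```python
def projected_time_wait(closes_per_second: float, delay_seconds: int) -> int:
    """Steady-state TIME_WAIT count for a given delay (rate x delay)."""
    return round(closes_per_second * delay_seconds)


RATE = 2694  # closes per second, as estimated in the answer
for delay in (240, 120, 60, 30):
    print(f"TcpTimedWaitDelay={delay:>3}s -> ~{projected_time_wait(RATE, delay):,} in TIME_WAIT")
```

The count scales linearly with the delay, so going from 240 to 30 seconds cuts it by a factor of eight without changing how many connections are being closed.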

harrymc

Same question here: https://serverfault.com/questions/661476/getting-an-operation-on-a-socket-could-not-be-performed-because-the-system-lack/

It's a Windows maximum-connection problem. Some KB articles say to raise the max ephemeral port or add memory :/

http://blogs.msdn.com/b/sql_protocols/archive/2009/03/09/understanding-the-error-an-operation-on-a-socket-could-not-be-performed-because-the-system-lacked-sufficient-buffer-space-or-because-a-queue-was-full.aspx

I saw this problem on a physical server with a very long uptime (8+ months), and a reboot resolved it.

YuKYuK
  • The max ephemeral port is already fixed on Server 2008 R2. – harrymc Jan 22 '15 at 15:07
  • And what's your ephemeral port configuration? Can you check the percentage of the maximum ports in use with a script like this one: http://blogs.msdn.com/b/debuggingtoolbox/archive/2010/10/11/powershell-script-troubleshooting-for-port-exhaustion-using-netstat.aspx – YuKYuK Jan 22 '15 at 15:45
  • See [Ephemeral Port Limits](http://blogs.msdn.com/b/drnick/archive/2008/09/19/ephemeral-port-limits.aspx). – harrymc Jan 22 '15 at 15:48
  • Your netstat shows: `Process Id = 0, State = TIME_WAIT have 130,053 ports open`. 130k connections is rather high; check this process, it is killing your connectivity. – YuKYuK Jan 22 '15 at 16:26
  • Yes, I know, but people say TIME_WAIT is not an issue and that it clears automatically. – Imran Qadir Baksh - Baloch Jan 22 '15 at 19:19
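This check is in the spirit of the port-exhaustion script YuKYuK links, but it is not a port of it. The Python sketch below estimates what percentage of the dynamic (ephemeral) port range appears as a local port in `netstat -ano` output. It assumes the Server 2008 R2 default range of 49152-65535 (16,384 ports); `netsh int ipv4 show dynamicport tcp` shows the actual range on a given machine. Counting distinct local ports in that range is only a rough approximation of exhaustion.

```python
# Assumed default dynamic port range on Windows Server 2008 R2: 49152-65535.
DYN_START, DYN_COUNT = 49152, 16384


def ephemeral_usage(netstat_text: str) -> float:
    """Percentage of the dynamic range seen as a local port in TCP rows of `netstat -ano`."""
    used = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # TCP rows: Proto, Local Address, Foreign Address, State, PID.
        if len(fields) == 5 and fields[0] == "TCP":
            # rsplit on the last ":" handles both 10.0.0.5:49152 and [::1]:49152.
            local_port = int(fields[1].rsplit(":", 1)[1])
            if DYN_START <= local_port < DYN_START + DYN_COUNT:
                used.add(local_port)
    return 100.0 * len(used) / DYN_COUNT
```

Inbound connections to IIS use port 80 or 443 as their local port, so they fall outside this range. Outbound connections, such as the cron jobs' HTTP requests, are the ones this count reflects.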