SLOW: Paid Subscription

I have NCM pro. Even with 20 CPU and think time set to 1 seconds, stock fish 10 is usually taking 3-5 seconds to find the move. How can I figure it to give me quicker results?

This is a very good question, and a quick investigation turned up some interesting findings and at least a partial fix!

TL;DR: We recently increased the hash table size from 1024 to 8192, and that added a second to each calculation. We just now reverted that so calculation times are 1 second shorter. We’ll keep searching to find ways to get results returned quicker!

Long and technical version:

Here’s a rough overview of what happens when you hit calculate in your browser or app:

  1. Your browser makes a request to NCM which goes to NCM’s main load balancer in Seattle, WA.
  2. The request lands on one of our redundant frontend servers.
  3. The frontend looks for an idle backend and uses curl to make a request to it
  4. The backend spits out an unbuffered HTTP response which mostly corresponds to the engine’s line-by-line output; this turns into the real-time stats you see while the engine is calculating.

So, I tried manually making some 1 second requests to the free servers using curl from a frontend. Here’s the results:

$ time ./free-vps.sh > /dev/null
real  0m1.210s
user  0m0.012s
sys 0m0.000s

$ time ./free-vps.sh > /dev/null
real  0m1.188s
user  0m0.001s
sys 0m0.008s

$ time ./free-vps.sh > /dev/null
real  0m1.234s
user  0m0.007s
sys 0m0.003s

Then I tried doing the same thing but for the 20-CPU core servers:

$ time ./20-cpu-core.sh > /dev/null
real  0m2.769s
user  0m0.006s
sys 0m0.005s

$ time ./20-cpu-core.sh > /dev/null
real  0m2.751s
user  0m0.010s
sys 0m0.000s

$ time ./20-cpu-core.sh > /dev/null
real  0m2.970s
user  0m0.005s
sys 0m0.004s

So I’m seeing what you’re referencing. 1.2s for calculations on the free servers, yet nearly 3 seconds per calculation on the 20-CPU core servers.

Next I dug in to the 20-core servers and noticed a long delay between when we send the engine the “go movetime 1000” command to tell it to start processing and when the output comes back:

$ ./20-cpu-core.sh | ts -s "%.s" | grep -A 1 "go movetime"
0.706905 "{\"kind\":\"input\",\"payload\":\"go movetime 1000\"}"
1.538852 "{\"kind\":\"output\",\"payload\":\"info depth 1 seldepth 1 multipv 1 score cp 200 nodes 1308 nps 654000 tbhits 0 time 2 pv c7c6\"}"

So there’s nearly an 800ms delay between the “go movetime” command and the next line of output. But not so with the VPS servers; the output follows almost immediately:

$ ./free-vps.sh | ts -s "%.s" | grep -A 1 "go movetime"
0.423066 "{\"kind\":\"input\",\"payload\":\"go movetime 1000\"}"
0.423087 "{\"kind\":\"output\",\"payload\":\"info depth 1 seldepth 1 multipv 1 score cp 200 nodes 84 nps 84000 tbhits 0 time 1 pv c7c6\"}"

The culprit turned out to be the hash size, which we recently changed to 8192 for the 20-core servers, but left at the default of 16 for the VPS servers.

So, I’ve restored the hash size to 1024, which is what it used to be until September 5th of this year. With that change, the 830ms is reduced to around 330ms:

$ ./20-cpu-core.sh | ts -s "%.s" | grep -A 1 "go movetime"
0.773614 "{\"kind\":\"input\",\"payload\":\"go movetime 1000\"}"
1.105229 "{\"kind\":\"output\",\"payload\":\"info depth 1 seldepth 1 multipv 1 score cp 200 nodes 891 nps 445500 tbhits 0 time 2 pv c7c6\"}"

And also after the change, I’m seeing the overall calculation times reduced:

$ time ./20-cpu-core.sh > /dev/null
real  0m1.737s
user  0m0.000s
sys 0m0.010s

$ time ./20-cpu-core.sh > /dev/null
real  0m1.716s
user  0m0.006s
sys 0m0.004s

$ time ./20-cpu-core.sh > /dev/null
real  0m1.772s
user  0m0.005s
sys 0m0.004s
chess@ncm-frontend-1:~/test$

I’m a little confused as to why the calculation times dropped a full second rather than just the 500ms measured earlier, but I’ll figure that out later.

2 Likes

Thank you so much for the prompt reply to this matter.