
Thanks for reading this question (I looked everywhere for an existing answer, without success). The question is in the title, but here are more details.

Here is my setup:

  • I'm working with an Intel S2600WT server with two CPUs (Xeon v4, 24 cores per CPU) running Ubuntu 16.04 LTS.
  • Cores [00..23] belong to CPU#1
  • Cores [24..47] belong to CPU#2
  • I have 2 Intel DC3700 SSDs connected to the PCIe lanes of CPU#1 (PCIe Gen3 x4 each, NVMe)
  • I have an acquisition board (Innovative X6-1000M) on the other PCIe slot of CPU#1 (PCIe Gen2 x8)

We are trying to run our company's acquisition software, which captures data from the acquisition board and writes it to the SSDs.

  1. With only one CPU, the program works as expected, and we are able to capture at up to 2.4GB/s.
  2. With two CPUs, the software deadlocks and the program freezes. In the debugger I can see that the program is frozen while it tries to write to the SSDs.

I would therefore like to keep both CPUs but run the software only on CPU#1, and route all the interrupts directly to CPU#1.

To do that, I wrote 0000,00000004 to the file /proc/irq/default_smp_affinity. This works for the acquisition board: I can see that all of its interrupts are delivered directly to core#2 (on CPU#1).
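For reference, the value 0000,00000004 is a hexadecimal bitmask in which bit N selects core N, so only bit 2 (core#2) is set. Below is a minimal bash sketch of how such masks can be built for other core sets. Note that cpu_mask is a hypothetical helper written for this post, not a standard tool, and that it assumes the kernel accepts a full 8-digit group for every 32-bit word (leading zeros included).

```shell
#!/usr/bin/env bash
# cpu_mask NCPUS CPU...  (hypothetical helper, not a standard tool)
# Prints an smp_affinity-style mask: comma-separated 32-bit hex words,
# most significant word first, with one bit set per listed CPU.
cpu_mask() {
    local ncpus=$1; shift
    local nwords=$(( (ncpus + 31) / 32 ))
    local -a words
    local i cpu idx out=""

    # Start with every 32-bit word cleared.
    for (( i = 0; i < nwords; i++ )); do
        words[i]=0
    done

    # Set the bit for each requested CPU in the word that holds it.
    for cpu in "$@"; do
        idx=$(( cpu / 32 ))
        words[idx]=$(( words[idx] | (1 << (cpu % 32)) ))
    done

    # Emit the words from most significant to least significant.
    for (( i = nwords - 1; i >= 0; i-- )); do
        out+=$(printf '%08x' "${words[i]}")
        if (( i > 0 )); then
            out+=","
        fi
    done
    printf '%s\n' "$out"
}

cpu_mask 48 2               # core#2 only
cpu_mask 48 $(seq 0 23)     # every core of CPU#1
cpu_mask 48 $(seq 24 47)    # every core of CPU#2
```

With 48 cores the mask needs two words, so the three calls give the masks for core#2 alone, for all of CPU#1, and for all of CPU#2 respectively.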

However, the interrupts of my NVMe drives are still dispatched to all the cores, including the cores of CPU#2 (cores 24..47).

I ran:

cat /proc/interrupts | grep -i nvme

to find the interrupt numbers belonging to the NVMe SSDs (IRQs 157 to 186), and then tried to modify the affinity files manually:

sudo -i echo 0000,00000004 > /proc/irq/186/smp_affinity
permission denied

I also tried:

sudo sh -c "echo 0000,00000004 > /proc/irq/186/smp_affinity"
sh: echo I/O error

sudo sh -c "echo 1,3 > /proc/irq/186/smp_affinity_list"
sh: echo I/O error

I can't manage to modify these files.
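To make the current state easier to check, here is a small sketch that lists the NVMe IRQ numbers and the cores each one is currently allowed on. The helper name nvme_irqs is hypothetical, and it assumes the usual /proc/interrupts layout where the first column is the IRQ number followed by a colon.

```shell
#!/usr/bin/env bash
# nvme_irqs [FILE]  (hypothetical helper)
# Prints the IRQ number of every line mentioning "nvme" in FILE,
# which defaults to /proc/interrupts.
nvme_irqs() {
    local file=${1:-/proc/interrupts}
    # The first field looks like "157:", so strip the trailing colon.
    awk 'tolower($0) ~ /nvme/ { sub(/:$/, "", $1); print $1 }' "$file"
}

# Show the affinity list of each NVMe IRQ (reading needs no root).
if [ -r /proc/interrupts ]; then
    for irq in $(nvme_irqs); do
        printf 'IRQ %s -> cores %s\n' "$irq" \
            "$(cat "/proc/irq/$irq/smp_affinity_list" 2>/dev/null)"
    done
fi
```

Comparing this output before and after each attempt shows whether a write to smp_affinity actually took effect.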

I also tried restarting the irqbalance service this way:

export IRQBALANCE_BANNED_CPUS=ffff,fffffff0
sudo service irqbalance stop
sudo irqbalance --debug

I also tried using numactl to launch the program on CPU#1, core#2.

But nothing has worked so far: some NVMe interrupts are still received by CPU#2. Do you have any idea how I could dispatch the NVMe interrupts only to CPU#1 and completely exclude CPU#2 from my software?

Thanks a lot for any answer. I hope I've been explicit enough for you to help me :) Otherwise, just ask for more details.

EDIT_1: I managed to get my software working by disabling all the cores of CPU#2 with the command:

for i in {24..47}; do sudo sh -c "echo 0 > /sys/devices/system/cpu/cpu$i/online"; done

The problem is that this deactivates every core of CPU#2, which halves my computing power. But it is one more clue that the software works when it runs on only one CPU.
