I have 4 MooseFS chunkservers, each with eight 2.7 TB disks. I mount them using mfsmount and then export the mount over NFS. Recently I noticed high iowait on my chunkservers, followed by huge drops in the reported mount size (see the graphs).

Edik Mkoyan
  • Which version of MooseFS do you use? Why do you re-share MFS via NFS? Do you **really** need it? Your charts are heavily compressed, and it is hard to read anything in the legend. Could you post a bigger image? – prk Aug 20 '15 at 10:15

2 Answers

Regarding JBOD, the moosefs.com website says the following in its "Best practices" section (https://moosefs.com/documentation/best-practices.html#jbod):

JBOD and XFS for Chunkservers

We recommend to connect to Chunkserver(s) JBODs. Just format the drive as XFS and mount on e.g. /mnt/chunk01, /mnt/chunk02, ... and put these paths into /etc/mfs/mfschunkserver.cfg. That's all.

We recommend such configuration mainly because of two reasons:

MooseFS has a mechanism of checking if the hard disk is in a good condition or not. MooseFS can discover broken disks, replicate the data and mark such disks as damaged. The situation is different with RAID: MooseFS algorithms do not work with RAIDs, therefore corrupted RAID arrays may be falsely reported as healthy/ok.

The other aspect is time of replication. Let's assume you have goal set to 2 for the whole MooseFS instance. If one 2 TiB drive breaks, the replication (from another copy) will last about 40-60 minutes. If one big RAID (e.g. 36 TiB) becomes corrupted, replication can last even for 12-18 hours. Until the replication process is finished, some of your data is in danger, because you have only one valid copy. If another disk or RAID fails during that time, some of your data may be irrevocably lost. So the longer replication period puts your data in greater danger.
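
The figures quoted above are consistent with replication time growing linearly with the amount of data that has to be copied. Here is a minimal sketch of that arithmetic, assuming a constant aggregate replication throughput. The function names and the constant-throughput model are illustrative assumptions, not something MooseFS provides or documents:

```python
TIB = 1024 ** 4  # bytes in one tebibyte
MIB = 1024 ** 2  # bytes in one mebibyte


def implied_throughput_mib_s(size_tib: float, minutes: float) -> float:
    """Average throughput (MiB/s) needed to copy size_tib within the given minutes."""
    return size_tib * TIB / MIB / (minutes * 60)


def replication_hours(size_tib: float, throughput_mib_s: float) -> float:
    """Hours needed to copy size_tib at a constant throughput (assumed model)."""
    return size_tib * TIB / MIB / throughput_mib_s / 3600


# Throughput implied by the quoted "2 TiB in 40-60 minutes" figures.
fast = implied_throughput_mib_s(2, 40)
slow = implied_throughput_mib_s(2, 60)

for label, rate in (("best case", fast), ("worst case", slow)):
    hours = replication_hours(36, rate)
    print(f"{label}: {rate:.0f} MiB/s, 36 TiB takes {hours:.1f} h")
```

Under this assumption the 36 TiB array takes 12 to 18 hours, which matches the range in the quote: the exposure window grows in proportion to the size of the failed unit.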

prk

The cause of this problem was insufficient RAM on the MFS server: it had started to use 50% of its swap. After I increased the RAM, everything started to work as expected. High iowait still persists on the chunkservers, though, so I guess we should move from JBOD to some kind of RAID.
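
One way to catch this kind of memory pressure earlier is to watch swap usage on the server. Below is a minimal sketch, assuming a Linux host that exposes `SwapTotal` and `SwapFree` in `/proc/meminfo`. The function names and the 50% alert threshold (borrowed from the swap level mentioned above) are illustrative only:

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_fraction(fields: dict) -> float:
    """Fraction of swap in use, or 0.0 when no swap is configured."""
    total = fields.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - fields.get("SwapFree", 0)) / total


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:
        used = swap_used_fraction(parse_meminfo(fh.read()))
    # Hypothetical threshold: warn once half of the swap is in use.
    status = "WARNING" if used >= 0.5 else "ok"
    print(f"swap used: {used:.0%} ({status})")
```

Run periodically (for example from cron), this would flag the server before swapping becomes heavy enough to affect the whole cluster.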

Edik Mkoyan