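I am writing a stress-test suite for a distributed file system accessed over NFS.

In some cases, when one process deletes a file while other processes are trying to read from it, the readers fail with a "Stale file handle" error (errno 116).

Is this error expected and acceptable under such a race condition?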

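The test works as follows:


Start x client machines
Each client machine runs y processes
Each process can perform any file operation, such as stat / read / delete / open
The file operations mentioned are the standard Python calls: os.stat / read / os.remove / open
All files are empty (0 bytes of data)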
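
For illustration, one worker operation of the kind described above might look like the sketch below. The function name `run_op` and the `(status, errno)` result shape are hypothetical; they are not taken from the actual test suite.

```python
import os


def run_op(op, path):
    """Run one file operation and return (status, errno or None)."""
    try:
        if op == "stat":
            os.stat(path)
        elif op == "read":
            with open(path, "rb") as f:
                f.read()
        elif op == "delete":
            os.remove(path)
        elif op == "open":
            open(path, "rb").close()
        else:
            raise ValueError("unknown operation: %s" % op)
    except OSError as e:
        # Report the errno so the controller can decide what is acceptable.
        return ("failed", e.errno)
    return ("success", None)
```

Returning the errno instead of raising lets the controller decide later which failures are acceptable when operations race with each other.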

The file exists, as shown by a successful stat operation:

controller_debug.log.2:2016-10-26 15:02:30,156;INFO –
[LG-E27A-LNX:0xa]: finished 640522b4d94c453ea545cb86568320ca, result:
success | stat |
/JUyw481MfvsBHOm1KQu7sHRB6ffAXKjwIATlsXmOgWh8XKQaIrPbxLgAo7sucdAM/o6V266xE8bTaUGzk8YDMfDAJp0YIfbT4fIK1oZ2R20tRX3xFCvjISj7WuMEwEV41
| data: {} | 2016/10/26 15:02:30.156

Process 0x1 on client CLIENT-A then deleted it successfully:

controller_debug.log.2:2016-10-26 15:02:30,164;INFO –
[CLIENT-A:0x1]: finished 5f5dfe6a06de495f851745a78857eec1, result:
success | delete |
/JUyw481MfvsBHOm1KQu7sHRB6ffAXKjwIATlsXmOgWh8XKQaIrPbxLgAo7sucdAM/o6V266xE8bTaUGzk8YDMfDAJp0YIfbT4fIK1oZ2R20tRX3xFCvjISj7WuMEwEV41
| data: {} | 2016/10/26 15:02:30.161

3 ms later, process 0xb on client CLIENT-B failed a read operation with "Stale file handle":

controller_debug.log.2:2016-10-26 15:02:30,164;INFO –
[CLIENT-B:0xb]: finished e84e2064ead042099310af1bd44821c0, result:
failed | read |
/mnt/DIRSPLIT-node0.b27-1/JUyw481MfvsBHOm1KQu7sHRB6ffAXKjwIATlsXmOgWh8XKQaIrPbxLgAo7sucdAM/o6V266xE8bTaUGzk8YDMfDAJp0YIfbT4fIK1oZ2R20tRX3xFCvjISj7WuMEwEV41
| [errno:116] | Stale file handle | 142 | data: {} | 2016/10/26 15:02:30.160

controller_debug.log.2:2016-10-26 15:02:30,164;ERROR –
Operation read FAILED UNEXPECTEDLY on File
JUyw481MfvsBHOm1KQu7sHRB6ffAXKjwIATlsXmOgWh8XKQaIrPbxLgAo7sucdAM/o6V266xE8bTaUGzk8YDMfDAJp0YIfbT4fIK1oZ2R20tRX3xFCvjISj7WuMEwEV41
due to Stale file handle

Thanks.

Answer:

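This is entirely expected. The NFS specification is clear about what happens to a file handle after the object it refers to (whether a file or a directory) has been deleted. Section 4 addresses this explicitly. For example: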

The persistent filehandle will become stale or invalid when the file system object is removed. When the server is presented with a persistent filehandle that refers to a deleted object, it MUST return an error of NFS4ERR_STALE.

This problem is so common that it has its own entry in section A.10 of the NFS FAQ, which lists the following as one common cause of ESTALE errors:

The file handle refers to a deleted file. After a file is deleted on the server, clients don’t find out until they try to access the file with a file handle they had cached from a previous LOOKUP. Using rsync or mv to replace a file while it is in use on another client is a common scenario that results in an ESTALE error.

The expected resolution is that your client application must close and reopen the file to see what has happened. Or, as the FAQ puts it:

… to recover from an ESTALE error, an application must close the file or directory where the error occurred, and reopen it so the NFS client can resolve the pathname again and retrieve the new file handle.
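
The close-and-reopen recovery described in the FAQ could be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: the helper name `read_with_reopen` and the `retries` parameter are hypothetical, and it assumes that a file which turns out to be deleted should be reported as `None` rather than as an error.

```python
import errno
import os


def read_with_reopen(path, retries=3):
    """Read a file, reopening it by path if the handle went stale.

    Returns the file contents, or None if the file no longer exists.
    """
    for _ in range(retries):
        try:
            # Opening by path makes the client resolve the name again.
            with open(path, "rb") as f:
                return f.read()
        except FileNotFoundError:
            # The file was deleted by another process: an expected race.
            return None
        except OSError as e:
            if e.errno != errno.ESTALE:
                raise
            # Stale handle: the file is closed on leaving the with block,
            # so the next iteration reopens it and gets a fresh handle.
    # Still stale after every attempt: surface the error to the caller.
    raise OSError(errno.ESTALE, os.strerror(errno.ESTALE), path)
```

In a stress test like the one in the question, it may be simpler to record ESTALE on a read that races with a delete as an expected outcome instead of retrying.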

Tags: linux, python, nfs
