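The SDS logs you sent show no hardware errors. The PMC RAID controller driver in the OS is fairly old and should be upgraded to the latest version. On the machines that reported memory errors, the alerts were raised while the server was rebooting: memory was being initialized at that moment, which is why the error was reported.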

The SDS logs newly collected on 9.26 are faulty; please collect them again.

We recommend enabling kdump to capture logs from the abnormal reboots. To enable it:

1. Make sure the kdump-related packages are installed


[root@server01 ~]# rpm -qa | grep kdump

system-config-kdump-2.0.5-18.el6.noarch

[root@server01 ~]# rpm -qa | grep kexec

kexec-tools-2.0.0-286.el6.x86\_64
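The two queries above can be folded into one check. This is a minimal sketch: the function name missing_kdump_pkgs is hypothetical, and it reads a package list from stdin so it can be fed from rpm -qa. The package names are the two shown in the output above; as far as I know, kexec-tools is the one that provides the kdump service itself, while system-config-kdump is a configuration front end.

```shell
#!/bin/sh
# Print each required package that does not appear in the package list on stdin.
# The function name is illustrative; the package names come from the output above.
missing_kdump_pkgs() {
    installed=$(cat)
    for pkg in kexec-tools system-config-kdump; do
        # Match "<name>-<version>..." at the start of a line.
        printf '%s\n' "$installed" | grep -q "^${pkg}-[0-9]" || echo "$pkg"
    done
}

# Usage on the server:
#   rpm -qa | missing_kdump_pkgs
```

Empty output means both packages were found.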

2. Edit the boot parameters: append crashkernel=128M (shown in red in the original) to the end of the kernel line

vim /etc/grub.conf

kernel /vmlinuz-2.6.32-573.el6.x86\_64 …rhgb quiet crashkernel=128M

\# Note: the kernel entry above is a single line
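If the edit has to be repeated on several servers, it can be scripted instead of done by hand in vim. The sketch below assumes a GRUB legacy config in which boot entries use lines starting with "kernel"; the function name add_crashkernel is hypothetical. It writes the modified config to stdout and leaves lines that already carry a crashkernel= parameter alone.

```shell
#!/bin/sh
# Append crashkernel=128M to every "kernel" line of the config file $1 that has no
# crashkernel= parameter yet. The result goes to stdout; the input file is not modified.
add_crashkernel() {
    awk '/^[[:space:]]*kernel[[:space:]]/ && !/crashkernel=/ { $0 = $0 " crashkernel=128M" }
         { print }' "$1"
}

# Usage on the server (review the output before replacing the file):
#   cp /etc/grub.conf /etc/grub.conf.bak
#   add_crashkernel /etc/grub.conf.bak > /etc/grub.conf
```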

3. Edit /etc/kdump.conf to set where dump files are saved. The default location is /var/crash and can be left unchanged, but make sure /var/crash has enough free disk space
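"Enough free space" can be checked before the next crash rather than after it. The sketch below is an assumption about how one might do that: the function name has_free_mb is hypothetical, and requiring as much free space as installed RAM is a conservative rule of thumb, since filtered or compressed dumps are usually much smaller.

```shell
#!/bin/sh
# Succeed if the filesystem holding directory $1 has at least $2 MB available.
has_free_mb() {
    # -P forces one line per filesystem; column 4 is the available space in KB.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ $((avail_kb / 1024)) -ge "$2" ]
}

# Usage on the server:
#   ram_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
#   has_free_mb /var/crash "$ram_mb" || echo "not enough space in /var/crash"
```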

4. Enable the kdump service at boot

chkconfig kdump on

5. Reboot the server for the configuration to take effect: reboot

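6. Verify that it took effect

The 128 MB is reserved for the capture kernel and is not available to the running system, so free -m reports 128 MB less memory than it did without the parameter.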

[root@server01 ~]# service kdump status

Kdump is operational
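Besides free -m and the service status, the reservation can be confirmed from the running kernel's command line. This is a small sketch; the function name crashkernel_param is hypothetical, and it takes the file to inspect as an argument so it can be pointed at /proc/cmdline.

```shell
#!/bin/sh
# Print the crashkernel= parameter found in the kernel command-line file $1.
# Returns non-zero when the parameter is absent.
crashkernel_param() {
    grep -o 'crashkernel=[^[:space:]]*' "$1"
}

# Usage on the server:
#   crashkernel_param /proc/cmdline || echo "crashkernel parameter not active"
# On kernels that expose it, /sys/kernel/kexec_crash_loaded reads 1 once the
# capture kernel has been loaded:
#   cat /sys/kernel/kexec_crash_loaded
```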

7. Go to /boot, delete initrd-2.6.32-573.el6.x86\_64kdump.img, then run service kdump restart.

The existing kdump initrd was built with the old driver, so after updating the driver, regenerate initrd-2.6.32-573.el6.x86\_64kdump.img to make sure it uses the new one.
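Step 7 can be written so that it does not hard-code the kernel version. The sketch below assumes the image follows the naming seen above, initrd-&lt;kernel version&gt;kdump.img; the function name remove_kdump_initrd is hypothetical. Restarting the service afterwards is what triggers the rebuild, as described in step 7.

```shell
#!/bin/sh
# Remove the kdump initrd for kernel version $2 under boot directory $1, if present.
remove_kdump_initrd() {
    img="$1/initrd-${2}kdump.img"
    if [ -f "$img" ]; then
        rm -f "$img" && echo "removed $img"
    else
        echo "no kdump initrd at $img"
    fi
}

# Usage on the server:
#   remove_kdump_initrd /boot "$(uname -r)" && service kdump restart
```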

Any progress?

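At one site, 4 of 20-odd servers rebooted, and the hardware logs show no errors. The customer's monitoring logs show memory errors, reboots, unresponsive agents, and similar problems. Please take a look; a brief analysis of the 4 machines follows:

Log location: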

ftp://00744532:[email protected]

Issue 1: on 9-19 at 10:59, the monitoring system found the server unresponsive


Serial Number: 210200A00JN17A001976

SDS: server started at 11:15 (manual start)

71 | Informational | 1 | 0 | 0 | 2018-09-20 03:10:57 | 2018-09-19 19:10:57 | EventType: Button / Switch, Event: Reset Button pressed, Data2: 1, Data3: 3

72 | Informational | 1 | 0 | 0 | 2018-09-20 03:11:29 | 2018-09-19 19:11:29 | EventType: System Event, Event: Timestamp Clock Synch, Data2: 0

73 | Informational | 1 | 0 | 0 | 2018-09-19 19:15:24 | 2018-09-19 11:15:24 | EventType: System Event, Event: Timestamp Clock Synch, Data2: 128

Sosreport: no records at all while the system was down

Sep 19 10:01:28 datanode08 sz[44045]: [root] solrconfig.xml\_model/ZMODEM: 73126 Bytes, 29542 BPS

Sep 19 19:17:51 datanode08 kernel: imklog 5.8.10, log source = /proc/kmsg started.

Sep 19 19:17:51 datanode08 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="5326" x-info="http://www.rsyslog.com"] start

Sep 19 19:17:51 datanode08 kernel: Initializing cgroup subsys cpuset
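In excerpts like the one above, a reboot shows up as the imklog start line, and the outage window is the gap between that line and the one before it. The following sketch pulls out both lines for every boot recorded in a messages file; the function name boot_gaps is hypothetical, and the match on the imklog line is an assumption based on the rsyslog 5.8.10 output shown here.

```shell
#!/bin/sh
# For every boot marker in syslog file $1, print the last line logged before it
# and the marker itself, so the silent window is visible at a glance.
boot_gaps() {
    awk '/kernel: imklog .* started/ {
             if (prev != "") print "last: " prev
             print "boot: " $0
         }
         { prev = $0 }' "$1"
}

# Usage on an extracted sosreport:
#   boot_gaps var/log/messages
```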

Issue 2: reboot, monitoring reported memory errors

Serial Number: 210200A00JN187000420


SDS: no hardware errors, but reboots are recorded

271 | Informational | 1 | 0 | 0 | 2018-09-19 18:02:21 | 2018-09-19 10:02:21 | EventType: System Event, Event: Timestamp Clock Synch, Data2: 0

272 | Informational | 1 | 0 | 0 | 2018-09-19 18:02:45 | 2018-09-19 10:02:45 | EventType: System Event, Event: Timestamp Clock Synch, Data2: 128

273 | Informational | 1 | 0 | 0 | 2018-09-19 18:04:01 | 2018-09-19 10:04:01 | EventType: OEM, Event: Adapter is ok., Data2: 255

274 | Informational | 1 | 0 | 0 | 2018-09-19 18:04:02 | 2018-09-19 10:04:02 | EventType: OEM, Event: Green Backup subsystem of adapter is ok., Data2: 255

275 | Informational | 1 | 0 | 0 | 2018-09-19 19:58:55 | 2018-09-19 11:58:55 | EventType: System Event, Event: Timestamp Clock Synch, Data2: 0

276 | Informational | 1 | 0 | 0 | 2018-09-19 19:58:58 | 2018-09-19 11:58:58 | EventType: System Event, Event: Timestamp Clock Synch, Data2: 128

277 | Informational | 1 | 0 | 0 | 2018-09-19 20:00:07 | 2018-09-19 12:00:07 | EventType: OEM, Event: Adapter is ok., Data2: 255

278 | Informational | 1 | 0 | 0 | 2018-09-19 20:00:08 | 2018-09-19 12:00:08 | EventType: OEM, Event: Green Backup subsystem of adapter is ok., Data2: 255

Sosreport: reboots are recorded at times that roughly match; no MCE errors were seen

Sep 19 03:14:16 datanode12 rhsmd: In order for Subscription Manager to provide your system with updates, your system must be registered with the Customer Portal. Please enter your Red Hat login to ensure your system is up-to-date.

Sep 19 06:40:02 datanode12 auditd[5151]: Audit daemon rotating log files

Sep 19 18:05:14 datanode12 kernel: imklog 5.8.10, log source = /proc/kmsg started.

Sep 19 18:05:14 datanode12 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="5271" x-info="http://www.rsyslog.com"] start

Sep 19 18:05:14 datanode12 kernel: Initializing cgroup subsys cpuset

Sep 19 11:41:08 datanode12 ntpd[6161]: 0.0.0.0 0615 05 clock\_sync

Sep 19 11:41:09 datanode12 ntpd[6161]: 0.0.0.0 c618 08 no\_sys\_peer

Sep 19 20:01:34 datanode12 kernel: imklog 5.8.10, log source = /proc/kmsg started.

Sep 19 20:01:34 datanode12 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="5209" x-info="http://www.rsyslog.com"] start

Sep 19 20:01:34 datanode12 kernel: Initializing cgroup subsys cpuset
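Since the monitoring side reported memory errors, the messages file can be searched for kernel-reported hardware errors to back up the "no MCE" observation. This is a rough sketch: the function name hw_error_lines is hypothetical, and the pattern list is an assumption about typical kernel wording (machine check, mce:, Hardware Error, EDAC), not an exhaustive filter.

```shell
#!/bin/sh
# Print lines of syslog file $1 that look like kernel-reported hardware errors.
# Returns non-zero when nothing matches.
hw_error_lines() {
    grep -iE 'machine check|mce:|hardware error|EDAC' "$1"
}

# Usage on an extracted sosreport:
#   hw_error_lines var/log/messages || echo "no MCE/EDAC lines found"
```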

Issue 3: reboot, memory error

Serial Number: 210200A00JN187000424


SDS: the server rebooted, but no memory errors were seen

Informational | 1 | 0 | 0 | 2018-09-19 11:02:03 | 2018-09-19 03:02:03 | EventType: System Event, Event: Timestamp Clock Synch, Data2: 0

Informational | 1 | 0 | 0 | 2018-09-19 11:02:23 | 2018-09-19 03:02:23 | EventType: System Event, Event: Timestamp Clock Synch, Data2: 128

Informational | 1 | 0 | 0 | 2018-09-19 11:03:30 | 2018-09-19 03:03:30 | EventType: OEM, Event: Adapter is ok., Data2: 255

Informational | 1 | 0 | 0 | 2018-09-19 11:03:32 | 2018-09-19 03:03:32 | EventType: OEM, Event: Green Backup subsystem of adapter is ok., Data2: 255

Sosreport: the server rebooted

Sep 19 03:30:01 datanode09 auditd[5158]: Audit daemon rotating log files

Sep 19 11:04:54 datanode09 kernel: imklog 5.8.10, log source = /proc/kmsg started.

Sep 19 11:04:54 datanode09 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="5200" x-info="http://www.rsyslog.com"] start

Sep 19 11:04:54 datanode09 kernel: Initializing cgroup subsys cpuset

Sep 19 11:04:54 datanode09 kernel: Initializing cgroup subsys cpu

Issue 4: monitoring agent unresponsive, ICMP unreachable, reboot


Chassis Serial Number=210235A1Y7N176000003

SDS: no hardware errors recorded

Sosreport: the system rebooted

Sep 19 14:32:39 datanode05 sz[10421]: [taskctl] data.txt/ZMODEM: 14447460 Bytes, 1615222 BPS

Sep 19 23:56:22 datanode05 kernel: imklog 5.8.10, log source = /proc/kmsg started.

Sep 19 23:56:22 datanode05 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="5243" x-info="http://www.rsyslog.com"] start

Sep 19 23:56:22 datanode05 kernel: Initializing cgroup subsys cpuset
