Boot process discover variable memory size

Hi, I have an X4450, and at each reboot it discovers a different memory size. No errors are reported and no fault LEDs are lit. I am on the latest firmware version:
Version 3.0.6.15.f r101655
Property | Value |
---|---|
SP Firmware Version | 3.0.6.15.f |
SP Firmware Build Number | 101655 |
SP Firmware Date | Fri Aug 14 14:15:22 CST 2015 |
SP Filesystem Version | 0.1.22 |
The server is populated with 128G (32 x 4G). Sometimes it reports 40G, 64G, or 72G; I got the full 128G only once or twice.
I reseated all the sticks and the memory riser board, and I tested the memory with pc-check. What else can I do? Is there something I can do to fix this?
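To put the reported sizes in terms of hardware, here is a minimal Python sketch that converts each size into a count of DIMMs that were not initialized. It assumes every reported size is a whole multiple of the 4G DIMM size; the function name `missing_dimms` is made up for illustration.

```python
DIMM_SIZE_GB = 4       # from the post: 4G sticks
INSTALLED_DIMMS = 32   # from the post: 32 sticks, 128G total


def missing_dimms(reported_gb):
    """Return how many installed DIMMs are absent from a reported size."""
    # Assumption: memory drops out in whole-DIMM units.
    if reported_gb % DIMM_SIZE_GB != 0:
        raise ValueError("reported size is not a multiple of the DIMM size")
    detected = reported_gb // DIMM_SIZE_GB
    return INSTALLED_DIMMS - detected


for size in (40, 64, 72, 128):
    print(f"{size}G reported: {missing_dimms(size)} of "
          f"{INSTALLED_DIMMS} DIMMs not initialized")
```

For the sizes above this gives 22, 16, and 14 DIMMs not initialized, so the variation is far larger than a single bad stick would explain.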
Thanks !
Michel Jean
Answers
-
Hello user13524501,
Do you see any memory errors during POST, or is that simply the amount of memory being initialized and presented to the OS (sometimes 40G, 64G, 72G, etc.)?
Mechanically, the first thing to do is check that the memory mezzanine card is firmly locked into place. We have seen cases where the arms were not secured by the green levers, and making sure they are locked down has resolved this issue. If that is not the cause, proceed to isolate bad DIMMs by populating the system to its minimum configuration and testing the DIMMs in pairs: install the first DIMM pair in slots A0/B0, install the second pair in slots C0/D0, and so on (a lot of work there...).
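The pair-by-pair isolation can be tracked with a small script. The following is only a sketch under stated assumptions: the install order beyond A0/B0 and C0/D0 (A1/B1, C1/D1, ...) is a guess, not taken from the service manual; each pair is assumed to add 8G (2 x 4G); and a pair that fails to add its 8G is assumed to be pulled before the next one goes in. The names `dimm_pairs` and `find_suspect_pairs` are hypothetical.

```python
def dimm_pairs(slots_per_channel=8):
    """Yield DIMM pairs in an ASSUMED install order: A0/B0, C0/D0, A1/B1, ..."""
    for n in range(slots_per_channel):
        yield (f"A{n}", f"B{n}")
        yield (f"C{n}", f"D{n}")


def find_suspect_pairs(readings, pair_size_gb=8):
    """Flag pairs whose installation did not raise the POST total as expected.

    readings: list of (pair, total_gb_seen_at_post) in install order.
    Assumes a suspect pair is removed again before the next pair is added.
    """
    suspects = []
    good_total = 0
    for pair, total_gb in readings:
        if total_gb == good_total + pair_size_gb:
            good_total = total_gb      # pair initialized fully, keep it
        else:
            suspects.append(pair)      # pair did not contribute its 8G
    return suspects, good_total


# Hypothetical readings noted by hand after each boot.
readings = [
    (("A0", "B0"), 8),
    (("C0", "D0"), 8),    # total did not grow: suspect pair
    (("A1", "B1"), 16),
    (("C1", "D1"), 24),
]
suspects, good_total = find_suspect_pairs(readings)
print(suspects, good_total)
```

Since the fault is intermittent, a pair that passes one boot may still fail a later one, so repeating the boot a few times per pair before moving on is worth the extra time.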
Out of curiosity, are all the DIMMs from the same manufacturer (same part number)?
It is also a good idea to visually inspect the slots for any signs of contamination or damage (dust, burn marks, bent pins), which can cause such intermittent behavior.
Best regards,
Claudiu
-
Hi.
You can check the status of all DIMMs via ILOM.
Compare what ILOM sees with what is really installed.
Do you see the correct memory size in the POST output?
Regards,
Nik
-
Hello Nik,
This is what Michel stated:
"the server got 128G populated with 32X4G. Sometime it offers 40G, 64G, 72G. I got 128G only one or two time"
If POST disables part of the memory during memory initialization, then of course the OS will pick up only the amount of memory that passed POST.
Best regards,
Claudiu
-
Hi, I double-checked last evening, and from what I could see all the DIMMs are from the same manufacturer (Samsung) with the same Sun FRU part number (371-3069-01). I also checked the levers, and they look good. Last summer I also populated the DIMMs one by one, and that gave me 128G, but since then I rebooted and lost the full amount. I also inspected the sockets and DIMM connectors, and they look good. Perhaps I should blow them out with compressed air. As mentioned, the graphical console reports a certain amount of RAM and the OS works with that. For now I do not have the serial console connected; should I? Are there more messages on the serial console?
Thanks!
-
Hello user13524501,
Thanks for the details.
The part number is good. When I mentioned pins, I also meant the pins on the DIMMs themselves, as they can also show signs of contamination (burnt pins, dust, etc.). Are they OK as well?
Blowing compressed air on the sockets is good maintenance to carry out from time to time, especially if the server is not located in a clean data center/environment.
Try rebooting the system, say, 4-5 times in a row and let me know the amount of memory initialized and visible to ILOM and the OS.
Out of curiosity, what OS do you have installed on this server? Have you experienced any panic/crash/BSOD/PSOD (depending on the OS)?
The serial console will display the same thing as the graphical console when it comes to POST; no difference there.
You can also check the output of -> show /SP/logs/event/list/ from ILOM and browse through it for any memory errors during system initialization/reboots.
Best regards,
Claudiu
-
Hi, thanks for the follow-up, and sorry about the delay. I did what was asked and re-tested each pair of DIMMs by booting the server after each new pair was added. By doing this I found three defective DIMM pairs (that left my server with 104G of RAM). The week after, I worked on the server to reinstall the OS and managed to get VM Server working. Many reboots were involved before I got another defective pair of DIMMs. So I have now been running with 96G (minus 4 pairs of DIMMs) for a week. I hope this was the last pair to replace.
Thanks again
Michel
-
I have installed Oracle VM Server release 3.4.6
kernel 4.1.12-124.21.1.el6uek.x86_64
and to answer the other question, I got this kind of message from "show /SP/logs/event/list/":
9585 Wed Sep 18 19:58:14 2019 Chassis Action major
Hot removal of /SYS/MB/MCH/DD7
9584 Wed Sep 18 19:58:14 2019 Chassis Action major
Hot removal of /SYS/MB/MCH/DD6
9583 Wed Sep 18 19:58:14 2019 Chassis Action major
Hot removal of /SYS/MB/MCH/DC7
but nothing about memory errors.
I did get a panic, but it came at boot time when the DIMMs failed; the OS booted in a loop, and since I only have a limited console view I could not capture it.
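Scanning a long event list by eye is tedious, so here is a minimal Python sketch that pulls the "Hot removal" entries out of saved `show /SP/logs/event/list/` output. It assumes every entry has the two-line layout shown above (a header line with ID, date, class, type, and severity, followed by a message line); the names `HEADER`, `parse_events`, and `SAMPLE` are made up for this example, and other ILOM versions may format the list differently.

```python
import re

# Header line, e.g. "9585 Wed Sep 18 19:58:14 2019 Chassis Action major"
HEADER = re.compile(
    r"^(?P<id>\d+)\s+"
    r"(?P<ts>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"(?P<cls>\S+)\s+(?P<type>\S+)\s+(?P<sev>\S+)$"
)


def parse_events(text):
    """Parse event-list text into dicts; non-header lines extend the message."""
    events = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            current = dict(match.groupdict(), message="")
            events.append(current)
        elif current is not None:
            current["message"] = (current["message"] + " " + line).strip()
    return events


# The three entries quoted in this post.
SAMPLE = """\
9585 Wed Sep 18 19:58:14 2019 Chassis Action major
Hot removal of /SYS/MB/MCH/DD7
9584 Wed Sep 18 19:58:14 2019 Chassis Action major
Hot removal of /SYS/MB/MCH/DD6
9583 Wed Sep 18 19:58:14 2019 Chassis Action major
Hot removal of /SYS/MB/MCH/DC7
"""

PREFIX = "Hot removal of "
events = parse_events(SAMPLE)
removed = [e["message"][len(PREFIX):] for e in events
           if e["message"].startswith(PREFIX)]
print(removed)
```

Grouping the removed components by timestamp then shows which DIMMs dropped out at the same moment.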
-
Hello user13524501,
Thanks for the follow-up. Did you find any signs of contamination/dust in the DIMM sockets?
Regarding the following lines in the event list:
9585 Wed Sep 18 19:58:14 2019 Chassis Action major
Hot removal of /SYS/MB/MCH/DD7
9584 Wed Sep 18 19:58:14 2019 Chassis Action major
Hot removal of /SYS/MB/MCH/DD6
9583 Wed Sep 18 19:58:14 2019 Chassis Action major
Hot removal of /SYS/MB/MCH/DC7
I hope the power cords were removed during the removal/insertion of the DIMMs, weren't they?
Is the server stable now with 96 GB of memory?
Best regards,
Claudiu