11 Replies Latest reply on Oct 19, 2019 5:23 PM by Nik

    Boot process discover variable memory size

    user13524501

      Hi, I have an X4450, and at each reboot it discovers a different memory size. No errors are reported and no fault LEDs are lit. I am on the latest firmware version:

       

      Version 3.0.6.15.f r101655

       

      Property                   Value
      SP Firmware Version        3.0.6.15.f
      SP Firmware Build Number   101655
      SP Firmware Date           Fri Aug 14 14:15:22 CST 2015
      SP Filesystem Version      0.1.22

       

      The server is populated with 128G as 32 x 4G. Sometimes it offers 40G, 64G, or 72G. I have seen the full 128G only once or twice.
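
      As a quick sanity check on these numbers, here is a minimal sketch that converts a reported memory size into the number of DIMMs the system apparently saw. It assumes every stick is 4G and all 32 slots are populated, as stated above; the function name is made up for illustration.

```python
DIMM_SIZE_GB = 4   # from the post: 4G sticks
TOTAL_DIMMS = 32   # from the post: 32 sticks installed


def visible_dimms(reported_gb):
    """Return (dimms_seen, dimms_missing) for a reported memory size in GB."""
    if reported_gb % DIMM_SIZE_GB != 0:
        raise ValueError("reported size is not a multiple of the DIMM size")
    seen = reported_gb // DIMM_SIZE_GB
    if seen > TOTAL_DIMMS:
        raise ValueError("reported size exceeds the installed memory")
    return seen, TOTAL_DIMMS - seen


for size in (40, 64, 72, 128):
    seen, missing = visible_dimms(size)
    print(f"{size}G -> {seen} DIMMs seen, {missing} missing")
# 40G -> 10 DIMMs seen, 22 missing
# 64G -> 16 DIMMs seen, 16 missing
# 72G -> 18 DIMMs seen, 14 missing
# 128G -> 32 DIMMs seen, 0 missing
```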

      I reseated all the sticks and the memory riser board, and I tested the memory with pc-check. What else can I do? Is there something I can do to fix this?

       

      Thanks !

      Michel Jean

        • 1. Re: Boot process discover variable memory size
          ClaudiuO-Oracle

          Hello user13524501,

           

          Do you see any memory errors during POST or that is the amount of memory which is being initialized and presented to the OS (sometimes 40G, 64G, 72G etc)?

           

          Mechanically, the first thing to do is check that the memory mezzanine card is firmly locked into place. We have seen cases where the arms were not secured by the green levers, and making sure they were locked down resolved this issue. If that is not the case, proceed to isolate bad DIMMs by populating the system to the minimum configuration and testing the DIMMs in pairs: install the first DIMM pair in slots A0/B0, then the second pair in slots C0/D0, and so on (a lot of work there...).
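
          The pair-by-pair isolation described above can be tracked with a small script. This is only a sketch: it assumes each pair adds 8G (two 4G DIMMs), that a pair which fails to add its full size is pulled before the next pair is installed, and the labels "pair-3" and "pair-4" are placeholders rather than real slot names.

```python
PAIR_SIZE_GB = 8  # assumption: two 4G DIMMs per pair


def suspect_pairs(observations):
    """Flag pairs whose installation did not add one full pair of memory.

    observations: list of (pair_label, reported_gb) tuples, in the order
    the pairs were installed, where reported_gb is the size POST reported
    after booting with that pair added.
    """
    suspects = []
    good_gb = 0  # memory contributed by pairs already known to be good
    for label, reported_gb in observations:
        if reported_gb - good_gb < PAIR_SIZE_GB:
            # The new pair did not show up in full: mark it and assume
            # it is removed before the next pair goes in.
            suspects.append(label)
        else:
            good_gb = reported_gb
    return suspects


# Hypothetical boot log: the third pair adds nothing and is pulled.
boots = [("A0/B0", 8), ("C0/D0", 16), ("pair-3", 16), ("pair-4", 24)]
print(suspect_pairs(boots))  # ['pair-3']
```

          Because the fault here is intermittent, a single clean boot does not prove a pair is good; rebooting several times per pair before moving on gives a more reliable result.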

           

          Out of curiosity, are all the DIMMs from the same manufacturer (same part number)?

           

          It is also a good idea to visually inspect the slots for any signs of contamination or damage (dust, burn marks, bent pins), which can cause such intermittent behavior.

           

          Best regards,

          Claudiu

          • 2. Re: Boot process discover variable memory size
            Nik

            Hi.

            You can check the status of all DIMMs via ILOM.

            Compare what ILOM sees with what is really installed.

             

            Do you see the correct memory size in the POST output?

             

            Regards,

              Nik

            • 3. Re: Boot process discover variable memory size
              ClaudiuO-Oracle

              Hello Nik,

               

              This is what Michel stated:

               

              "The server is populated with 128G as 32 x 4G. Sometimes it offers 40G, 64G, or 72G. I have seen the full 128G only once or twice."

               

              If POST disables part of the memory during memory initialization, then of course the OS will pick up only the amount of memory that passed POST.

               

              Best regards,
              Claudiu

              • 4. Re: Boot process discover variable memory size
                user13524501

                Hi, I double-checked last evening, and from what I could see all the DIMMs are from the same manufacturer (Samsung) with the same Sun FRU part number (371-3069-01). I also checked the levers, and they look good. Last summer I populated the DIMMs one by one, and that gave me 128G, but since then I have rebooted and lost that amount. I also inspected the sockets and DIMM connectors, and they look good. Perhaps I should blow them out with compressed air. As I said, the graphical console reports a certain amount of RAM and the OS works with that. For now I do not have the serial console connected; should I? Are there more messages on the serial console?

                 

                Thanks!

                • 5. Re: Boot process discover variable memory size
                  ClaudiuO-Oracle

                  Hello ,

                   

                  Thanks for the details.

                   

                  The part number is good. When I mentioned pins, I also meant the pins on the DIMMs, as they can also show signs of contamination or damage (burnt pins, dust, etc.). Are they OK as well?

                   

                  Blowing compressed air on the sockets is good maintenance to carry out from time to time, especially if the server is not located in a clean data center environment.

                   

                  Try rebooting the system, say, 4-5 times in a row, and let me know the amount of memory initialized and visible to ILOM and the OS each time.

                   

                  Out of curiosity, what OS do you have installed on this server? Have you experienced any panic/crash/BSOD/PSOD (depending on the OS)?

                   

                  The serial console will display the same thing as the graphical console when it comes to POST, no difference there.

                   

                  You can also check the output of -> show /SP/logs/event/list/ from ILOM and browse through it for any memory errors during system initialization/reboots.

                   

                  Best regards,

                  Claudiu

                  • 6. Re: Boot process discover variable memory size
                    ClaudiuO-Oracle

                    Hello ,

                     

                    Have you managed to make any progress with the memory issue?

                     

                    Best regards,

                    Claudiu

                    • 7. Re: Boot process discover variable memory size
                      ClaudiuO-Oracle

                      Hello ,

                       

                      Have you managed to make any progress with the memory issue?

                       

                      Best regards,

                      Claudiu

                      • 8. Re: Boot process discover variable memory size
                        user13524501

                        Hi, thanks for the follow-up and sorry about the delay. I did what was asked and retested each pair of DIMMs, booting the server after each new pair was added. By doing this I found three defective DIMM pairs (which left my server with 104G of RAM). The week after, I worked on the server to reinstall the OS and managed to get VM Server working. Many reboots were involved, until I got another defective pair of DIMMs. So I have now been running with 96G (minus 4 pairs of DIMMs) for a week. I hope this was the last pair to replace.
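
                        The capacity figures above follow from simple arithmetic, sketched here under the assumption that every DIMM is 4G and DIMMs are always pulled in pairs (the function name is made up for illustration):

```python
def remaining_gb(total_gb, pairs_removed, dimm_gb=4):
    """Memory left after pulling a number of DIMM pairs (two DIMMs each)."""
    return total_gb - pairs_removed * 2 * dimm_gb


print(remaining_gb(128, 3))  # 104
print(remaining_gb(128, 4))  # 96
```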

                         

                        Thanks again

                        Michel

                        • 9. Re: Boot process discover variable memory size
                          user13524501

                          I have installed Oracle VM Server release 3.4.6

                          kernel 4.1.12-124.21.1.el6uek.x86_64

                           

                          and to answer the other question, I get this kind of message from "show /SP/logs/event/list/"

                           

                          9585   Wed Sep 18 19:58:14 2019  Chassis   Action    major

                                 Hot removal of /SYS/MB/MCH/DD7

                          9584   Wed Sep 18 19:58:14 2019  Chassis   Action    major

                                 Hot removal of /SYS/MB/MCH/DD6

                          9583   Wed Sep 18 19:58:14 2019  Chassis   Action    major

                                 Hot removal of /SYS/MB/MCH/DC7

                           

                          but nothing about memory errors
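
                          For a long event list, the affected DIMM paths can be pulled out with a short script. This is a sketch only: it assumes the log has been saved as text in the layout shown above (a header line with ID, timestamp, class, type and severity, followed by a message line), and the function name is made up for illustration.

```python
import re

# Sample taken from the ILOM output quoted in this post.
SAMPLE = """\
9585   Wed Sep 18 19:58:14 2019  Chassis   Action    major
       Hot removal of /SYS/MB/MCH/DD7
9584   Wed Sep 18 19:58:14 2019  Chassis   Action    major
       Hot removal of /SYS/MB/MCH/DD6
9583   Wed Sep 18 19:58:14 2019  Chassis   Action    major
       Hot removal of /SYS/MB/MCH/DC7
"""

# Header line: event ID, timestamp, class, type, severity.
HEADER = re.compile(
    r"^\s*(\d+)\s+(\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)
REMOVAL = re.compile(r"Hot removal of (\S+)")


def hot_removals(text):
    """Return (event_id, timestamp, device_path) for each hot-removal event."""
    events = []
    current = None  # most recent header not yet matched to a message
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.groups()
            continue
        removal = REMOVAL.search(line)
        if removal and current:
            event_id, timestamp = current[0], current[1]
            events.append((int(event_id), timestamp, removal.group(1)))
            current = None
    return events


for event_id, timestamp, device in hot_removals(SAMPLE):
    print(event_id, timestamp, device)
```

                          Counting how often each device path appears across reboots may help single out slots or DIMMs that drop out repeatedly.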

                           

                          I did get a panic, but it came at boot time when the DIMMs failed; the OS booted in a loop, and since I had only a limited console view I could not capture it.

                          • 10. Re: Boot process discover variable memory size
                            ClaudiuO-Oracle

                            Hello

                             

                             

                             

                            9585   Wed Sep 18 19:58:14 2019  Chassis   Action    major

                                   Hot removal of /SYS/MB/MCH/DD7

                            9584   Wed Sep 18 19:58:14 2019  Chassis   Action    major

                                   Hot removal of /SYS/MB/MCH/DD6

                            9583   Wed Sep 18 19:58:14 2019  Chassis   Action    major

                                   Hot removal of /SYS/MB/MCH/DC7

                             

                            I hope the power cords were removed during the removal/insertion of the DIMMs, weren't they?

                             

                            Is the server stable now with 96 GB of memory?

                             

                            Best regards,
                            Claudiu

                            • 11. Re: Boot process discover variable memory size
                              Nik

                              Hi.

                              The /SYS/MB/MCH/D* devices are memory DIMMs.

                              So the system detected hot removal of memory DIMMs.

                               

                              It can be dust, a faulty DIMM, or a socket problem.

                               

                              Regards,

                                Nik