5 Replies. Latest reply: Aug 28, 2013 11:36 AM by EdStevens

network issue, inconsistent file download results,

EdStevens Guru

Oracle Linux 5.6 64-bit, running under VirtualBox on my Win 7 Pro desktop

 

This is a 'lab' machine, replicating one of my live servers.  For my lab purposes, it is configured with 2 NICs:

#1 configured host-only, assigned to eth0 and configured with a fixed IP in the same subnet as the VBox adapter running on the host OS.

#2 configured for NAT, assigned to eth1, which was then configured for DHCP.  I had not used anything that would address eth1 in several months until today, when I began testing application of an Oracle Grid Infrastructure update and letting OUI retrieve and install any applicable updates.

 

The connection test using my MOS credentials passes.

The app (OUI) is able to retrieve from MOS a list of applicable patches.

When actually trying to download the patches, it returns an error, and the details state that there is "a network connection problem."   No information more specific than that.

 

I ran two other tests to exercise the NAT adapter in retrieving files.

The first test was a 'yum install firefox'.  It completed with no problems.

The second was a wget to retrieve a patch.  The wget was in a shell script provided by MOS as a download option for the selected patch.  On several tries all of the exchange of credentials passed, but the actual wget of the patch file would hang without completing.  Each time (I tried this several times, each time increasing what I was watching on the side) there would be a partial download of the file before things seemed to stall out, and it was at a different point each time.

 

The fact that the yum install downloaded all of its files (eleven in all) without a problem, and the wget is able to get anything at all indicates the net adapter is at least fundamentally working ... no configuration issues that would have it totally inoperable.

 

The only other observation is this.  In order to 'watch' the download activity, I had a second session where I set up a loop to watch the activity on eth1 and the file being downloaded:

 

while :
do
    ifconfig eth1                     # RX/TX counters for the NAT adapter
    ls -ltr /download_dir | tail -5   # most recently modified files
    sleep 2
done

 

Doing this I could see the TX and RX numbers on eth1 increasing and the size of the downloaded file increasing, until such time as things stalled out.  What was particularly interesting was the fact that the loop did not run at a constant speed.  Sometimes the 'sleep 2' would end up more like 5 or 6.  I'm sure that has to do with the way VBox is handling the internal clock timing but don't know if that could be related to the download problem.
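
The uneven loop speed can be put into numbers before guessing at a cause. The following is a minimal sketch, assuming bash and the standard `date` and `sleep` utilities; `measure_sleep` and `report_drift` are hypothetical helper names. It times each sleep with the guest's own clock, so if that clock itself stalls the figures will understate the delay, and a comparison against a clock on the host is still worthwhile.

```shell
#!/bin/bash
# Sketch: measure how long "sleep N" really takes inside the guest.

# measure_sleep SECONDS
# Prints the wall-clock seconds (guest clock) that the sleep actually took.
measure_sleep() {
    local requested=$1
    local start end
    start=$(date +%s)
    sleep "$requested"
    end=$(date +%s)
    echo $(( end - start ))
}

# report_drift SECONDS ROUNDS
# Runs the sleep ROUNDS times and prints requested vs. actual duration.
report_drift() {
    local requested=$1 rounds=$2 i actual
    for (( i = 1; i <= rounds; i++ )); do
        actual=$(measure_sleep "$requested")
        echo "round $i: requested ${requested}s, took ${actual}s"
    done
}
```

Running `report_drift 2 10` in a second session while a download is in progress would show whether the long iterations line up with the moments when the transfer stalls.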

  • 1. Re: network issue, inconsistent file download results,
    Dude! Guru

    The internal clock on a PC is interrupt driven and affected when hardware interrupts are stalled or unavailable. So if your host system freezes, so will the system clock.

     

    With the original RHEL5 kernel version 2.6.18 and earlier it was a general practice to use "divider=10" in /etc/grub.conf to reduce the CPU overhead of processing timer interrupts by a factor of 10. It is, however, not necessary with the Oracle UEK and UEK2 kernels, as they are tickless and do not trigger the usual 1000 interrupts per second. In a tickless kernel, timer interrupts are raised on demand rather than at a predetermined frequency.
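
Whether a parameter such as "divider=10" is actually in effect for the running kernel can be checked from /proc/cmdline rather than from grub.conf. A minimal sketch, assuming bash; `has_boot_param` is a hypothetical helper name, and the sample command line in the usage note is invented for illustration.

```shell
#!/bin/bash
# has_boot_param NAME [CMDLINE]
# Returns 0 if NAME appears on the kernel command line, either as a bare
# flag (NAME) or with a value (NAME=value). CMDLINE defaults to the
# command line of the running kernel.
has_boot_param() {
    local name=$1
    local cmdline=${2:-$(cat /proc/cmdline)}
    local word
    # Unquoted on purpose: split the command line on whitespace.
    for word in $cmdline; do
        case $word in
            "$name"|"$name"=*) return 0 ;;
        esac
    done
    return 1
}
```

For example, `has_boot_param divider && echo "divider is set"` checks the running guest, while `has_boot_param numa "ro quiet divider=10 numa=off"` checks a given string.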

     

    The VirtualBox Guest Additions ensure that the guest's system time is synchronized with the host time. http://www.virtualbox.org/manual/ch09.html#changetimesync

     

    My guess is that your problem is rather a busy host system that is not able to handle the system load at certain times, thereby slowing down the virtual guest machine.

     

    Btw, are you using the virtio-net adapter/driver in VirtualBox? If you select this, then VirtualBox does not virtualize common networking hardware. http://www.virtualbox.org/manual/ch06.html

  • 2. Re: network issue, inconsistent file download results,
    EdStevens Guru

    When I saw your comment about divider=10, I got hopeful, as I was using that without really understanding what it did.  It was a case of 'here, use this' when I first started moving from VM Workstation to VBox.  But alas, no joy after removing it and rebooting the vm.

     

    On this round of testing (having removed the divider parm from grub.conf) I noticed a few other items:

    1 - When I edited grub.conf, I noticed an additional parm had been added after the 'divider': numa=off.  I didn't put that there, so I assumed it was done by the installation of VBox Guest Additions.  A quick Google search seems to point to the oracle_validated package instead.

     

    2 - On my wget test, if I just run the wget.sh as is, it completes in a few seconds, but both the wget log and the downloaded file show zero length.  On the other hand, if I add a 'set -x' to the wget.sh, the behavior is as I described in my OP: the script hangs at the actual download and gets only part of the file.  I eventually punch out, and the wget log file is still at zero length.

  • 3. Re: network issue, inconsistent file download results,
    EdStevens Guru

    Ok, this is really getting maddening!  I can't establish any pattern of consistency at all.  Sometimes the wget script runs to completion in a couple of seconds, but the "download" zip file has zero length.  Other times the script seems to hang on the step of downloading the file.  Normally after a few minutes I give up and kill it, but one time I waited longer, and when I finally punched out the zip file was much larger, though still incomplete.  So I set up another test and just let it run.  It took 17 minutes but finally completed and had a good download.  The divider parm in /etc/grub.conf seems to make no difference.  I've run a 'one second failure' and a 'twenty minute success' back-to-back.  I also tried changing the network adapter from the default of "Intel PRO/1000 MT Desktop" to "virtio-net" with no difference in results.

     

    One thing that did catch my attention this morning was the creation of the log file.  If you look at a wget.sh script provided by MOS for downloading a patch, you'll see these lines:

     

    # Log directory and file
    LOGDIR=.
    LOGFILE=$LOGDIR/wgetlog-`date +%m-%d-%y-%H:%M`.log

     

    To which I added this:

     

    echo =====================================================
    echo LOGFILE is $LOGFILE
    echo time is `date`
    echo =====================================================

     

    Then the wget command itself redirects stdout and stderr to $LOGFILE.

     

    So with my added 'echo' commands, when I run the script I see this:

    =====================================================
    LOGFILE is ./wgetlog-08-27-13-12:06.log
    time is Tue Aug 27 12:06:55 CDT 2013
    =====================================================

     

    But when the script completes, I see this:

     

    =====================================================
    -rw-r--r-- 1 oracle oinstall 746 Aug 27 12:23 /tmp/3652.cookies
    drwxrwxrwx 1 root root        0 Aug  7 10:33 asmscripts
    drwxrwxrwx 1 root root     8192 Aug 21 14:05 media
    -rwxrwxrwx 1 root root     3229 Aug 27 11:01 wget.sh
    -rwxrwxrwx 1 root root 32996451 Aug 27  2013 p6880880_112000_Linux-x86-64.zip
    -rwxrwxrwx 1 root root        0 Aug 27  2013 wgetlog-08-27-13-12
    finished at Tue Aug 27 12:23:03 CDT 2013

     

    The current directory is a mounted share from the Win7 host machine. In that directory, everything is owned by root:root and accessed at 777.

    If I move wget.sh to /home/oracle, the file name is not truncated, and the file actually gets written to.
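
One possible explanation, which this thread does not confirm: the log name built by `date +%m-%d-%y-%H:%M` contains a colon, and Windows file systems do not accept a colon in an ordinary file name, so a share backed by the Win7 host may cut the name off at that character. That would match the listing above, where the name ends at "12". A minimal sketch of a colon-free variant, assuming the rest of wget.sh only uses $LOGFILE as an opaque path:

```shell
#!/bin/bash
# Log directory and file, with the time written as HHMM instead of HH:MM
# so that the name is also valid on a Windows-hosted share.
LOGDIR=.
LOGFILE=$LOGDIR/wgetlog-$(date +%m-%d-%y-%H%M).log
echo "LOGFILE is $LOGFILE"
```

If the log is then written completely when the script runs from the shared folder, the colon was the likely cause of the truncated, zero-length file.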

     

    None of these problems occur when the same script is executed from one of my real (physical) servers.

    (edit - the log file truncation is not even an issue on the real server.  I mean the download delays/failures/inconsistencies do not exhibit on the real server)

     

    I think I'll go drool in my porridge now ....

  • 4. Re: network issue, inconsistent file download results,
    Dude! Guru

    I think there is not enough information to be able to pinpoint the problem. Perhaps it would help to create an easier setup to troubleshoot the issue. For instance, you could try to download some other large file using an ftp client from another site to compare the result, or save the file to a VBox disk image or local filesystem rather than using VBox shared folder. You could also try to copy or download a file from your LAN to see if the problem persists.

  • 5. Re: network issue, inconsistent file download results,
    EdStevens Guru

    Was able to ftp the same file from one of my physical servers, in 0.67 seconds.  Of course that is not having to traverse as much network as getting it from Oracle's servers.

     

    Am able to get files via yum from Oracle's public yum server with no problems, though sometimes on large loads I've seen some delays that I've always assumed were due to internal VBox issues.  A little patience always wins out on those.

     

    Performing the same wget with the current directory being on the VBox disk image shows the same behavior -- it initially gets a chunk of the file (as evidenced by an 'ls' on the directory while wget is running), then 'hangs' for 15 to 20 minutes, then quickly (in a few seconds) gets the rest and completes.  The only difference between downloading to a shared folder vs. the disk image is that odd behavior with the log file.
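
GNU wget documents a default read timeout of 900 seconds, which is in the same range as the stalls described here; whether that is what ends the hang is only a guess. One way to test it is to repeat the download with a much shorter read timeout and resume enabled. A minimal sketch, assuming GNU wget; `fetch_with_timeouts` is a hypothetical helper name, the timeout and retry values are arbitrary, and the URL and directory in the usage note are placeholders.

```shell
#!/bin/bash
# fetch_with_timeouts URL DIR
# Downloads URL into DIR, giving up on an idle connection after 30 seconds
# and retrying up to 5 times, resuming the partial file each time.
fetch_with_timeouts() {
    local url=$1 dir=$2
    # WGET can be overridden (for example WGET=echo) to preview the command.
    ${WGET:-wget} \
        --continue \
        --read-timeout=30 \
        --tries=5 \
        --directory-prefix="$dir" \
        --output-file="$dir/wget-test.log" \
        "$url"
}
```

A run such as `time fetch_with_timeouts "http://example.invalid/some-large-file.zip" /home/oracle` would then show in wget-test.log whether the transfer completed in one pass or only after read timeouts and retries. The MOS wget.sh also passes credentials and cookies, which this sketch leaves out.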

     

    And just to keep the real goal in mind, it is not simply to get this or any other file onto this particular VM.  I was really wanting to work with the OUI automatically downloading and installing updates at the time it makes the base install.  That process errored out with an unspecified network error.  So just to prove out the basic viability of the NAT adapter and have something I might get a better handle on, I thought wget might be a good idea.  That is based on an assumption that when OUI gets the updates it is using the same protocols as wget -- but I could easily be wrong.
