This discussion is archived
26 Replies. Latest reply: Oct 25, 2012 5:57 AM by Klaas-Jan Jongsma

Exadata Backup job is taking very long time

767881 Newbie
Hi folks,

We are using Oracle Exadata V2 storage software. Our Database size is 5 TB. We are using HP Data Protector as MML for tape backups.

We run a weekly full database backup and daily incremental backups (with no compression), but the full backup takes more than 20 hours to finish. According to the Oracle Exadata Best Practices document, it should take no more than 3-4 hours.
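
For scale, the figures above can be turned into sustained rates. This is a rough sketch, assuming decimal units (1 TB = 10^12 bytes) and a nominal native rate of 120 MB/s for one LTO-4 (Ultrium 4) drive. The function name is illustrative, and the drive rate is an assumption, not a measurement from this system.

```python
TB = 10**12
MB = 10**6

def required_mb_per_s(size_tb, hours):
    """Sustained rate in MB/s needed to move size_tb terabytes in the given hours."""
    return size_tb * TB / (hours * 3600) / MB

# The 20-hour full backup of 5 TB ("more than 20 hours", so this is an upper bound)
observed = required_mb_per_s(5, 20)
# What the 3-4 hour target would require
target_4h = required_mb_per_s(5, 4)
target_3h = required_mb_per_s(5, 3)

# Assumption: one LTO-4 drive streaming at its nominal native rate
ASSUMED_LTO4_MB_S = 120
single_drive_hours = 5 * TB / (ASSUMED_LTO4_MB_S * MB) / 3600

print(f"observed rate:        at most {observed:.0f} MB/s")
print(f"needed for 4 hours:   {target_4h:.0f} MB/s")
print(f"needed for 3 hours:   {target_3h:.0f} MB/s")
print(f"one drive at 120MB/s: {single_drive_hours:.1f} hours for 5 TB")
```

Under these assumptions, the 3-4 hour target needs several drives streaming in parallel, whatever the number of RMAN channels.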

Does anyone have an idea why it is taking so long, or are you facing the same issue?

Below is the complete script to run the backup job:

BARLIST "Oracle_VIVADW_Backup"
OWNER "root" "viva.com.kw" "dw0101-vip.viva.com.kw"
DYNAMIC 1 5
DEVICE "HP:Ultrium 4-SCSI_1_production1"
{
-sync
-pool "Oracle DataBase Backup"
-prealloc
"0100007f:4cbac0db:7da9:0009"
"0100007f:4cbac149:7da9:000a"
"0100007f:4cbac1b6:7da9:000b"
"0100007f:4cbac23f:7da9:000c"
"0100007f:4cbac2af:7da9:000d"
"0100007f:4cbac31e:7da9:000e"
"0100007f:4cbac38a:7da9:000f"
"0100007f:4cbac3fe:7da9:0010"
"0100007f:4cbac46e:7da9:0011"
"0100007f:4cbac4dd:7da9:0012"
"0100007f:4cbac549:7da9:0013"
"0100007f:4cbac5bd:7da9:0014"
"0100007f:4cbac62d:7da9:0015"
"0100007f:4cbac69c:7da9:0016"
"0100007f:4cbac709:7da9:0017"
}

CLIENT "vivadw" dw0101-vip.viva.com.kw
{
-exec ob2rman.exe
-args {
"-backup"
}
-input {
"run {"
"allocate channel 'dev_0' type 'sbt_tape'"
" parms 'ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=vivadw,OB2BARLIST=Oracle_VIVADW_Backup)';"
"allocate channel 'dev_1' type 'sbt_tape'"
" parms 'ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=vivadw,OB2BARLIST=Oracle_VIVADW_Backup)';"
"allocate channel 'dev_2' type 'sbt_tape'"
" parms 'ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=vivadw,OB2BARLIST=Oracle_VIVADW_Backup)';"
"allocate channel 'dev_3' type 'sbt_tape'"
" parms 'ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=vivadw,OB2BARLIST=Oracle_VIVADW_Backup)';"
"allocate channel 'dev_4' type 'sbt_tape'"
" parms 'ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=vivadw,OB2BARLIST=Oracle_VIVADW_Backup)';"
"allocate channel 'dev_5' type 'sbt_tape'"
" parms 'ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=vivadw,OB2BARLIST=Oracle_VIVADW_Backup)';"
"allocate channel 'dev_6' type 'sbt_tape'"
" parms 'ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=vivadw,OB2BARLIST=Oracle_VIVADW_Backup)';"
"allocate channel 'dev_7' type 'sbt_tape'"
" parms 'ENV=(OB2BARTYPE=Oracle8,OB2APPNAME=vivadw,OB2BARLIST=Oracle_VIVADW_Backup)';"
"sql 'alter system archive log current';"
"backup incremental level <incr_level>"
" format 'Oracle_VIVADW_Backup<vivadw_%s:%t:%p>.dbf'"
" database;"
"backup"
" format 'Oracle_VIVADW_Backup<vivadw_%s:%t:%p>.dbf'"
" archivelog all;"
"backup"
" format 'Oracle_VIVADW_Backup<vivadw_%s:%t:%p>.dbf'"
" recovery area;"
"backup"
" format 'Oracle_VIVADW_Backup<vivadw_%s:%t:%p>.dbf'"
" current controlfile;"
"}"
}
-public
} -protect weeks 1

Please suggest.

Regards.
  • 1. Re: Exadata Backup job is taking very long time
    Daryl E. Explorer
    I was coming here to post essentially the same thing. I don't think it's an Exadata-specific problem, but backing up 20-80 TB is not as easy as the MAA docs make it out to be. I don't see 1-2 TB per hour or anywhere near that number.
    Sure, I am not using InfiniBand to talk directly to the media library, but what are realistic numbers for backing up this beast?
    Do we tweak the little hidden parameters like _backup_ksfq_bufcnt? Probably not, but they are inviting :-)

    Probably best to get the spreadsheets out and calculate the theoretical maximums and perhaps do some bottleneck tuning to find out where things slow down.

    (We use Veritas NetBackup. We got burned on a bigfile that was 32 TB in size and never could get it to tape.)


    Daryl
  • 2. Re: Exadata Backup job is taking very long time
    Daryl E. Explorer
    Not certain if this is relevant or not, but curious if my values are representative or totally crud compared to others...

    select min(round(effective_bytes_per_second*60*60/1024/1024/1024,0)) MIN_GBperH,
    round(avg(round(effective_bytes_per_second*60*60/1024/1024/1024,0))) AVG_GBperH,
    max(round(effective_bytes_per_second*60*60/1024/1024/1024,0)) Max_GBperH
    from v$backup_async_io;


    MIN_GBPERH AVG_GBPERH MAX_GBPERH
    7 56 207


    Nowhere near the TB/hour figures (streaming to tape with 8 threads, 1 from each db node).
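
To relate these GB/h values to per-second rates, here is a small conversion sketch. It mirrors the query's arithmetic (1 GB = 1024^3 bytes) and is only an illustration; the function names are made up for this example. Note also that, as I understand the view, V$BACKUP_ASYNC_IO holds rows per file as well as aggregate rows, so min/avg/max taken over all rows is not the same as the total backup rate.

```python
GIB = 1024**3

def gb_per_hour(bytes_per_second):
    """Same arithmetic as the query: bytes/s to GB/h (1 GB = 1024**3 bytes).
    Python's round() may differ from Oracle's ROUND on exact .5 values."""
    return round(bytes_per_second * 60 * 60 / GIB)

def mb_per_second(gb_per_h):
    """Inverse direction: GB/h (1024-based) to MB/s (1 MB = 10**6 bytes)."""
    return gb_per_h * GIB / 3600 / 10**6

for label, value in (("min", 7), ("avg", 56), ("max", 207)):
    print(f"{label}: {value} GB/h is about {mb_per_second(value):.1f} MB/s")
```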
  • 3. Re: Exadata Backup job is taking very long time
    mrmessin Oracle ACE
    While I have not seen 2 TB an hour, I have a database of nearly 6 TB with backup times in the 8-hour range: mostly a little more than 8 hours, sometimes 9. We use NetBackup, but the backup process does not use the NetBackup scheduler and runs from one of the database nodes. The backup goes directly to tape using 6 channels, which match the 6 physical tape drives that exist. The tape drives and NetBackup server are connected via InfiniBand. No RMAN backup compression or encryption is being used.
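
As a rough check on what each drive is doing in this setup, the sketch below splits the aggregate rate evenly across the 6 channels. It assumes round figures of 6 TB and 8 hours, decimal units, and an even spread across drives; the function name is illustrative.

```python
def per_channel_mb_s(size_tb, hours, channels):
    """Return (aggregate MB/s, MB/s per channel), assuming an even split."""
    aggregate = size_tb * 10**12 / (hours * 3600) / 10**6
    return aggregate, aggregate / channels

aggregate, per_drive = per_channel_mb_s(size_tb=6, hours=8, channels=6)
print(f"aggregate: {aggregate:.0f} MB/s, per drive: {per_drive:.1f} MB/s")
```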

    By the way daily incremental is running in around 2 hours each night.

    Edited by: mrmessin on Dec 19, 2010 6:16 PM
  • 4. Re: Exadata Backup job is taking very long time
    Daryl E. Explorer
    Seems like the key is InfiniBand right to the backup system. Kind of hard when you have multiple Exadatas and the distance limitations of InfiniBand. 8 hrs for 8 TB still isn't great when you have ~100 TB capacity in Exadata (plus archives being created just as quickly :-( )

    Tossing around the idea of a dedicated backup system just for Exadata.
  • 5. Re: Exadata Backup job is taking very long time
    mrmessin Oracle ACE
    Archivelog backups run every few hours directly to tape, and this backup system is not dedicated to Exadata. The limitation in this implementation appears to be the tape system, not Exadata: adding tape drives and channels, or using faster tape drives, should improve the backup speed. Network measurements showed that the InfiniBand was nowhere near its capacity, so many more channels can be added without affecting the InfiniBand connections. InfiniBand is key to good backup times to tape library systems, as are the speed and capacity of the drives.
  • 6. Re: Exadata Backup job is taking very long time
    603349 Explorer
    Have you guys read through these documents?

    http://www.oracle.com/technetwork/database/features/availability/backup-restore-1-131218.pdf
    http://www.oracle.com/technetwork/database/features/availability/maa-tech-wp-sundbm-backup-final-129256.pdf

    Are your media servers connected to the IB fabric or are you pulling data over the 1GbE interface of the db nodes?

    --
    Regards,
    Greg Rahn
    http://structureddata.org
  • 7. Re: Exadata Backup job is taking very long time
    Daryl E. Explorer
    Yes, thanks, but the docs are a little short on specifics and evidence. As I noted above, we are not using IB. I'd love to be able to point to a bottleneck and say "here is the evidence for doing the backups this way". The reason we post the question is that we can't attain the numbers in the doc :-(
  • 8. Re: Exadata Backup job is taking very long time
    603349 Explorer
    Daryl E. wrote:
    Yes, thanks, but the docs are a little short on specifics and evidence. As I noted above, we are not using IB. I'd love to be able to point to a bottleneck and say "here is the evidence for doing the backups this way". The reason we post the question is that we can't attain the numbers in the doc :-(
    I'm slightly confused -- exactly what evidence are you looking for? I think they give pretty good descriptions of the topology and cite numbers/rates where it is important.

    Certainly, if you are not using IB as in the cited documents, you won't be able to achieve those speeds. So the next question is how fast you expect to back up given your configuration (and what your configuration and topology are). What is the math behind your estimates?

    --
    Regards,
    Greg Rahn
    http://structureddata.org
  • 9. Re: Exadata Backup job is taking very long time
    Spyros Kaparelis Newbie
    I really don't know how to speed up Exadata backup in practice.

    But on a 12 TB DWH with 2 Fibre Channel links (4 Gb each) and NetBackup, we achieved, after a great deal of performance tuning of the system, a backup time of 5 hours 30 minutes: approximately 2.2 TB per hour. The 2 FC links are dedicated to the backup process. We use 24 channels for the backup and a VTL.
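
To see how close that is to what the links can carry, here is a small sketch. It assumes decimal units and a nominal payload rate of about 400 MB/s per 4 Gb FC link; that per-link figure is an assumption for illustration, not something measured on this system.

```python
ASSUMED_FC_LINK_MB_S = 400  # nominal rate of one 4 Gb FC link (assumption)

size_tb, hours, links = 12, 5.5, 2

achieved_mb_s = size_tb * 10**12 / (hours * 3600) / 10**6
tb_per_hour = size_tb / hours
link_capacity_mb_s = links * ASSUMED_FC_LINK_MB_S
utilization = achieved_mb_s / link_capacity_mb_s

print(f"achieved: {achieved_mb_s:.0f} MB/s ({tb_per_hour:.2f} TB/h)")
print(f"share of assumed link capacity: {utilization:.0%}")
```

If the assumed link rate is about right, the two dedicated links are already running at roughly three quarters of their capacity, so further gains would likely need more or faster links.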

    InfiniBand is much faster, but I don't know how you would attach it to the library.

    Edited by: Spyros Kaparelis on Dec 20, 2010 5:23 PM

    Edited by: Spyros Kaparelis on Dec 22, 2010 7:33 AM
  • 10. Re: Exadata Backup job is taking very long time
    Daryl E. Explorer
    From the doc, in the Gigabit Ethernet section: "throughput as high as 960MB/sec is possible". That's 3.46 TB/hr. Oh, I wish I could get even half of that. But the whitepaper then shows a lovely graph of CPU load; show me a graph demonstrating that a throughput of 960MB/sec is attainable.
    How to triage our configs to get better throughput is really the point of the thread, not to bash a whitepaper.
  • 11. Re: Exadata Backup job is taking very long time
    Daryl E. Explorer
    I am looking at OEM Grid Control at the eth3 interface (our dedicated interface). I am seeing it top out at 2.5MB/s on one node and roughly 1MB/s on each of the others. Am I looking in the right spot? Is that where I should see something around 120MB/s? (Heck, even 60MB/s would be real nice :-) )
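
Put into numbers, those readings look like this. The sketch assumes 8 db nodes (one backup thread per node, as mentioned earlier in the thread), that OEM is reporting megabytes rather than megabits, and roughly 120 MB/s as the wire speed of a 1GbE port.

```python
WIRE_MB_S = 120  # approximate 1GbE wire speed per port

# One node peaking at 2.5 MB/s, the other seven at roughly 1 MB/s (assumes 8 nodes)
observed_mb_s = [2.5] + [1.0] * 7

total_mb_s = sum(observed_mb_s)
gb_per_hour = total_mb_s * 3600 / 1000
best_port_utilization = max(observed_mb_s) / WIRE_MB_S

print(f"total across nodes: {total_mb_s} MB/s, about {gb_per_hour:.0f} GB/h")
print(f"busiest port is at {best_port_utilization:.1%} of wire speed")
```

If those assumptions hold, the ports are nearly idle and the total is in the same range as the GB/h values from the earlier query, which points away from the network links themselves as the limiting factor.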
  • 12. Re: Exadata Backup job is taking very long time
    603349 Explorer
    That's absolutely correct -
    8 nodes * 1GbE port = 8 * 120MB/s = 960MB/s = 3,456,000 MB/hour = 3.46TB/hour
    Now understand, 120MB/s per port is basically 1GbE port/wire speed, which leaves nothing for any other traffic on that port. I don't think you need a graph; that math is easy.

    There are a few data flow exchange rates that we need to understand:
    - how fast data can flow out of the Exadata Database Machine
    - how fast data can flow in to the media server
    - how fast data can flow out of the media server (to tape)
    - how fast data can flow in/out of any switches between any 2 points

    I'd suggest grabbing a sheet of paper and drawing your topology out and marking down the physical max/wire speeds at the different data exchange points -- it may become very obvious why you are getting the rates you are. The data flow will be gated by the smallest data exchange rate in the topology.

    For example, if you are using the 8 1GbE ports, then you need 8 1GbE ports on your media server (or equivalent faster networking) to keep up. Similarly you need enough tape drives to write out 960MB/s.
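
The paper exercise described above can also be sketched in a few lines: list the maximum rate at each data exchange point and take the smallest. Only the 8 x 120MB/s figure comes from this thread; the media server and tape values below are hypothetical placeholders to be replaced with the real topology.

```python
def bottleneck(stages):
    """Return (name, MB/s) of the slowest stage; it gates the whole data flow."""
    name = min(stages, key=stages.get)
    return name, stages[name]

stages_mb_s = {
    "out of db nodes (8 x 1GbE at 120MB/s)": 8 * 120,
    "into media server (1 x 10GbE, hypothetical)": 1250,
    "out to tape (4 drives x 120MB/s, hypothetical)": 4 * 120,
}

name, rate = bottleneck(stages_mb_s)
tb_per_hour = rate * 3600 / 10**6
print(f"gated by: {name} at {rate} MB/s, about {tb_per_hour:.2f} TB/hour")
```

With these placeholder values the tape stage gates the flow at 480MB/s, even though the db nodes could push 960MB/s.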


    --
    Regards,
    Greg Rahn
    http://structureddata.org
  • 13. Re: Exadata Backup job is taking very long time
    Daryl E. Explorer
    For example, if you are using the 8 1GbE ports, then you need 8 1GbE ports on your media server (or equivalent faster networking) to keep up. Similarly you need enough tape drives to write out 960MB/s.

    They have multiple 10GbE ports on the media servers .. and a lot of tape drives .. 50+.
    Thanks for replying .. still looking ..

    Thinking of trying a simple copy out of the box to the media server .. to test bandwidth and such between.
  • 14. Re: Exadata Backup job is taking very long time
    603349 Explorer
    I'd suggest starting by testing the network bandwidth between the 8 db nodes and the media server using iperf. Start with 1 db node and scale up to 8.
    http://sourceforge.net/projects/iperf/
    http://www.go2linux.org/how-to-mesure-network-performance-iperf

    Another tool is netperf:
    http://www.netperf.org/netperf/
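
A minimal sketch of the scale-up test with iperf follows. It only prints the commands to run at each step and executes nothing. The host names are hypothetical, and it uses only the basic iperf options: -s (server), -c (client, target host) and -t (duration in seconds).

```python
def iperf_scaling_plan(db_nodes, media_server, seconds=30):
    """Return one step per node count; each step maps node -> iperf client command."""
    plan = []
    for count in range(1, len(db_nodes) + 1):
        step = {
            node: ["iperf", "-c", media_server, "-t", str(seconds)]
            for node in db_nodes[:count]
        }
        plan.append(step)
    return plan

db_nodes = [f"dbnode{i}" for i in range(1, 9)]       # hypothetical host names
plan = iperf_scaling_plan(db_nodes, "mediaserver")   # hypothetical media server

print("on mediaserver: iperf -s")
for step in plan:
    print(f"--- {len(step)} concurrent client(s) ---")
    for node, cmd in step.items():
        print(f"on {node}: {' '.join(cmd)}")
```

At each step the clients should be started at the same time, so that the sum of their reported rates shows whether the aggregate keeps growing or flattens out.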

    --
    Regards,
    Greg Rahn
    http://structureddata.org