Bobby,
This is dredging up some old memories - but your comment about 33M rows in uet$ is significant.
The query being run by smon is its attempt to find an extent belonging to a segment because it's dropping that segment - and it's dropping the extents one by one. If you've managed to create some very large segments in the past it's going to take a very long time to tidy up the mess, and while it's happening all sorts of other space management events are going to run into problems. If you bounce the database smon will just start off doing the same again within a couple of minutes of startup - or possibly straight away.
There is an event you can set that will stop smon clearing temporary segments - 10061 I think - this may let you work around some of your problems, but you'll then have problems with space that will never go away - and you'll have to get on to Oracle. It may be an acceptable short-term fix, though - the database will be working, even if it's growing when it shouldn't be.
(On one occasion I know I deleted about 400,000 rows from uet$ and fixed up all the associated rows in seg$, fet$ and somewhere else - but I can't recall the details)
Regards
Jonathan Lewis
Jonathan,
Thanks for your reply. We did bounce the database and as you said it didn't help. I think the final result of our support ticket will be something like what you describe but it is good to get some input outside of support.
- Bobby
Hi Bobby,
As you are dealing with Temp tablespaces, SMON cleanup of segments can cause issues as it will not service sort segment requests.
As Jonathan has already pointed out, you can set event 10061, which will disable SMON from cleaning up temp extents. This will leave used (unavailable) space in the tablespace in which the extents reside. You can then point users to another temporary tablespace if necessary. The database can then be restarted off-peak without 10061 and SMON will clean up the temporary extents.
A shutdown abort will succeed, but as soon as the database has been started again, SMON will carry on where it left off. Another option is to drop the tablespace in which the extents reside; this can be done after disabling SMON with 10061, and the user process performing the drop will then do the cleanup. However, this appears to be no quicker than SMON.
Just for your information, you can also check if it is performing a coalescing of free extents by running "select count(*) from dba_free_space;" a couple of times at 5-10 minute intervals. If you observe the count dropping, most likely SMON is coalescing as well.
You can disable this with the help of the 10269 event. If SMON has started to coalesce, and you wish to disable it, you will have to set 10269 in the parameter file and restart the database.
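For reference, the two events in the init.ora would look something like the following (the levels are from memory, so please confirm them with support before setting anything):

event = "10061 trace name context forever, level 10"
event = "10269 trace name context forever, level 10"

Remember that multiple event parameters have to appear on consecutive lines of the init.ora, otherwise the later one overrides the earlier.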
Regards,
Suntrupth
Suntrupth,
Thanks for your reply. We are getting some similar feedback from Oracle support. My on-call coworker is working the sev 1 ticket now. The interesting thing is that I was doing counts on uet$ and fet$ and not seeing any change. If smon is freeing extents it is taking its time on each one.
Here are uet$ and fet$ counts about 13 hours apart:
10:42
select count(*) from SYS.UET$;
  COUNT(*)
----------
  35881366
Elapsed: 00:00:41.03
select count(*) from SYS.FET$;
  COUNT(*)
----------
    934720
23:15:20
  COUNT(*)
----------
  35881367
So, UET$ went up by one row and FET$ stayed the same between 10:42 am and 11:15 pm CST.
Gathering an errorstack and a 10046 on the SMON session performing the cleanup should provide more details.
SMON does this: every 5 minutes it looks at FET$ to find any free extents to coalesce, and roughly every 2 hours it checks seg$ to see if it can clean up any temporary segments.
This should show how many items need cleaning:
select count(*) from user$ u, obj$ o where u.user#(+)=o.owner# and o.type#=10 and not exists (select p_obj# from dependency$ where p_obj# = o.obj#) ;
Best bet would be to set the events, and have them cleaned later.
You could try a 10046 trace at level 12 for a few minutes to see how many times smon executes that query in the time, whether the input bind values are changing, and whether smon finds any rows that match the inputs. It may be that your uet$ has become corrupted and left smon running an infinite loop as a consequence. It may be that smon is simply taking a huge amount of time on each extent because it has to read so many cluster blocks each time it runs the query.
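One way of doing that (a sketch only - the OS pid below is a placeholder that you'd first look up via v$process/v$bgprocess) is with oradebug from a sysdba session:

select p.spid
from v$process p, v$bgprocess b
where b.paddr = p.addr and b.name = 'SMON';

oradebug setospid 12345
oradebug unlimit
oradebug event 10046 trace name context forever, level 12
-- leave it running for a few minutes, then:
oradebug event 10046 trace name context off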
Here is the output of the query you posted:
select count(*)
from sys.user$ u, sys.obj$ o
where u.user#(+)=o.owner# and
o.type#=10 and
not exists
(select p_obj# from sys.dependency$
where p_obj# = o.obj#);
  COUNT(*)
----------
         0
Thanks again for your reply.
Here is part of the 10046 trace. It looks like this is the tablespace that we had trouble recreating the global index in. We had dropped a number of older partitions on the base table and couldn't get the unusable index rebuilt in its dictionary managed tablespace. According to a coworker of mine the space from the tablespace in question - about 500 GB - has never been freed, so that makes sense. Oracle support made reference to a bug related to failed index builds so it may be related.
Here is the output:
*** 2014-01-22 02:24:47.928
*** SESSION ID:(8.1) 2014-01-22 02:24:47.905
=====================
PARSING IN CURSOR #2 len=94 dep=1 uid=0 oct=3 lid=0 tim=1598955320 hv=1839874543 ad='34c149b0'
select file#,block#,length from uet$ where ts#=:1 and segfile#=:2 and segblock#=:3 and ext#=:4
END OF STMT
PARSE #2:c=0,e=2,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=0,tim=1598955320
BINDS #2:
bind 0: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=1 size=24 offset=0
bfp=80000001000e9bc8 bln=22 avl=02 flg=05
value=74
bind 1: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=1 size=24 offset=0
bfp=80000001000e9b98 bln=24 avl=03 flg=05
value=865
bind 2: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=1 size=24 offset=0
bfp=80000001000e9b68 bln=24 avl=04 flg=05
value=261032
bind 3: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=1 size=24 offset=0
bfp=80000001000e9b38 bln=24 avl=04 flg=05
value=176590
EXEC #2:c=0,e=1,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1598955321
FETCH #2:c=269,e=267,p=0,cr=52299,cu=0,mis=0,r=0,dep=1,og=4,tim=1598955588
STAT #2 id=1 cnt=0 pid=0 pos=0 obj=13 op='TABLE ACCESS CLUSTER UET$ '
STAT #2 id=2 cnt=2 pid=1 pos=1 obj=9 op='INDEX UNIQUE SCAN '
PARSING IN CURSOR #2 len=94 dep=1 uid=0 oct=3 lid=0 tim=1598955588 hv=1839874543 ad='34c149b0'
PARSE #2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1598955588
bind 0: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=8000000100000001 size=24 offset=0
bind 1: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=8000000100000001 size=24 offset=0
bind 2: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=8000000100000001 size=24 offset=0
bind 3: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=08 oacfl2=8000000100000001 size=24 offset=0
value=176589
EXEC #2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1598955588
FETCH #2:c=269,e=269,p=0,cr=52299,cu=0,mis=0,r=0,dep=1,og=4,tim=1598955857
PARSING IN CURSOR #2 len=94 dep=1 uid=0 oct=3 lid=0 tim=1598955857 hv=1839874543 ad='34c149b0'
PARSE #2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1598955857
value=176588
EXEC #2:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1598955857
FETCH #2:c=270,e=270,p=0,cr=52299,cu=0,mis=0,r=0,dep=1,og=4,tim=1598956127
I think this is the problematic temporary segment that SMON is struggling to clean up:
select
segment_name,
segment_type,
bytes/(1024*1024*1024) gigabytes
from dba_segments
where
tablespace_name='NOTTHEREALONE' and
header_file=865;
SEGMENT_NAME   SEGMENT_TYPE   GIGABYTES
-------------- -------------- ----------
865.261032     TEMPORARY      521.414948
The tablespace name I really used corresponded to ts#=74 in v$tablespace.
We are thinking of rebuilding the other indexes in this tablespace in a new locally managed tablespace and then dropping the existing dictionary managed tablespace. My only question is whether some corruption will cause the drop tablespace to fail.
Smon is deleting the extents in reverse order - since this is 8.1 the ela is centiseconds, so you've got 176,588 extents to go at about 2.7 seconds per extent; total roughly 480,000 seconds, or about five and a half days.
Check this with Oracle, but I think what I did was:
Take a backup of the database.
Set event 10061 to stop smon from trying to clear up.
Bounce the database
Query uet$ for all the extents in this segment that were adjacent.
Delete all the uet$ rows for the adjacent ones (couple of hundred thousand) and insert one uet$ entry that described the entire length of the set deleted.
All the uet$ entries for the segment may have to be renumbered - except I think I got lucky and didn't have to do this because my 200,000 turned into "the last one".
Update seg$ with the new number of extents.
This left the segment with a "sane" number of extents for smon to delete properly; then flush the uet$ blocks from the cache and restart.
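Purely as an illustration of the kind of surgery involved (the extent range, sizes, and block addresses below are made up, and this is unsupported dictionary patching - don't attempt it without Oracle Support and a backup), the statements would be along these lines:

-- suppose extents 100 to 199 are the last 100 extents of the segment,
-- are physically adjacent, and are 128 blocks each starting at block 50000
delete from sys.uet$
where ts#=74 and segfile#=865 and segblock#=261032
and ext# between 100 and 199;

insert into sys.uet$ (segfile#, segblock#, ext#, ts#, file#, block#, length)
values (865, 261032, 100, 74, 865, 50000, 100*128);

update sys.seg$ set extents=101
where ts#=74 and file#=865 and block#=261032;

commit;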
Thanks again for your reply. It sounds like you were manually coalescing the extents. We will work it through with support. There are only a few gigabytes of indexes in the tablespace in question so we may just rebuild them somewhere else and then drop the tablespace rather than messing with the uet$ table. But, we will get Oracle support's direction either way.
p.s. I think I may have a grasp on why each query is taking 2.7 seconds. It looks to me like the C_FILE#_BLOCK# cluster that uet$ is part of has 122908 blocks on this system and the trace shows cr=52299 on the fetch so that means that the number of cr blocks scanned is about 42% of the number of blocks in the cluster. I'm guessing that when you have segments with many extents there is a lot of chaining leading to many blocks of the cluster being visited for each query execution. I looked at the cluster definition and it has size 250 so maybe that is too small when you have thousands of extents.
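For anyone checking their own system, queries along these lines (illustrative only) show the cluster's key size and allocated blocks:

select cluster_name, key_size
from dba_clusters
where owner = 'SYS' and cluster_name = 'C_FILE#_BLOCK#';

select blocks
from dba_segments
where owner = 'SYS'
and segment_name = 'C_FILE#_BLOCK#'
and segment_type = 'CLUSTER';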
Your comment about the cluster and block visits is correct - if you look at dcore.bsq you'll see the (Oracle 5, maybe 6) assumption still recorded:
create cluster c_file#_block#(ts# number, segfile# number, segblock# number)
size 225 /* cluster key ~ 25, sizeof(seg$) ~ 50, 5 * sizeof(uet$) ~ 150 */
storage (initial 20K) /* avoid space management during IOR I */
/
5 extents per segment on average!
That doesn't mean there's a limit of 5 rows for any one segment in a block, of course.
The numbers don't really add up - at 150 bytes per row you get about 50 rows per block, which would make 52,000 blocks (your CRs) equate to about 2.7M rows in uet$ - but the rate at which you're clearing them is too slow at present for you to have deleted down to 176,000 already, so (a) you must have been deleting them much more quickly to start with and (b) the current CRs might also include a large number of undo blocks.
That's all academic, of course; the real problem is how to resolve the mess as rapidly as possible - and I'm not sure you'll be able to get rid of the tablespace without getting rid of the uet$ entries. So if you can't work around deleting them by hand, have you considered exporting the entire database and creating a new one? It might be the safest, fastest thing to do.
Thanks again for your reply. It all makes good sense. I'm concerned too about whether dropping the tablespace would clean up uet$ versus doing the manual update.
It is a 4 terabyte database so export/import would take some time, but it would definitely be nice to move to a cleaner environment if we could.
I think I figured out why our fet$ and uet$ counts didn't go down yesterday. It looks like after the database restart smon starts over with the last extent number even though it is no longer in uet$. I did this query to find the first and last ext# for the segment that smon is trying to clean up:
select
min(ext#),
max(ext#)
from sys.uet$
where
ts#=74 and
segfile#=865 and
segblock#=261032;
 MIN(EXT#)  MAX(EXT#)
---------- ----------
         0      63063
But, if you look at the smon trace from yesterday it is working on extent 176590.
I got the idea of SMON starting over from this blog post:
https://blogs.oracle.com/AlejandroVargas/entry/after_production_upgrade_thoug
Here is a relevant quote: "Shutdown abort in this case was a bad idea, on startup it forced smon to scan all extents already set, a slow and painful process... trace of smon showed that it scanned uet$ row by row reviewing all the job that was already completed"
Hopefully this means that uet$/fet$ are not corrupt and we just need to clean this up through some normal process but we will see.
I am not sure about your current situation now; I hope it's better.
I couldn't find time to reply back yesterday. Looking at the huge amount of temporary segments in your tablespace (74) got me thinking: have you or Oracle Support considered the DROP_SEGMENTS event, which users may invoke to clear temporary segments? It drops the temporary segments just like SMON does in the background. This event has the advantage of not having the CPU consumed by SMON.
Note 47400.1 : EVENT: DROP_SEGMENTS - Forcing cleanup of TEMPORARY segments
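If I remember the note correctly, the usage is along these lines (the level is the tablespace number plus one, so 75 for your ts# 74 - please verify against the note before running it):

alter session set events 'immediate trace name DROP_SEGMENTS level 75';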
There is also a method to patch (make changes to) the dictionary, which is not recommended otherwise; you could seek help from Oracle Support.
The method is to identify the segment (which you already have) and update the segment type from temporary to 999.
Note 388619.1: Last resort when SMON takes 100% CPU
I'll take a look at those notes. At this point we are running fine with 10061 and 10269 events set so there isn't a crisis. But, of course we need to clean things up eventually so we are planning our next move. But, no one stayed up late last night working on it since the system is running.
Thanks again for your input.
I put this question in the SR if anyone wants to see it:
Well, if a process requires a sort segment, it needs the SS enqueue.
When SMON is busy cleaning up your dead transaction (the failed rebuild operation), it may not service other work such as cleaning up temporary space or handling sort-segment requests.
If you have removed the events, you are making SMON do your cleanup again. Why not have these events set and let the server process perform the cleanup so that you do not face any SS enqueues?
I believe Jonathan had suggested dictionary patching by updating the seg$ as described in the note that I had referred previously.
I don't think you would be able to drop the index tablespace as you still have temporary segments that need cleanup.
A thought via Twitter from Mark Farnham - if you do an "alter tablespace offline" then "drop tablespace including contents and data files", Oracle may have a fast way to eliminate all the entries from uet$ and fet$, because it doesn't have to move rows from uet$ into fet$ and check adjacency - it simply has to "delete where ts# = ".
Like me, he was working from memory from a long time ago - so can't be certain of details: check with support, model on a spare database, and take a backup of production before trying it if you think it's worth trying.
Mark's comments about the alter offline and drop ring a bell. I think early on researching this on MOS I got the idea of the tablespace drop from an Oracle support document. We have a development database of the same version. I've been thinking about trying to recreate the situation there by creating a table with millions of small extents to fill up UET$, etc. At this point turning off the SMON cleanup isn't hurting us in production so I may have time to do some testing on our dev database. Will definitely verify a good backup of prod prior to doing anything radical.
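Something along these lines (a sketch only - the table name, tablespace name and loop count are made up) should generate a large number of small dictionary-managed extents on the test database:

create table many_extents (n number)
tablespace dmt_test
storage (initial 16k next 16k pctincrease 0 maxextents unlimited);

begin
  for i in 1 .. 5000 loop
    execute immediate 'alter table many_extents allocate extent';
  end loop;
end;
/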
After a bit of out of forum "dinosaur" crosstalk with JL, I believe you are trying to solve two problems: 1) Get rid of an old TEMP tablespace 2) Cleanup uet$/fet$ efficiently from one or more broken/crashed 999.99999 named temporary objects that are probably residue of failed completion of creation of the final image/ object rename to the index in some user tablespace.
From the thread it seems you are prepared to recreate any and all contents of the tablespace #74. That, or saving any source of data objects (meaning clusters, tables, and iots [which are tables, doggone it, but some people think they are indexes]) and being willing to recreate any dependent objects such as indexes and materialized views that may physically exist in that tablespace.
IF (and only if) that is true, I believe your likely best option is to take tablespace #74 offline and then drop that tablespace including contents. This should bypass the "careful" (and pessimal) algorithm for multiuser consistency of uet$/fet$ entries that has to be done for dictionary tablespaces that are open in favor of a straight delete by the file numbers of the components of the tablespace. This should work correctly without you having to do any rogue surgery directly on uet$/fet$. Of course, at the end of this operation everything in tablespace #74 will be GONE.
As for the temporary tablespace (if it is having any effect on uet$/fet$, being the antique permanent sort of tablespace holding temp bits), then you should be able to do the same thing for it.
Given the luxury of an instance stop, and acquiescence to the caveats above, probably together with a backup if this is production, I'd be inclined to startup mount, alter the datafiles component of the tablespaces in question offline, startup restricted, verify that the datafiles being offline left the tablespaces in question offline, so smon is not trying to molest them, and then execute the alter tablespaces drop including contents on first the user table and then the deprecated temp.
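A rough sketch of that sequence, with made-up file and tablespace names (verify the exact commands with support before relying on any of it; in noarchivelog mode the datafile offline would have to be OFFLINE DROP):

startup mount
alter database datafile '/u01/oradata/PROD/index_dmt_01.dbf' offline;
alter database open;
alter system enable restricted session;
select tablespace_name, status from dba_tablespaces
where tablespace_name = 'INDEX_DMT';
drop tablespace index_dmt including contents;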
With the noted huge numbers of rows, this won't be instantaneous, but it should run at a reasonable speed. This will not shrink the cluster containing uet$/fet$, but when the file numbers are re-used at least you'll re-use that space.
It's been a while, but that as I recall it is the optimal scenario for clearing huge numbers of rows from uet$/fet$ quickly in a supported fashion from v6.0 through hmm, still, for dictionary managed tablespaces. As far as I know they never improved that online dictionary free extents code in favor of working on locally managed tablespaces.
Good luck.
mwf
Thanks for your reply. Your recommendation is similar to what we had in mind, but the difference is that it doesn't require any weird events being set to accomplish it. Our plan was to move all the objects out of the tablespace with the 10061 and 10269 events set to stop SMON from working. Your idea of making the tablespace offline before opening the database is nice. Cleaner for sure. The temp tablespace is already dropped. It was locally managed so there is no uet$/fet$ cleanup there. But, we want a way to clean up the index tablespace after moving all the indexes out of it - only a few gigabytes are in there out of the 500 or so of space allocated, so it shouldn't take long to move the indexes.
I'm building a small test database now so I'll play with it there.
From my tests it appears that dropping the tablespace is no more efficient than the smon cleanup; it runs the same statements against uet$.
Doing the drop would take a number of weeks.
It is possible we could run with the events set for that long and just run the drop tablespace in the background. I'm just not sure if it would impact something else.
Well, it looks like we will probably do the following:
-- do all these steps as SYSDBA
-- verify that events 10061 and 10269 are set:
show parameter event
shutdown immediate
startup restrict
update seg$ set type#=999 where type#=3 and file#=865 and block#=261032 and ts#=74;
-- if only one row updates (it should)
commit;
shutdown abort
-- must be abort
-- take out events
startup
-- check if all seems ok
This is from Oracle's document "Last resort when SMON takes 100% CPU (Doc ID 388619.1)"
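Afterwards, a sanity check might be something along these lines (illustrative only - it is not part of the note):

select type#, extents, blocks
from sys.seg$
where ts#=74 and file#=865 and block#=261032;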
Oracle support is encouraging us to upgrade to 11.2.0.4, which makes sense. But I guess the "last resort" is to change the segment type so SMON ignores it.
But, I'm not sure how this will affect our issue with SS enqueues and sort segment request waits.
Hopefully my last update until we make the "last resort" change. I think I understand the sort waits we were seeing. I believe there were two situations.
1 - SMON spinning when we dropped a locally managed temporary tablespace
Since smon is busy it couldn't clean up the sort segments and user sessions that needed access to a sort segment hung.
2 - SMON spinning after bouncing database
SMON has to do something after a database bounce to let users start using a sort segment. Looks like a busy SMON will lock out user sessions until SMON "adopts" the sort segment.
The document I saw relating to case 1 regarding dropping temporary tablespaces, sort segments, and a busy SMON was this:
Diagnosing BUG 5016142 enqueue ENQ - SS contention (Doc ID 416823.1)
I'm not 100% sure on 2 but it is similar to this case where SMON is doing transaction rollback after a bounce:
Bug 1809682 : NEED SORT SEGMENT ... SMON IS BUSY DOING ROLLBACK
My point here is that we had SMON spinning for three weeks with no sort waits, but when we tried to drop a temporary tablespace they started. Then after setting the 10061 and 10269 events we were able to drop the temporary tablespace completely. But, when we took the events back out we saw sort waits again! But, the second set of waits was because we restarted the database and SMON was spinning from the beginning and never opened up the sort segment to the users.
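For anyone hitting the same symptoms, a query along these lines (illustrative) shows sessions holding or waiting on the SS enqueue:

select sid, type, id1, id2, lmode, request
from v$lock
where type = 'SS';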
Was your drop tablespace attempt with the datafiles and tablespace offline? If so I am surprised.
The tablespace was offline. Here is my script:
alter session set max_dump_file_size = unlimited;
ALTER SESSION SET tracefile_identifier = 'bobbydurrett';
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
alter tablespace table_ts offline;
drop tablespace table_ts including contents;
ALTER SESSION SET EVENTS '10046 trace name context OFF';
I created a table in the tablespace with 5391 extents and just like the SMON trace the drop starts from the last extent querying uet$ for each one:
PARSING IN CURSOR #13 len=94 dep=1 uid=0 oct=3 lid=0 tim=1619173977 hv=1839874543 ad='d264a20'
PARSE #13:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1619173977
BINDS #13:
bfp=80000001000d80f0 bln=22 avl=02 flg=05
value=4
bfp=80000001000d80c0 bln=24 avl=02 flg=05
value=5
bfp=80000001000d8090 bln=24 avl=02 flg=05
value=2
bfp=80000001000d8060 bln=24 avl=03 flg=05
value=5390
EXEC #13:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1619173978
FETCH #13:c=0,e=0,p=0,cr=32,cu=0,mis=0,r=0,dep=1,og=4,tim=1619173978
STAT #13 id=1 cnt=0 pid=0 pos=0 obj=13 op='TABLE ACCESS CLUSTER UET$ '
STAT #13 id=2 cnt=2 pid=1 pos=1 obj=9 op='INDEX UNIQUE SCAN '
PARSING IN CURSOR #13 len=94 dep=1 uid=0 oct=3 lid=0 tim=1619173978 hv=1839874543 ad='d264a20'
PARSE #13:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1619173978
value=5389
FETCH #13:c=1,e=1,p=0,cr=32,cu=0,mis=0,r=0,dep=1,og=4,tim=1619173979
So, it looks like this is the way a segment gets cleaned up in a dictionary managed tablespace if you use the normal Oracle sql commands.
Jonathan's idea of doing our own coalescing of the uet$ rows makes sense but for us it seems safer just to change the segment_type so smon won't try to clean up the uet$ rows for the temporary segment.
Yes, it will still do the selects. But I do not believe it will attempt the coalesce or do the inserts into fet$ from uet$. It should simply do deletes from both. If you let it run (perhaps on something smaller) until you get to a delete you will be able to tell whether my memory is correct.
If you do the bit about startup mount and taking the datafiles offline in the first place so the tablespace never comes online in the duration of the run of the instance it *might* skip the select, knowing it does not have to possibly flush any dirty blocks. I simply don't remember that bit.
Once you've let your test run to the actual delete to verify whether I am correct that it does the "quick" thing instead of the long complicated thing, bear in mind that this IS a case where the tracing probably is a significant component of the load.
The unsupported way to do this is of course to do the deletes yourself, which will prune the select. I am not recommending you do that on a production database.
I've uploaded the trace and tkprof output if you want to see it:
http://www.bobbydurrettdba.com/uploads/dropdmtablespace.zip
What I did to mimic the situation on our production system was create a table with 5391 extents, drop the table, and kill the server process doing the drop. The database was up with events 10061 and 10269 set so SMON wouldn't do the cleanup. It appears that there were about 638 selects from uet$ which ended when it found an existing extent. Then there were about 4761 deletes from uet$. It doesn't add up to 5391 exactly but it's close.
I'd have to do another test to try it with the datafiles offline and do the startup mount but it could be done.
Thanks for your help with this.
I went back and tried offlining the datafile instead of the tablespace and it didn't make a difference.
Thanks again to everyone who participated. I feel good about updating the seg$ row, which in our situation seems like the safest approach.
Thank you for letting me know which option you chose; I'm very glad to know that your issue is resolved.
Updating dictionary objects is always a pain, but in your particular case this seemed the best option. It's not recommended to update the dictionary and it can leave your DB unsupported, but since you are on 8i, you do not have to worry about it.
I would also recommend that you upgrade your DB to 11.2.0.4 or 12.1 if possible to have a supported configuration.