Sunday, December 7, 2008

Recover an accidentally deleted file when the database is still open.

On Unix/Linux, when a file is deleted, but a process still has the file open, the file is still there in the filesystem, and only the inode is removed.
But the process can continue to use its file handle, and the file can also be accessible under /proc//fd .

In the following example, we use that behavior to recover a lost datafile after is has been dropped from the os (with rm) but the datafile is still open by the background processes.

First, we create a tablespace, and populate a table in it.

SQL> REM we create a tablespace:
SQL> create tablespace TEST_RM datafile ‘/var/tmp/test_rm.dbf’ size 10M;
Tablespace created.

SQL> REM we create a table in it:
SQL> create table FRANCK tablespace test_rm as select * from dba_objects;
Table created.

SQL> REM we check that table data is accessible:
SQL> select count(*) from FRANCK;
COUNT(*)
———-
12708

SQL> exit
Disconnected from Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production

Then, we drop the datafile from unix prompt.

here is the datafile
ls -l /var/tmp/test_rm.dbf
-rw-r—– 1 oracle dba 10493952 Mar 26 14:25 /var/tmp/test_rm.dbf

we ‘accidently’ drop the datafile
rm /var/tmp/test_rm.dbf
ls -l /var/tmp/test_rm.dbf
ls: /var/tmp/test_rm.dbf: no such file or directory


Here the datafile is lost.
Now we connect again.

sqlplus / as sysdba

Connected to:
Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production

SQL> REM and we check if table data is accessible:
SQL> select count(*) from FRANCK;

select * from franck
*
ERROR at line 1:
ORA-01116: error in opening database file 5
ORA-01110: data file 5: ‘/var/tmp/test_rm.dbf’
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3

The datafile is lost and data is not accessible.

However, the datafile should still have an open file descriptor by an oracle background process

we check the dbwriter pid:
ps -edf | grep dbw
oracle 2661 1 0 Mar25 ? 00:00:06 xe_dbw0_XE
oracle 7044 7037 0 14:40 pts/1 00:00:00 /bin/bash -c ps -edf | grep dbw
oracle 7046 7044 0 14:40 pts/1 00:00:00 grep dbw

and we check its opened file descriptors for our file:
ls -l /proc/2661/fd | grep test_rm
lrwx—— 1 oracle dba 64 Mar 26 14:02 66 -> /var/tmp/test_rm.dbf (deleted)

here it is:
ls -l /proc/2661/fd/66
lrwx—— 1 oracle dba 64 Mar 26 14:02 /proc/2661/fd/66 -> /var/tmp/test_rm.dbf (deleted)

In some other unix, lsof may be needed to map the file descriptor with the deleted file name

first we set a symbolic link so that oracle can see it as it was before the delete:
ln -s /proc/2661/fd/66 /var/tmp/test_rm.dbf

here data is accessible, but that will be lost if dbwriter closes it file handle (i.e if the database is closed)

However we can now set the tablespace read only so that it is checkpointed, and no writes occurs on it.

SQL> alter tablespace TEST_RM read only;
Tablespace altered.

We can now copy the file safely.

then we drop the symbolic link:
rm /var/tmp/test_rm.dbf
ls -l /var/tmp/test_rm.dbf

ls: /var/tmp/test_rm.dbf: No such file or directory

and we can now copy the file
cp -p /proc/2661/fd/66 /var/tmp/test_rm.dbf
ls -l /var/tmp/test_rm.dbf

-rw-r—– 1 oracle dba 10493952 Mar 26 14:54 /var/tmp/test_rm.dbf

And datafile is now available again.

SQL> REM we have it back, lets put the tablespace back in read/write
SQL> alter tablespace test_rm read write;
Tablespace altered.

SQL> REM and we check data is still there:
SQL> select count(*) from FRANCK;

COUNT(*)
———-
12708

This is not to be used like that in production. This is unsupported and may behave differently on different unix/linux or oracle versions.

No comments: