Data Corruption Problems

What could be causing data corruption problems?
How can corruption problems be rectified?

Introduction
Opportunistic Locking
Screen Savers and Energy Savers
Write-Behind Caching    WORTH READING.
File Formats
Repairing Datafiles
Old Files
Cabling    WORTH READING.
General Hints and Tips
Additional Notes

Introduction

Generally one can say that the OMNIS Datafile technology is safe, but it does have one essential weak point: there is no server-side checking of the data, as there is with SQL backends. OMNIS relies on a functioning network to write data to a file server, and if the network is defective, packets can get lost, possibly corrupting the datafile. The impression that the network is OK is not always correct, especially in high-traffic situations, where corrupted packets can slip through.

More often than not, the cause of a corrupted datafile is some network issue. The points below, which are in no specific order, give an overview of what can cause network problems and what might damage a datafile. Disabling write-behind caching seems to have helped in many cases, though.

Opportunistic Locking

Opportunistic locking on NT should be turned off:

WHAT IS OPPORTUNISTIC LOCKING: Opportunistic locking is used by Windows NT to perform read-ahead, write-behind, and lock caching. Basically, if one client is accessing a block range in a file, that range is marked for opportunistic locking and the client can perform read-ahead, write-behind, and lock caching. If another user attempts to write to that block range, the opportunistic locking has to be switched off for the previous client and the data needs to be synchronized with the server before the second user can access the range.

SITUATION: Users were seeing regular corruption of their database. All had the package installed on a Windows NT Server (3.51 or 4) and were running Windows 95 at the workstations. Corruption would happen several times a day.

CAUSE: Windows NT Server tries to use a feature called Opportunistic Locking in order to speed up file access. This does not work well with a database.

RESOLUTION: This fix needs careful attention. We recommend that a responsible network person make this change. Any time that you edit a machine's registry information, you risk bigger problems if it is not done correctly.

Steps to disable opportunistic locking on an NT Server:

  1. Open REGEDT32 on the server machine.
  2. Go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters
  3. From the menu, select Edit / Add Value.
  4. Fill in the blanks: (Value Name) EnableOplocks, (Data Type) REG_DWORD.
  5. Select OK.
  6. A DWORD editor dialogue box will appear. Type in a zero and leave the radix set to Hex.
  7. Select OK. The new value should appear on the right half of the registry viewer.
  8. Exit the registry editor.
  9. Reboot the server. The value will only go into effect after a reboot.

See attached file that performs this procedure for more information.
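
As a rough illustration of what such a file could contain, the sketch below builds the text of a registry import file for the same change. It assumes a registry editor that accepts the REGEDIT4 file layout (for example REGEDIT on NT 4.0); the function name build_reg_text is made up for this example. Check the result by hand before importing it, and remember that the server still needs a reboot afterwards.

```python
# Sketch only: builds the text of a REGEDIT4-style import file that sets
# EnableOplocks to 0. The function name is a hypothetical choice.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters"


def build_reg_text(value_name="EnableOplocks", value=0):
    """Return import-file text that sets one REG_DWORD value under KEY."""
    lines = [
        "REGEDIT4",
        "",
        "[" + KEY + "]",
        '"%s"=dword:%08x' % (value_name, value),
        "",
    ]
    # Registry import files conventionally use DOS line endings.
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_text())
```

The printed text can be saved under a name ending in .reg and reviewed by the responsible network person before it is used.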

Screen Savers and Energy Savers

Screensavers and energy savers on Windows machines should be turned off, especially on the server. These are supposed to disable all connections cleanly if the computer has been idle for some time and reconnect after the computer recognises some action, but more often than not they do not function correctly. OMNIS is very sensitive to network problems: if packets are getting lost, OMNIS has no control over what data is written to the datafile. So if the reconnection to the datafile is not clean, data can be lost and the datafile corrupted.

Write-Behind Caching

Turn off write-behind caching on the Windows 95/98 machines. This type of caching stores information that needs to be written to the hard disk and sends it when the system is idle or after a certain amount of time has elapsed. This is a built-in feature of Windows 95/98 and is provided by the SmartDrv utility under the various versions of Windows 3.x.

Disabling Write Behind Caching:

Using Windows 95:

  1. Right-click My Computer and select the Properties menu item.
  2. Click on the Performance tab in the System Properties window that appears.
  3. Click on the File system... button at the bottom left hand corner of the window.
  4. Click on the Troubleshooting tab in the File System Properties window that appears.
  5. Place a tick in the "Disable write-behind caching for all drives" check box.
  6. Click OK in the File System Properties window.
  7. Click OK in the System Properties window.
  8. Reboot Windows 95.

This will reduce the performance of your machine slightly, as writes to disk are changed from write-behind to write-through caching. If you are reorganising a very large file and want to get every bit of performance out of your system, it is worth clearing this check box (re-enabling write-behind caching) and rebooting before doing the reorganisation. The speed hit depends on the performance of your hard drives and their interfaces.

The main concern is that popular system optimisation software (e.g. First Aid) suggests to the user that this setting is bad, and tries to turn it off again, enabling write behind caching. So even if you have done the right thing, the user (or a technician trying to improve the system performance) may unwittingly undo all your good work.

Using Windows for Workgroups 3.11 (WFW 3.11):

N.B. You must upgrade to WFW 3.11 if you are using an earlier version.

If you are using 32-bit file access on all or some drives:

The [vcache] section of your system.ini file may contain one or more lines beginning with ForceLazyOn= or ForceLazyOff= .

  1. If there is a line beginning with ForceLazyOn=, delete the entire line.
  2. If there is a line beginning with ForceLazyOff=, ensure that all the active drives in your system are included in the letters following ForceLazyOff=, e.g. if your system has two drives, C: & D:, make the line read as follows:
    ForceLazyOff=CD
  3. If there is no line beginning with ForceLazyOff=, add the following line in the [vcache] section:
    ForceLazyOff=CDEF

Again, in this instance the letters CDEF refer to the four drives C:, D:, E: & F: and should be changed as required to suit your system. You should also include network drives in deciding what letters to add to the line.

The [vcache] section of the system.ini file should look something like this when you have finished:

[vcache]
MinFileCache=512
ForceLazyOff=CDEF
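
If many workstations need the same change, the three steps above could be scripted. The following is a minimal sketch, assuming the file is read and written as plain text; the function name force_lazy_off and the default drive letters are illustrative only. Keep a copy of the original system.ini before replacing it.

```python
def force_lazy_off(ini_text, drives="CDEF"):
    """Return system.ini text with ForceLazyOff= set for `drives` in [vcache].

    Any ForceLazyOn= line in [vcache] is dropped, an existing ForceLazyOff=
    line is replaced, and the section is created if it is missing.
    """
    setting = "ForceLazyOff=" + drives
    out = []
    in_vcache = False
    seen_section = False
    written = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [vcache] without having written the setting: add it now.
            if in_vcache and not written:
                out.append(setting)
                written = True
            in_vcache = stripped.lower() == "[vcache]"
            seen_section = seen_section or in_vcache
            out.append(line)
            continue
        if in_vcache:
            key = stripped.split("=", 1)[0].strip().lower()
            if key == "forcelazyon":
                continue  # step 1: delete the entire line
            if key == "forcelazyoff":
                if not written:  # step 2: list all active drives
                    out.append(setting)
                    written = True
                continue
        out.append(line)
    if in_vcache and not written:
        out.append(setting)  # [vcache] was the last section in the file
    if not seen_section:
        out.extend(["[vcache]", setting])  # step 3, section missing entirely
    return "\n".join(out) + "\n"
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to apply repeatedly.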

If you are using 16-bit file access on all or some drives:

There should be a line in your autoexec.bat file that looks something like this:

c:\dos\smartdrv.exe

Add the switch /x to this line so that it reads:

c:\dos\smartdrv.exe /x

Using Windows 3.1:

There should be a line in your autoexec.bat file that looks something like this:

c:\dos\smartdrv.exe

Add the switch /x to this line so that it reads:

c:\dos\smartdrv.exe /x
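
The same edit to autoexec.bat can be sketched in a few lines of Python for sites with many machines. This is only an outline under simple assumptions: the function name add_smartdrv_x is made up, lines starting with rem are treated as comments, and any other line that mentions smartdrv is assumed to be the one that loads it.

```python
def add_smartdrv_x(autoexec_text):
    """Return autoexec.bat text with /x added to any SmartDrive line lacking it."""
    out = []
    for line in autoexec_text.splitlines():
        words = [w.lower() for w in line.split()]
        is_comment = bool(words) and words[0] in ("rem", "@rem")
        runs_smartdrv = any(w.endswith(("smartdrv.exe", "smartdrv")) for w in words)
        if runs_smartdrv and not is_comment and "/x" not in words:
            # Append the switch; the rest of the line is left untouched.
            line = line.rstrip() + " /x"
        out.append(line)
    return "\n".join(out) + "\n"
```

Lines that already carry the /x switch are left as they are.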

File Formats

If the file formats (or classes in Studio) are corrupt, this will get passed on to the datafile. The file format will probably look OK when you inspect it, because the corruption is at the tokenised level. You will need to replace the file format. See http://omnis.notabene.at/html/demos.html#SlotMaker for a tool from The OMNIS LAB that creates new file formats based on the slots in a data file.

A corrupted datafile will stay corrupted, so it is advisable to implement one or more of the steps above, then export/import the data and update the library and datafile at the same time as the changes are made to the operating systems.

Repairing Datafiles

The best way to check or repair a datafile is to run a Full check. The procedure is:

  1. Run the repair utilities with all the ‘Check data file structure’, ‘Check records’, ‘Check indexes’ and ‘Repair data’ options selected. Completely ignore any messages reported in the log.
  2. Repeat step 1 a second time.
  3. (Optional) Clear the check data log and repeat step 1, but without the ‘Repair data’ option. Any messages that now appear in the log will probably denote irreparable damage.

A full check should fix the great majority of problems; if it doesn’t, the only solution is to export and re-import the data.

Datafiles often pick up small amounts of damage with regular use and this generally causes no long-term problems (just like Norton Utilities nearly always seems to find something wrong with a hard disk). So even if a datafile seems to be working fine it is sensible to perform the Full Check routine described above every month or two. This could usefully be carried out after the network hardware check recommended elsewhere in the document.

Don't use the Quick check facility; instead, rely on performing a Full Check every couple of months. In practice most damage reported by Quick check is not permanent but was flagged because of some momentary network glitch which Omnis managed to circumvent.

If a datafile becomes damaged on a regular basis, always check the network for hardware problems before attempting to repair the datafile.

Plan ahead and assume that problems will happen from time to time. Make sure there is a reliable backup system and plans in place to perform periodic checks and deal with emergencies: it takes a long while to perform a Full Check on a large datafile and even longer to export and re-import data. This planning may identify cases where a server-based SQL database is the only sensible solution for a large amount of business-critical data.

Old Files

Datafiles created prior to Omnis 7 v2 may appear to function correctly but often contain invisible damage that was not picked up by the repair tools available at that time. This means that it is safer not to convert these old datafiles; instead, export the data with the original Omnis and import it into a new datafile with the current Omnis.

Cabling

Defective network cable or connectors can be a problem, especially in an Ethernet network.

Twisted pair tends to be a lot safer. Even old network cables can be a cause.

There can be "cross talk" caused by poor quality cables and connectors, and "reflection" caused by improper cable radiuses or by cables running too close to electrical lines. Missed and corrupted packets can be caused at the software level by incorrectly installed or corrupted drivers. A malfunctioning hard drive can write bad data and/or lose data in selected sectors. The list goes on.

A 4K cable tester is an investment worth making. Many sites inspected with this tester do not meet Category 5 cable guidelines. Generally, cables are tested and certified to 100 MHz, and then 3Com Ethernet cards are recommended for ALL the machines on the network, including those not running the software.

Problems can mysteriously clear up after cabling is upgraded from Cat 3 to Cat 5. Some low-end network cards may not do checksumming very well, in which case a corrupted packet could get through.

Even a Category 5 network cable that had a desk leg placed on it has been the cause of problems. It was causing one computer to run slowly and thus corrupted the data.

It is necessary to check for bad cables, cards and hardware by doing a 'ping-a-thon' once a month to every piece of network hardware.
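
A 'ping-a-thon' can be automated with a small script. The sketch below is one possible approach, not a tested diagnostic tool: it calls the operating system's ping command once per attempt and reports the fraction of lost replies per host. The function names are made up for this example, and treating exit status 0 as "reply received" is a simplification that may not hold on every platform.

```python
import platform
import subprocess


def ping_once(host, timeout_s=2):
    """Send one echo request with the system ping command; True if answered."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", host]
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    # Simplification: exit status 0 is taken to mean a reply arrived.
    return result.returncode == 0


def ping_sweep(hosts, attempts=20, ping=ping_once):
    """Return {host: fraction of pings that were lost} for every host."""
    loss = {}
    for host in hosts:
        failures = sum(1 for _ in range(attempts) if not ping(host))
        loss[host] = failures / attempts
    return loss


def suspects(loss, threshold=0.0):
    """Return the hosts whose loss fraction is above the threshold, sorted."""
    return sorted(host for host, frac in loss.items() if frac > threshold)
```

Calling ping_sweep with the names or addresses of the server, workstations, printers and routers, and then passing the result to suspects, gives a short list of hardware worth inspecting first.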

General Hints and Tips

Never try to reorganise data if severe data damage is suspected. With current Omnis versions this will only make things worse.

Make especially sure that there is a reliable backup before repairing or reorganising data. Otherwise a crash during these operations could be really bad news.
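
One way to gain some confidence in a backup is to compare the copy against the original before starting the repair. The following is a minimal sketch, assuming no user has the datafile open while it is copied; the function names are illustrative. It only proves that the copy matches the original byte for byte, not that the original is undamaged.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, destination):
    """Copy source to destination and raise IOError if the copy differs."""
    shutil.copy2(source, destination)
    if sha256_of(source) != sha256_of(destination):
        raise IOError("backup of %s does not match the original" % source)
    return destination
```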

If a workstation crashes for any reason whilst Omnis is updating the datafile it can cause corruption and locking problems. Make sure the users are educated not to switch off their workstations improperly.

We always set the NT Server performance setting to "balanced" rather than "maximize for file sharing".

Please find below an example structure (simplified) for how to update or insert records to a datafile:

Load error handler STARTUP/18
Repeat
    Cancel prepare for update
    Prepare for edit
    ;Data Update Process
    ;Data Update Process
    ;Data Update Process
    ;Data Update Process
    ;Data Update Process
    Update files
Until Flag true
Unload error handler STARTUP/18

The actual error handler is simply:

Parameter ErrorCode
Parameter ErrorText
;(We write the error code and text with time/date stamps to a log)
; Format error string and append to file using FileOps commands
Calculate #F as 0
SEA continue execution
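
For readers who do not know Omnis, the same retry pattern can be sketched in Python. This is an analogy only, not a translation of the Omnis commands: do_update stands in for the whole prepare/update sequence, the log list stands in for the log file, and the max_attempts limit is an addition of this sketch so that the loop cannot run forever.

```python
import time


def update_with_retry(do_update, log, max_attempts=5, delay_s=0.0):
    """Call do_update until it succeeds; log each failure and try again.

    Returns True on success and False if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            do_update()
            return True  # corresponds to leaving the loop with the flag true
        except Exception as exc:
            # Corresponds to the error handler: record the error with a
            # time/date stamp, then carry on with the next attempt.
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.append("%s attempt %d failed: %s" % (stamp, attempt, exc))
            time.sleep(delay_s)
    return False
```

The important point in both versions is that a failed update is logged and retried as a whole, rather than being left half finished.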

Two network cards with different driver versions installed can cause corruption. They put bad packets on the network, eventually resulting in a damaged datafile. Note that any computer on that network, not just those running OMNIS, can create the bad packets, messing up the network in general.

Another problem that occurs from time to time is corruption on the hard disk. Have you ever run Scan Disk and found a cross-linked file? If you have and you have a large data file then the chances are that the cross-link is in your data.

Packets sent between routers are checked for integrity. A router can request that a packet be sent again from the sending computer if it senses any problems. However, all routers have saturation points at which a bad packet can slip through. Routinely getting 'Damaged Data, Bad Pointer' errors and having to re-index and export/import on a regular basis is an indication of this.
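
Integrity checks are not a guarantee in any case, because simple checksums cannot detect every kind of damage. As an illustration (not a description of what any particular router or network card does), the sketch below implements the 16-bit one's-complement checksum in the style of RFC 1071. It catches a single changed bit, but it cannot tell when two 16-bit words have swapped places.

```python
def internet_checksum(data):
    """Return the 16-bit one's-complement checksum of a byte string."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold any carry bits back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"\x12\x34\x56\x78"
bit_flipped = b"\x12\x35\x56\x78"    # one bit changed
words_swapped = b"\x56\x78\x12\x34"  # same words, wrong order

print(internet_checksum(original) != internet_checksum(bit_flipped))    # damage detected
print(internet_checksum(original) == internet_checksum(words_swapped))  # damage missed
```

Both lines print True: the first kind of damage is detected and the second is not.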

Remember that an Omnis native datafile is updated AND MANAGED solely by the client computer (unlike an SQL server). The client even seems to control the resorting of all the indexes contained in all files of all the records involved in your update. If something gets in the way the results can be messy. There are many benefits to using a native datafile over anything else. This, however, is not one of them.

When checking the datafile, Omnis may report: "Needs repair The record structure for {File-name} is damaged".

Remedy this problem by erasing the affected file formats and rewriting them from scratch.

Use TCP/IP as the network protocol. Problems can be experienced with other protocols, NetBEUI for example, and switching from NetBEUI to TCP/IP can remedy them.

Customers have been reporting data corruption problems running ASIP 6.1.1 on a Mac when Windows AND Macintosh clients are connected to the server. The problems do not seem to occur when connecting the Windows machines using PC MacLan v7.x. Apparently Apple is aware of a problem and this will be fixed in ASIP v. 6.2.

Additional Notes

NONE

vcl.co.nz ©. designed by Vision Computing Ltd All pages copyright . For enquires email brendon@vcl.co.nz