Mark Riffey's network troubleshooting checklist

Originally published in Clarion Magazine.

Be sure to visit Mark's business blog - Business is Personal.

This page covers Windows 98/NT to the present day, but most of it relates to NT/2000/XP.

General causes

Network / file problems, such as those with symptoms like error 1477 and 2172, can take many forms. They can be caused by one or more of the following:

  • Kinked or damaged cable - just because it looks ok doesn't mean it is - test it or swap it out for another one you may have.
  • Cable running close to a fluorescent light ballast (fixture)
  • Loose connector/plug on cable
  • Old "worn out" cables, particularly coax cables that have been around for years
  • Out of date drivers
  • Bad hub or a bad port on a hub
  • Failed/failing network card 
  • Power problems (PLEASE protect your systems with a UPS; power problems are one of the biggest troublemakers we know of. Yes, a UPS might cost $79 to $450 depending on how big a unit you buy, but how much is your computer and a day's worth of business worth? The cost is much less than the time it takes to fix a power-caused mess.)
  • Network setup and/or configuration problems.
  • Inadvertent shutdowns
  • Shutting down servers while workstations are still in the program
  • Out of date network drivers (even those right out of the box are sometimes a problem)
  • Improper or less than desirable network bindings/settings
  • Loss of network connections caused by server-management-induced timeouts

NOTE: Remember that having backups is a saving grace in the face of file/network problems. Network problems can corrupt your files in a heartbeat. If you have no backups, you are in big trouble (future or present - trouble will occur). Having backups is a responsibility you must take VERY seriously.

Another nice resource of Windows networking info is http://www.windowsnetworking.com/articles_tutorials/ (Thanks to Earl for passing that one on).

Drivers up to date?

Windows networking is subject to a number of problems, MANY of which can be solved simply by installing updated driver software from the manufacturer or (more often) Microsoft. The link below will go to a web page that describes just ONE of the problems in Windows peer-to-peer networking, yet there are several other problems referenced at the bottom of that page. In particular, anyone on Windows 95 needs to get their network drivers and "requestor" updated. http://support.microsoft.com/support/kb/articles/q174/3/71.asp and http://support.microsoft.com/support/kb/articles/q148/3/67.asp in particular note some problems that can burn you.

Windows NT users - Are you on service pack 6 instead of service pack 6a or another service pack? If so, expect lots of problems. Microsoft has acknowledged that service pack 6 broke a lot of things network-wise. You can get service pack 6a at their site or you can go back to service pack 5, either of which is stable. In addition, do NOT mix service packs on different NT machines on your network. In other words, run all your NT machines on service pack 5 or on service pack 6a, but not a mix of both service packs.

Test your network using TestLock
Download this program. Follow the instructions. Use it. Note: I didn't write it and I can't support it. I just follow the directions and use it.

Is your network slow when using a mapped drive letter?

The reason is this: the computer has both TCP/IP and NetBEUI installed (network protocols, similar to different spoken languages), TCP/IP for the Internet and NetBEUI for the local network. TCP/IP is the default protocol. When connecting to a mapped drive after some idle time, the computer first tries to connect over TCP/IP and times out. Then, and only then, it tries the NetBEUI connection. To fix this, go to Control Panel > Networks > Bindings and make NetBEUI the default protocol.
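
One rough way to check whether a workstation shows this idle-then-timeout behavior is to time the first access to the mapped drive after it has sat idle and compare that with an immediate repeat. The Python sketch below is not part of the original checklist; the function names are made up for illustration, and it only measures the delay, it does not tell you which protocol caused it.

```python
import os
import time


def time_listing(path):
    """Return the seconds taken to list the directory at `path`."""
    start = time.perf_counter()
    os.listdir(path)
    return time.perf_counter() - start


def compare_first_and_repeat(path):
    """Time one listing, then an immediate second listing of the same path.

    Run this after the workstation has been idle for a while. A first
    listing that takes far longer than the repeat suggests a reconnect
    delay of the kind described above.
    """
    first = time_listing(path)
    repeat = time_listing(path)
    return first, repeat
```

For example, calling compare_first_and_repeat("Z:\\") (with Z: standing in for your own mapped drive letter) before and after changing the default protocol gives you two pairs of numbers to compare.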

Is your network slow when using a mapped drive letter? (part 2)
Is the drive mapped to the main computer's drive or to a folder?
If it is mapped to a folder, you will likely see a decrease in performance, often a quite noticeable one. Mapping directly to the drive has proven faster time and time again. We have not discovered the reason for this, despite extended searches of Microsoft's tech database ( http://msdn.microsoft.com ).

Is your network slow?

Recently, we have noticed that the "Windows Indexing Service" has a seriously negative effect on network performance. The indexing service scans your hard disk and indexes the files so that the next time you do a file search, Windows can find the files more quickly. Turn it off. Think about how often you do searches vs. how much time you waste waiting on your network. When you do need a search, start it and do other work while waiting for it. It's just not worth waiting 99% of the time to speed up 1% of your work.

Is your Windows 2003 network slow?

  1. Get all the Windows 98 machines off of the network. Not just out of your program, but OFF THE NETWORK.
  2. Get all XP machines on Service Pack 2.
  3. Get Windows 2003 on Service Pack 2 or later.
  4. Disable SMB signing. On the Windows 2003 Small Business Server, run "gpmc.msc" and make sure the following policies (10 in total) are all 'Disabled' (instead of 'Not defined') in BOTH 'Default Domain Security Policy' and 'Default Domain Controller Security Policy':

NOTE: The policies are under 'Windows Settings' -> 'Security Settings' -> 'Local Policies' -> 'Security Options'.

    • Microsoft network client: Digitally sign communications (always): Disabled
    • Microsoft network client: Digitally sign communications (if server agrees): Disabled
    • Microsoft network server: Digitally sign communications (always): Disabled
    • Microsoft network server: Digitally sign communications (if client agrees): Disabled
    • Network security: LAN Manager authentication level: Send LM & NTLM - use NTLMv2 session security if negotiated

Restart the DC and the client computers for the changes to take effect.

Do some or all computers on your network randomly "die", "go to sleep" or "hang"?

Usually, this is caused by power management being active on the workstation and possibly the server. Power management is a fancy computer geek word for "Windows has settings that turn stuff off when it hasn't been used in a while". Power management is a bad thing on a network. It's great on a laptop at 37,000 feet with 3 hours remaining of your flight, but it's far more trouble than it is worth otherwise.

Bottom line issue: You don't want network cards turning off because you haven't moved your mouse for 20 minutes. You don't want your server's hard drive turning off because no one has touched the server keyboard in the last 30 minutes (this might make your workstations just a little bit cranky when they are trying to read stuff on that server's drive). This is exactly what power management is supposed to do, but you don't want this to happen when using a networked database.

To investigate:

  1. Go to Start, Settings, Control Panel (XP in "ugly mode" or Windows 2000) or Start, Control Panel (XP in "pretty mode") and double click the Network Connections icon. If that doesn't exist on your computer, you need to find the place where you can change settings on your network cards.
  2. Find your network adapter on this screen. Usually it will say something like "Local Area Connection" or "Wireless Connection 1" (if you are ignoring our advice and using wireless).
  3. Right click that icon and click Properties. When the screen opens, you'll see the name of the network card up near the top, just below the tabs. To the right of that, there is a Configure button. Click it.
  4. When the next screen opens, there will almost certainly be a Power Management tab. On that tab, chances are you will see a checkbox that says something like "Allow the computer to turn off this device to save power". Uncheck the box and click OK until you don't have to look at all these network settings anymore.
  5. Reboot your PC and hope for the best.

Windows 98 networking 

Here is Microsoft's "best place to start" page for dealing with Windows 98 issues, including networking issues. http://support.microsoft.com/highlights/w98.asp

Windows ME (Millennium) networking

Here is Microsoft's "best place to start" page for dealing with Windows ME/Millennium issues, including networking issues. http://support.microsoft.com/highlights/winme.asp

Windows 2000 networking

Here is Microsoft's "best place to start" page for dealing with Windows 2000 issues, including networking issues.
http://support.microsoft.com/highlights/Win2000.asp

Windows XP networking

Here is Microsoft's "best place to start" page for dealing with Windows XP issues, including networking issues.
http://support.microsoft.com/highlights/winxp.asp

Windows 2003 Server networking

http://support.microsoft.com/default.aspx?scid=fh;EN-US;winsvr2003

Need Netbeui on your XP systems and can't find it?

This Microsoft article explains how to install it: http://support.microsoft.com/search/preview.aspx?scid=kb;en-us;Q301041

Workstation drive letters "getting the red X" (disconnecting from the main computer)

You can disable this by issuing this command from the DOS command line: net config server /autodisconnect:-1
Before using this command, we suggest you read the Microsoft article that discusses autodisconnect. You can find it here: http://support.microsoft.com/default.aspx?scid=kb;en-us;138365
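
If you need to apply this setting on more than one machine, the command can be wrapped in a small script. The Python sketch below is only an illustration and not part of the original checklist: the function names are made up, it defaults to a dry run that just prints the command, and actually running it assumes a Windows machine and a prompt with sufficient rights.

```python
import subprocess


def autodisconnect_command(minutes=-1):
    """Build the `net config server` command line for an autodisconnect value.

    -1 is the value this checklist uses to turn autodisconnect off.
    """
    return ["net", "config", "server", f"/autodisconnect:{int(minutes)}"]


def apply_autodisconnect(minutes=-1, dry_run=True):
    """Print the command; run it only when dry_run is False."""
    cmd = autodisconnect_command(minutes)
    print(" ".join(cmd))
    if not dry_run:
        # Raises CalledProcessError if the `net` command reports a failure.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling apply_autodisconnect() with the defaults prints "net config server /autodisconnect:-1" without changing anything, which lets you review the command before running it for real.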

Windows 2000 or Windows XP mapped drives disconnecting for no apparent reason?

(e.g. showing the red X over the drive in Explorer)
http://support.microsoft.com/default.aspx?scid=kb;en-us;138365

Novell Netware problems?

The problem could be your Novell Opportunistic Locking setting. Contact your network person for further details. To turn it off, go to Control Panel -> Networks -> Novell Client Properties -> Advanced Settings tab -> Opportunistic Locking and make sure it is switched off on all client machines. Also make sure True Commit is ON at each client PC (this should help stop data corruption).

Performance issues are often caused by network protocol "bindings"

Check the following network protocol basics:

  • Make sure that your default network protocol has no bindings to a virtual device (dialup.....).
  • If you are using TCP/IP and you have dialup on this workstation, try NetBEUI.
  • Try to avoid using IPX and NetBEUI together. IPX gets confused when you have a "chatty" NetBEUI. Removing IPX (if you can) is strongly advised. 
  • If you need to examine the network further, check out http://www.sysinternals.com/Utilities/TdiMon.html to get a bird's eye view of what's going on. 

Does the system work on some machines but seems to "think about it" and then do nothing on others?

Sometimes Windows doesn't have enough "files" set in your config.sys. Try 100 or 125. If this isn't descriptive enough, you need to have your consultant do this for you.

Sometimes having full-time virus scanning turned on does this. Ask your virus software vendor how to work around this OR exclude our program from your scanner if you can.

Power management - Do you have Energy Star features on your computers? Probably so. Power management and networking DO NOT MIX. You can have your computers' power management features turn off and/or dim the monitor, but DO NOT have them turn off the hard drive, network cards etc. This will definitely cause you grief when computers are networked. Grief = lost data.

Database corruptions, timeouts and other troubles

Another issue is the various ways that Windows 9x and NT try to improve performance, often at the price of stability. Sometimes these things work; other times they cause network timeouts because they force additional file operations behind the scenes and those file operations time out (fail). One of these items you can turn off is "Synchronous buffer commits". To do this, click Control Panel, System, Performance, File System, Troubleshooting and check the "Disable synchronous buffer commits" checkbox.

Database corruptions, timeouts and other troubles, part II

Further, Windows NT users face issues caused by some performance improvements that NT tries to implement with network applications by 'faking' multiple use of files. Unfortunately, some users experience file corruption because of this. This article is a bit of nerd-speak, but your network person should take a look at it if you are seeing "Access denied" errors on network files when they *know* that the network permissions are set properly. http://support.microsoft.com/support/kb/articles/Q129/2/02.asp The topic of this article can also be the cause of database corruption and network timeouts (drive not available messages and the like). 

Win9x/Me users - Turn off write caching

You need to disable the "write-behind cache". When the program asks to save the data, the data is kept in cache on the local machine [until the cache is flushed] instead of being written to the server.

  • START > SETTINGS > CONTROL PANEL
  • System
  • Performance tab
  • File System
  • Troubleshooting tab
  • Disable the write-behind cache
  • Restart the computer

Windows 2000 and Windows XP users - Turn off write caching

You need to disable the "write-behind cache". When the program asks to save the data, the data is kept in cache on the local machine [until the cache is flushed] instead of being written to the server.

  • Right click My Computer > Properties > Hardware > Device Manager
  • Right click the disk drive > Properties
  • Disable "Write Cache Enabled"
  • Restart the computer

Opportunistic locking (oplocks) and performance

This white paper discusses issues related to opportunistic locking - something that can seriously impact performance on ISAM databases (which ours are). This site is related to a product (DataFlex) that we do NOT use, but the same issues can impact your tps databases, usually causing error 530 or other error 5XXs. http://www.dataaccess.com/whitepapers/opportunlockingreadcaching.html

From the ClarionLive chat: Client caching features: Oplock vs. Lease and Opportunistic Locking and Read Caching on Microsoft Windows Networks

More oplock info below in the Windows 2008 / SMB2 discussion.

More Microsoft articles related to opportunistic locking

Now you see why we suggest keeping up to date on Microsoft fixes...

http://support.microsoft.com/default.aspx?scid=kb;en-us;q124916
Some Client Applications Fail when writing to Windows NT

http://support.microsoft.com/default.aspx?scid=kb;en-us;q129202
PC EXT: Explanation of Opportunistic locking in Windows NT

http://support.microsoft.com/default.aspx?scid=kb;en-us;q130922
Event error 2022: Server unable to find a free connection

http://support.microsoft.com/default.aspx?scid=kb;en-us;q138365
How the autodisconnect works in Windows NT

http://support.microsoft.com/default.aspx?scid=kb;en-us;q142803
Locking error or Computer hangs Accessing network database files

http://support.microsoft.com/default.aspx?scid=kb;en-us;q148367
Possible network file damage with redirector caching

http://support.microsoft.com/default.aspx?scid=kb;en-us;q152186
Possible network data corruption if locking not used

http://support.microsoft.com/default.aspx?scid=kb;en-us;q163401
How to disable network redirector file caching

http://support.microsoft.com/default.aspx?scid=kb;en-us;q174371
Possible database file damage when data is appended

http://support.microsoft.com/default.aspx?scid=kb;en-us;q219022
Improving performance of MS-DOS database applications (ours aren't DOS, but it's good reading anyhow)

http://support.microsoft.com/default.aspx?scid=kb;en-us;q296264
Configuring opportunistic locking in Windows 2000

http://support.microsoft.com/default.aspx?scid=kb;en-us;q290757
Write caching settings for hard disk may not persist after you restart your computer

Another NT issue re: slow network performance with Service Pack 4, 5, 6, or 6a (Q249799)

http://www.microsoft.com/technet/support/kb.asp?ID=249799

Fix that leaky hose

While it is certainly possible, don't automatically assume network errors are a program problem. I've seen TPS-based programs used in many, many networked systems and by as many as 130 people simultaneously on ONE network. Do other multi-user applications work ok? Can you save a text file into our application's directory using Windows Notepad? If not, the problem is more than likely with the network setup. Just one little thing related to sharing or "permissions" can mess things up. Keep in mind that our programs have many (as many as 80 or more) files open across your network at once, where Notepad only has one. Like a leaky hose where you don't see the leaks till lots of water is going through the hose under pressure, a network can exhibit similar behavior and not fail until it is under a heavy load.
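
The Notepad test and the "many files open at once" point can both be tried with a short script. The Python sketch below is only a rough stand-in, not a substitute for TestLock or for the application itself: the function names and temporary file names are made up for illustration, and the count of 80 simply mirrors the figure mentioned above.

```python
import os


def can_write(directory, name="_write_test.tmp"):
    """Try to create, write, and delete a small file in `directory`."""
    path = os.path.join(directory, name)
    try:
        with open(path, "w") as handle:
            handle.write("test")
        os.remove(path)
        return True
    except OSError:
        return False


def open_many(directory, count=80):
    """Hold up to `count` files open at once; return how many were opened."""
    handles, paths = [], []
    try:
        for i in range(count):
            path = os.path.join(directory, f"_load_test_{i}.tmp")
            handle = open(path, "w")
            handles.append(handle)
            paths.append(path)
            handle.write("x")
            handle.flush()
    except OSError:
        # Stop at the first failure; the count returned shows how far we got.
        pass
    finally:
        for handle in handles:
            handle.close()
        for path in paths:
            if os.path.exists(path):
                os.remove(path)
    return len(handles)
```

If can_write fails on the application's shared directory, look at sharing and permissions first. If it succeeds but open_many returns fewer than the number requested, the network or server is refusing files under load, which is the leaky hose showing itself.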

Getting a TPSBT 1477 and/or 2172?

The 1477 and 2172 errors are caused by improperly "closed" files. Kind of like a file cabinet whose drawer or file wasn't closed. Improper closing can be caused by rebooting the server while the workstation is in the program, rebooting a workstation while it is in the program, logging out while you are in the program, having a power outage or even a "burp" in the power, and so on. The items noted above can help this situation as well. Our tps-based programs have anywhere from 30-80 files open at once. Most other programs that you use on the network don't "push" the network anywhere near this hard. Sometimes a network is like a bad garden hose. Turning the water on slow doesn't expose a leak. Turning it on full force and putting your thumb over the end does.

Another nice network troubleshooting resource

http://farreachtech.com/network_troubleshooting.asp

SMB2

Windows Server 2008 introduces SMB 2.0, a new version of the file-sharing protocol that brings its own opportunistic locking (oplock) behavior.

How to disable and more info: 
http://www.petri.co.il/how-to-disable-smb-2-on-windows-vista-or-server-2008.htm
Registry key: HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters
Value name: Smb2 (REG_DWORD), 0 = disabled
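
To apply the same registry value on several machines, it can be written out as a .reg file. The Python sketch below is a hedged illustration only: the function name is made up, it just produces the file text (saving and importing it is left to you), and the use of 1 to mean "enabled" is an assumption that does not come from this article. Back up the registry and read the linked article before importing anything.

```python
def smb2_reg_file(enabled=False):
    """Return .reg file text that sets the Smb2 DWORD value.

    0 = disabled, as stated in this article. 1 = enabled is an
    assumption added for this sketch.
    """
    value = 1 if enabled else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        r"[HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LanmanServer\Parameters]",
        f'"Smb2"=dword:{value:08x}',
        "",
    ]
    # .reg files use Windows line endings.
    return "\r\n".join(lines)
```

Printing smb2_reg_file() shows the header line, the key in square brackets, and the line "Smb2"=dword:00000000.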

See also http://www.softrak.com/products/compatibility/compatibility.php; expand the Windows 2008 Server topic. Thanks to Carl Barnes for the info.

More on SMB2 found by Jeff Slarve.

From the ClarionLive chat: SMB Opportunistic Locking Behavior

Oct 2010 update

From comments posted to the original article:

A few more resources from the ClarionLive chat:

Jan 2014 update

From Mark: