
Ever since some of our customers migrated to Windows Server 2016, we have had much more frequent (hard) crashes of our managed .NET application.

The exact setup: a Windows Server 2016 file share hosts the application, and the (managed) app is launched in a Windows Server 2016 RDS session on another machine. Sometimes it fails with "has stopped working": no exception, nothing. It is reproducible in parts of the program, and it is not due to known or otherwise reproducible bugs.

It seems to be related to something like https://support.microsoft.com/en-gb/help/2536487/applications-crash-or-become-unresponsive-if-another-user-logs-off-a-r, although that article claims the issue no longer exists with WS2016.

Network issues and ghost sessions (users not logging off for days or weeks) seem to amplify this problem.

We are now going to recommend that our users configure auto-logoff (on idle) on their Windows Server 2016 RDS machines; no feedback yet on whether that helps.

Is this sort of issue known? Is there any known workaround?

  • Do you find anything relevant in the Event Viewer? Is the server hosting the file share the same one that runs RDS?
    – harrymc
    Commented Jul 9, 2019 at 10:40
  • I have not looked at the Event Viewer yet. What should I look out for? The servers are different; both run WS2016, one used for RDS and one for the file share. Commented Jul 9, 2019 at 10:51
  • Look in the Event Viewer for warnings or errors that might apply; if I knew what to look for, I would also know your answer. What happens if you copy the files locally to the RDS server?
    – harrymc
    Commented Jul 9, 2019 at 10:54
  • We have never tried copying it; running from the network is used for easy deployment and updates. It is something we can try, though not our first choice, since the error is sporadic and only occurs at a customer's site. The Event Viewer typically has a lot of entries, including warnings and errors, hence my question about what to look out for, in case you have an idea. Commented Jul 9, 2019 at 12:13

4 Answers


We are facing the same issue.

A Server 2008 R2 application server delivers the application to a WS 2016 RDS cluster.

Application shortcuts are redirected for RDS users from a share on the Server 2008 R2 app server.

We upgraded our RDS 2012 cluster to 2016 specifically to address this issue, since Microsoft's solution for the FCB issue in the article you posted above is to upgrade to Server 2016.

Good to know it still happens with a Server 2016 file server as well, as that was going to be our next troubleshooting step (upgrading the file server).

The app worked with no issues for 2 months after we upgraded the RDS side to Server 2016.

Today we got the 'in page errors' and the same behaviour as when we were still accessing the app from WS 2012 RDS.

We don't have the luxury of running the application locally because of the shared files redirected across all servers in the RDS cluster, per-deployment licensing costs (per server it is installed on), and load balancing, so 'deploy it locally' is not an option for us.

Looks like 2016 still has issues with FCB.

Auto-logoff will make it worse if it is indeed the FCB issue: as soon as the user holding the 'magic file handle' on the remote share closes the app or logs out of RDS, the app will crash for numerous users.

On Server 2012, we used scheduled tasks and a script to open the application on a timer before any users logged in, then used 'WAIT' to keep the file lock open for 24 hours, cleaned up at the end of the day, and repeated. The point was to keep the 'magic file handle' open and locked by a background process rather than a user, but this is flaky at best and isn't sustainable in the long run.
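A rough sketch of that keep-alive task, written in Python rather than the batch script with `WAIT` described above (this is our substitution, not the poster's script): a process started by Task Scheduler before anyone logs on launches the application and keeps it running, and so keeps its handle on the share open, for a fixed window before terminating it. `hold_app_open` is a hypothetical helper, and whether it actually preserves the FCB in your environment is an assumption you would need to verify.

```python
import subprocess
import sys
import time


def hold_app_open(command, hold_seconds, poll_interval=1.0):
    """Launch `command`, keep it running for `hold_seconds`, then terminate it.

    Returns (exited_early, returncode). exited_early is True if the app quit on
    its own before the window ended, i.e. the handle was lost early.
    """
    proc = subprocess.Popen(command)
    deadline = time.monotonic() + hold_seconds
    try:
        while time.monotonic() < deadline:
            if proc.poll() is not None:
                print(f"warning: {command[0]} exited early with code {proc.returncode}",
                      file=sys.stderr)
                return True, proc.returncode
            # Sleep in short steps so an early exit is noticed promptly.
            time.sleep(min(poll_interval, max(0.0, deadline - time.monotonic())))
    finally:
        # End of the window (or an exception): clean up so the next run starts fresh.
        if proc.poll() is None:
            proc.terminate()
            proc.wait()
    return False, proc.returncode
```

A daily task would then call something like `hold_app_open([r"\\appserver\apps\App.exe"], hold_seconds=24 * 60 * 60)`, where the UNC path is a placeholder for your own share.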

We even went as far as using one of our development Blue Prism robots to log in and open the app before core business hours, but as our user base is Indian and operates almost 24/7, it wasn't feasible as a long-term solution. It does work if you have a stable '9 to 5 only' user base and know when people will be logging on and off on a reliable schedule.

The key is opening the application before any other user does and keeping the file handle OPEN; this is the best way to work around the issue.

When this issue occurs and the app crashes, you will notice that the app is still open on the RDS server, but no open file handle for the exe is visible for that user in Share and Storage Management on the file server. The lock on the file server side will have vanished. As soon as the user holding the lock pertaining to the FCB for the shared app file closes the app or logs off, chaos ensues.

The only real pointer I can give you is to make sure the file share holding the app files isn't on a disk that is being snapshotted while users are using the app. I have seen this be the root cause on other Server 2016 boxes, but alas, it isn't the cause in our particular case.

We were hoping WS 2016 was actually the solution, as advertised, but we are now confident it's a Windows Server issue that still hasn't been fixed, despite being advertised as fixed in WS 2016. It's odd that it had been working for us for 2 months on WS 2016 RDS but started happening again today.

We are still looking into this and have months' worth of troubleshooting notes if you need any other help (although they pertain primarily to RDS 2012).

Also, we could do with some pointers from other admins, as we are facing this again as of today, and we are on Server 2016.

Is your app share on a NAS by any chance, or connected to the app server over iSCSI? And are you using GoodSync for file-level backups of the app share at all?

Ben.

  • Hello @Ben, no, not a NAS; the file share is on a WS 2016 server too. I wonder whether it is the same issue, since I never noticed it (with this app) on WS 2008 or WS 2012. But we get the issue at a few sites, and only on 2016. Commented Sep 12, 2019 at 19:00
  • Do you see 'in page errors' in the Application log in Event Viewer? Any errors mentioning that the hard drive hosting the file is unavailable? Filter your Application log for application errors with event IDs 1000, 1001 and 1005. Do you see these event IDs clustered together in the Windows Application log when the issue happens?
    – Ben
    Commented Sep 12, 2019 at 19:35
  • I could not find those. The system now has a nightly reboot, and that has greatly reduced the issue; at least the customer has not called about it again. Commented Nov 17, 2019 at 21:55
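To pull just those event IDs out of the Application log without scrolling through Event Viewer, `wevtutil qe` accepts an XPath filter via `/q:`. A small sketch that builds the command line; the helper name and its defaults are our own choices, and you would run the result on the RDS host with `subprocess.run(args, capture_output=True, text=True)`:

```python
def build_event_query(event_ids, log="Application", max_events=50):
    """Return a wevtutil argument list fetching the newest events with these IDs."""
    if not event_ids:
        raise ValueError("need at least one event ID")
    # XPath filter in the same form Event Viewer's "Filter Current Log" produces.
    clause = " or ".join(f"EventID={int(event_id)}" for event_id in event_ids)
    xpath = f"*[System[({clause})]]"
    return [
        "wevtutil", "qe", log,
        f"/q:{xpath}",
        f"/c:{max_events}",  # limit the number of events returned
        "/rd:true",          # newest first
        "/f:text",           # human-readable output instead of XML
    ]
```

Clustered timestamps across the three IDs in the output are what the question above is asking about.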

Just to update:

Another, more permanent fix for the issue is to downgrade the .NET Framework from 4.5.x or later to .NET Framework 4.0.

If you are in a position to do this, it will stop the errors.

The FCB issue is more pronounced on all Windows Server versions from 2008 R2 upwards if the installed .NET version is 4.5 or later.

We have tested in a lab with VMs from Server 2008 R2 all the way up to 2016, and conclude that the issue is introduced by upgrading from .NET 4.0 to 4.5.x or later.
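To check which .NET Framework 4.x each server actually has before comparing results, Microsoft documents reading the `Release` DWORD under `HKLM\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full`; if the value is missing, 4.5 or later is not installed. A sketch that maps that value to a version string; the thresholds follow Microsoft's published table as we recall it (double-check them against the current docs), and the function names are hypothetical:

```python
# Minimum "Release" values for .NET Framework 4.5 and later, highest first.
_RELEASE_THRESHOLDS = [
    (533320, "4.8.1"),
    (528040, "4.8"),
    (461808, "4.7.2"),
    (461308, "4.7.1"),
    (460798, "4.7"),
    (394802, "4.6.2"),
    (394254, "4.6.1"),
    (393295, "4.6"),
    (379893, "4.5.2"),
    (378675, "4.5.1"),
    (378389, "4.5"),
]
_NDP_KEY = r"SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full"


def framework_version(release):
    """Map a Release DWORD (None if absent) to a version; assumes 4.0 is installed."""
    if release is not None:
        for minimum, version in _RELEASE_THRESHOLDS:
            if release >= minimum:
                return version
    # No Release value (or one below 4.5's threshold): only 4.0 is present.
    return "4.0"


def is_45_or_later(release):
    return framework_version(release) != "4.0"


def read_release():
    """Read the Release DWORD on Windows; returns None if the key or value is absent."""
    import winreg  # Windows-only module, imported lazily
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, _NDP_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "Release")
    except FileNotFoundError:
        return None
    return value
```

On each VM, `framework_version(read_release())` gives a label you can record next to whether the crash reproduced.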

Not a very good fix for production environments, but I wanted to throw it out there that the issue vanishes for us when downgrading .NET on every OS we have tested so far.

Sadly we can't implement the above, but it definitely kills off the issue in our experience.

Looks like the FCB issue is directly tied to .NET 4.5.x and above on all Windows Server versions.

Go figure - something has been introduced in the framework that causes the issue.

If you are on 2016 and facing the issue, also check your Fault Tolerant Heap error logs in Event Viewer.

If you are seeing errors for fault tolerant heap related to your application, the application can be excluded from FTH in the registry, which in some cases will also fix the crashes.
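A sketch of that exclusion, assuming the list lives in the `ExclusionList` REG_MULTI_SZ value under `HKLM\SOFTWARE\Microsoft\FTH` (verify the key and value on your server before writing to it, and run elevated). `MyApp.exe` and both helper names are placeholders; only `add_exclusion` is platform-independent.

```python
# Assumed location of the FTH exclusion list; verify on your own server first.
FTH_KEY = r"SOFTWARE\Microsoft\FTH"
FTH_VALUE = "ExclusionList"


def add_exclusion(existing, exe_name):
    """Return a new list with exe_name appended unless already present (case-insensitive)."""
    if any(entry.lower() == exe_name.lower() for entry in existing):
        return list(existing)
    return list(existing) + [exe_name]


def exclude_from_fth(exe_name):
    """Windows-only: add exe_name to the FTH exclusion list; needs an elevated process."""
    import winreg  # imported lazily so the module loads on other platforms
    access = winreg.KEY_READ | winreg.KEY_SET_VALUE | winreg.KEY_WOW64_64KEY
    # Raises FileNotFoundError if the FTH key itself does not exist.
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, FTH_KEY, 0, access) as key:
        try:
            current, _value_type = winreg.QueryValueEx(key, FTH_VALUE)
        except FileNotFoundError:
            current = []
        updated = add_exclusion(current, exe_name)
        winreg.SetValueEx(key, FTH_VALUE, 0, winreg.REG_MULTI_SZ, updated)
    return updated
```

Calling `exclude_from_fth("MyApp.exe")` keeps any existing entries and only appends the new name, so it is safe to run more than once.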


As for our particular issue, a scheduled nightly reboot fixed it, at least to the point where it is bearable.
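A nightly reboot like that can be registered with the built-in `schtasks` tool. A sketch that builds the command; the task name and the 03:00 start time are arbitrary placeholders, and you would run the resulting list with `subprocess.run(args, check=True)` from an elevated prompt:

```python
def nightly_reboot_task(task_name="NightlyReboot", start_time="03:00"):
    """Build a schtasks command that reboots the machine daily at start_time (24-hour)."""
    hours, minutes = (int(part) for part in start_time.split(":"))
    if not (0 <= hours <= 23 and 0 <= minutes <= 59):
        raise ValueError(f"invalid start time: {start_time!r}")
    return [
        "schtasks", "/create",
        "/tn", task_name,
        "/tr", "shutdown /r /t 0",          # immediate restart
        "/sc", "daily",
        "/st", f"{hours:02d}:{minutes:02d}",  # schtasks expects HH:mm
        "/ru", "SYSTEM",                     # run without a logged-on user
        "/f",                                # overwrite an existing task of the same name
    ]
```

Running it as SYSTEM means the reboot happens even when nobody is logged on, which is the situation where ghost sessions would otherwise linger.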


My experience is that the Windows Server 2016 user desktop environment implementation blocks traffic on/through DCOM ports. In a cluster of servers, this causes communication problems for clustered applications that need to communicate with each other. Since we block outgoing/incoming traffic on the app servers, the Windows Update service comes up with an error and blocks the application at the same time...

I logged a call with the MS helpdesk 3 years ago and still have not received a solution, other than "we are not responsible for application crashes of third parties"...

