Un de mes clients a rencontré récemment un problème intéressant de gel d'ouvertures de sessions.
Mes recherches effectuées sur Internet s'étant révélées infructueuses (de nombreux reports du problème mais pas de solution efficace), j'ai donc pris mon debugger pour trouver la cause du problème.
Je vous livre ici les résultats d'analyse et la méthodologie suivie pour remonter à la source du problème.
Symptôme:
Après avoir ouvert une connection RDP sur un serveurs (mstsc.exe), la fenêtre de login s'affiche correctement, mais après saisie des credentials, l’ouverture de session reste bloquée sur Applying Computer settings.
Ensuite plus personne ne peut se loguer sur le serveur, même physiquement sur la console. Le serveur ne présente plus rien d’autre qu’un écran bleu (absence de bureau).
L’accès distant RPC/DCOM ne fonctionne plus non plus.
L’accès SMB peut être opérationnel dans certains cas.
Le serveur répond au ping.
Il faut alors éteindre le serveur physiquement et rallumer.
Diagnostics:
Installé hotfix http://support.microsoft.com/kb/948496/en-us . Celà n’a pas corrigé le problème.
Mise en place compteurs de performance mémoire / System / Serveur / Processus / Processeur.
L’analyse ne montre aucun problème 100%, pas de problème mémoire ( Available Memory Ok, Paged Pool Ok, Non Paged Pool Ok, Pte Ok).
J'ai donc modifié la clé de registre CrashOnCtrlScroll pour activer la possibilité de générer un dump mémoire à la demande sur une des machine impactée afin de collecter un dump lorsqu'une ouverture session se gèle. ( http://support.microsoft.com/kb/244139).
Analyse du dump:
Plusieurs dumps ont été générés et tous présentent le même symptôme.
Dans les grandes lignes, Winlogon est bloqué en attente d’un appel LPC traité par le service AeLookupSvc qui ne répond jamais plus aux messages, d’où la cause du « hang »
Voici un résumé de l'analyse:
Vu que l'ouverture de session est traitée par Winlogon, l'analyse de ce processus est le point de départ.
Microsoft (R) Windows Debugger Version 6.11.0001.404 X86
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [C:\temp\MEMORY.DMP]
Kernel Summary Dump File: Only kernel address space is available
Symbol search path is: srv*c:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Server 2003 Kernel Version 3790 (Service Pack 2) UP Free x86 compatible
Product: Server, suite: Enterprise TerminalServer SingleUserTS
Built by: 3790.srv03_sp2_gdr.090319-1204
Machine Name:
Kernel base = 0x80800000 PsLoadedModuleList = 0x808a8e48
Debug session time: Mon Jan 18 10:55:19.400 2010 (GMT+1)
System Uptime: 3 days 0:50:29.015
Loading Kernel Symbols
...............................................................
Loading User Symbols
Loading unloaded module list
..
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
Use !analyze -v to get detailed debugging information.
BugCheck E2, {0, 0, 0, 0}
Probably caused by : i8042prt.sys ( i8042prt!I8xProcessCrashDump+256 )
Followup: MachineOwner
---------
kd> !process 0 0 winlogon.exe
PROCESS 8a3f4540 SessionId: 0 Cid: 0160 Peb: 7ffdb000 ParentCid: 0118
DirBase: e798b000 ObjectTable: e1417a68 HandleCount: 501.
Image: winlogon.exe
PROCESS 8a411a60 SessionId: 1 Cid: 0a70 Peb: 7ffd8000 ParentCid: 0118
DirBase: 0ed55000 ObjectTable: e138ce38 HandleCount: 301.
Image: winlogon.exe
PROCESS 89a3d7b8 SessionId: 3 Cid: 0f2c Peb: 7ffd8000 ParentCid: 0118
DirBase: 1e029000 ObjectTable: e1b73cf0 HandleCount: 231.
Image: winlogon.exe
kd> !process 8a3f4540 1
PROCESS 8a3f4540 SessionId: 0 Cid: 0160 Peb: 7ffdb000 ParentCid: 0118
DirBase: e798b000 ObjectTable: e1417a68 HandleCount: 501.
Image: winlogon.exe
VadRoot 8a628a10 Vads 132 Clone 0 Private 1142. Modified 9. Locked 0.
DeviceMap e1001140
Token e1550910
ElapsedTime 3 Days 00:50:16.713
UserTime 00:00:00.078
KernelTime 00:00:00.984
QuotaPoolUsage[PagedPool] 50852
QuotaPoolUsage[NonPagedPool] 62208
Working Set Sizes (now,min,max) (2503, 50, 345) (10012KB, 200KB, 1380KB)
PeakWorkingSetSize 2508
VirtualSize 44 Mb
PeakVirtualSize 46 Mb
PageFaultCount 19815
MemoryPriority BACKGROUND
BasePriority 13
CommitCharge 1851
kd> !process 8a411a60 1
PROCESS 8a411a60 SessionId: 1 Cid: 0a70 Peb: 7ffd8000 ParentCid: 0118
DirBase: 0ed55000 ObjectTable: e138ce38 HandleCount: 301.
Image: winlogon.exe
VadRoot 89b61ce8 Vads 175 Clone 0 Private 750. Modified 992. Locked 0.
DeviceMap e1001140
Token e19397d0
ElapsedTime 2 Days 19:41:14.890
UserTime 00:00:12.281
KernelTime 00:00:03.468
QuotaPoolUsage[PagedPool] 50988
QuotaPoolUsage[NonPagedPool] 9184
Working Set Sizes (now,min,max) (693, 50, 345) (2772KB, 200KB, 1380KB)
PeakWorkingSetSize 3481
VirtualSize 50 Mb
PeakVirtualSize 57 Mb
PageFaultCount 46010
MemoryPriority BACKGROUND
BasePriority 13
CommitCharge 1307
kd> !process 89a3d7b8 1
PROCESS 89a3d7b8 SessionId: 3 Cid: 0f2c Peb: 7ffd8000 ParentCid: 0118
DirBase: 1e029000 ObjectTable: e1b73cf0 HandleCount: 231.
Image: winlogon.exe
VadRoot 8a5e75a8 Vads 146 Clone 0 Private 523. Modified 0. Locked 0.
DeviceMap e1001140
Token e2312030
ElapsedTime 01:07:54.140
UserTime 00:00:00.109
KernelTime 00:00:00.718
QuotaPoolUsage[PagedPool] 42304
QuotaPoolUsage[NonPagedPool] 7208
Working Set Sizes (now,min,max) (1919, 50, 345) (7676KB, 200KB, 1380KB)
PeakWorkingSetSize 1921
VirtualSize 39 Mb
PeakVirtualSize 40 Mb
PageFaultCount 2428
MemoryPriority BACKGROUND
BasePriority 13
CommitCharge 825
-----------------------------------------------------------------------------------------------------------
Le process winlogon 89a3d7b8 est le plus récent et correspond à la dernière demande d'ouverture de session.
ElapsedTime est calculé à partir de la date d'analyse du dump, pas du boot de la machine
-----------------------------------------------------------------------------------------------------------
kd> !process 89a3d7b8 7
PROCESS 89a3d7b8 SessionId: 3 Cid: 0f2c Peb: 7ffd8000 ParentCid: 0118
DirBase: 1e029000 ObjectTable: e1b73cf0 HandleCount: 231.
Image: winlogon.exe
VadRoot 8a5e75a8 Vads 146 Clone 0 Private 523. Modified 0. Locked 0.
DeviceMap e1001140
Token e2312030
ElapsedTime 01:07:54.140
UserTime 00:00:00.109
KernelTime 00:00:00.718
QuotaPoolUsage[PagedPool] 42304
QuotaPoolUsage[NonPagedPool] 7208
Working Set Sizes (now,min,max) (1919, 50, 345) (7676KB, 200KB, 1380KB)
PeakWorkingSetSize 1921
VirtualSize 39 Mb
PeakVirtualSize 40 Mb
PageFaultCount 2428
MemoryPriority BACKGROUND
BasePriority 13
CommitCharge 825
THREAD 89a04b40 Cid 0f2c.0c5c Teb: 7ffdf000 Win32Thread: e1f04c58 WAIT: (Unknown) UserMode Non-Alertable
89a04d2c Semaphore Limit 0x1
Waiting for reply to LPC MessageId 00159075
Current LPC port e12da770
Impersonation token: e1d54030 (Level Impersonation)
DeviceMap e1c55d50
Owning Process 89a3d7b8 Image: winlogon.exe
Attached Process N/A Image: N/A
Wait Start TickCount 16560871 Ticks: 221786 (0:00:57:45.406)
Context Switch Count 436 LargeStack
UserTime 00:00:00.046
KernelTime 00:00:00.531
Start Address 0x0103e1b0
Stack Init ba250000 Current ba24fc20 Base ba250000 Limit ba24b000 Call 0
Priority 15 BasePriority 15 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr Args to Child
ba24fc38 808202b6 89a04b40 89a04be8 00000301 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])
ba24fc50 8081fb6e 89a04d2c 89a04d00 89a04b40 nt!KiSwapThread+0x83 (FPO: [0,2,0])
ba24fc94 80915355 89a04d2c 00000011 80912701 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])
ba24fd50 8082350b 00000344 000a0228 00118c80 nt!NtRequestWaitReplyPort+0x776 (FPO: [SEH])
ba24fd50 7c82860c 00000344 000a0228 00118c80 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ ba24fd64)
WARNING: Frame IP not in any known module. Following frames may be wrong.
0006e9e8 00000000 00000000 00000000 00000000 0x7c82860c
THREAD 89ac2598 Cid 0f2c.0868 Teb: 7ffdc000 Win32Thread: e1cd8b50 WAIT: (Unknown) UserMode Non-Alertable
89ac2784 Semaphore Limit 0x1
Waiting for reply to LPC MessageId 001590f4:
Current LPC port e19d14aa
Not impersonating
DeviceMap e1001140
Owning Process 89a3d7b8 Image: winlogon.exe
Attached Process N/A Image: N/A
Wait Start TickCount 16562150 Ticks: 220507 (0:00:57:25.421)
Context Switch Count 51 LargeStack
UserTime 00:00:00.000
KernelTime 00:00:00.000
Win32 Start Address 0x7c839f2b
Start Address 0x77e617ec
Stack Init b9f33000 Current b9f32c20 Base b9f33000 Limit b9f2f000 Call 0
Priority 14 BasePriority 13 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr Args to Child
b9f32c38 808202b6 89ac2598 89ac2640 00000701 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])
b9f32c50 8081fb6e 89ac2784 89ac2758 89ac2598 nt!KiSwapThread+0x83 (FPO: [0,2,0])
b9f32c94 80915355 89ac2784 00000011 e1b6f701 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])
b9f32d50 8082350b 000003a8 000878f0 000878f0 nt!NtRequestWaitReplyPort+0x776 (FPO: [SEH])
b9f32d50 7c82860c 000003a8 000878f0 000878f0 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ b9f32d64)
WARNING: Frame IP not in any known module. Following frames may be wrong.
0055f7f8 00000000 00000000 00000000 00000000 0x7c82860c
[...] + d'autres threads inintéressants
-----------------------------------------------------------------------------------------------------------
Plusieurs threads de winlogon en attente LPC.
-----------------------------------------------------------------------------------------------------------
kd> !LPC message 00159075
Searching message 159075 in threads ...
Server thread 8a415468 is working on message 159075
Client thread 89a04b40 waiting a reply from 159075
Searching thread 89a04b40 in port rundown queues ...
Server communication port 0xe1cff3f0
Handles: 1 References: 1
The LpcDataInfoChainHead queue is empty
Connected port: 0xe12da770 Server connection port: 0xe1455428
Client communication port 0xe12da770
Handles: 1 References: 2
The LpcDataInfoChainHead queue is empty
Server connection port e1455428 Name: AELPort
Handles: 1 References: 13
Server process : 8a466d88 ( svchost.exe)
Queue semaphore : 8a616e48
Semaphore state 0 (0x0)
The message queue is empty
Messages in LpcDataInfoChainHead:
0000 e1954330 - Busy Id=00074ed6 From: 0328.0e7c Context=00000614 [e14554a8 . e1a945d8]
Length=00880070 Type=00480001 (LPC_REQUEST)
Data: 00000328 00000e7c 00000001 00000000 00000000 00000000
0000 e1a945d8 - Busy Id=001582c8 From: 0f2c.0b34 Context=00000f30 [e1954330 . e227c150]
Length=00900078 Type=00640001 (LPC_REQUEST)
Data: 00000f2c 00000b34 00000000 00000000 00000000 00000000
0000 e227c150 - Busy Id=00159075 From: 0f2c.0c5c Context=00000fac [e1a945d8 . e14554a8]
Length=00900078 Type=00640001 (LPC_REQUEST)
Data: 00000f2c 00000c5c 00000000 00000000 00000000 00000000
The LpcDataInfoChainHead queue contains 3 messages
Threads in RunDown queue : 0xe1954170 0xe1a94418 0xe227bf90
Done.
-----------------------------------------------------------------------------------------------------------
Le process serveur est svchost.
Le nom du port LPC est AELPort, utilisé par Aelookupsvc
-----------------------------------------------------------------------------------------------------------
kd>!thread 8a415468
THREAD 8a415468 Cid 0328.0434 Teb: 7ffa3000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
8a616e48 Semaphore Limit 0x7fffffff
Not impersonating
DeviceMap e1001140
Owning Process 8a466d88 Image: svchost.exe
Attached Process N/A Image: N/A
Wait Start TickCount 16560871 Ticks: 221786 (0:00:57:45.406)
Context Switch Count 98
UserTime 00:00:00.000
KernelTime 00:00:00.015
Win32 Start Address 0x00159075
LPC Server thread working on message Id 159075
Start Address 0x4c594a20
Stack Init ba44c000 Current ba44bc0c Base ba44c000 Limit ba449000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr Args to Child
ba44bc24 808202b6 8a415468 8a415510 00000301 nt!KiSwapContext+0x25 (FPO: [Uses EBP] [0,0,4])
ba44bc3c 8081fb6e ba44bd64 00000000 808a85a0 nt!KiSwapThread+0x83 (FPO: [0,2,0])
ba44bc80 80910228 8a616e48 00000010 ba44bd01 nt!KeWaitForSingleObject+0x2e0 (FPO: [5,12,4])
ba44bd30 80910c5c 00000424 0155fff0 00000000 nt!NtReplyWaitReceivePortEx+0x521 (FPO: [SEH])
ba44bd4c 8082350b 00000424 0155fff0 00000000 nt!NtReplyWaitReceivePort+0x18 (FPO: [4,0,0])
ba44bd4c 7c82860c 00000424 0155fff0 00000000 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ ba44bd64)
WARNING: Frame IP not in any known module. Following frames may be wrong.
0155fff4 00000000 00000000 00000000 00000000 0x7c82860c
-----------------------------------------------------------------------------------------------------------
Ticks du thread serveur:221786 (0:00:57:45.406)
Ticks Winlogon : 221786 (0:00:57:45.406)
Il sont égaux.
Winlogon est bien en attente (donc logon freezé) de la réponse LPC depuis 57 minutes.
-----------------------------------------------------------------------------------------------------------
Conclusion:
Le problème est causé par le service AeLookupSvc (Application Experience Lookup Service) qui est freezé.
2 Solutions:
- Le correctif MS08-067 corrige indirectement le problème.
- Vous pouvez également désactiver Application Compatibility Engine par GPO. (Computer Configuration / Administrative Templates / Application Compatibility / Turn Off Application Compatibility Engine ).