Digging in to the Citrix logon process

When you are troubleshooting slow or failed Citrix logons, it helps to know a bit about the background events that take place to achieve a successful logon. Unfortunately, it isn’t quite as simple as handing your logon credentials to StoreFront and being granted access to your apps and desktops.

Instead, a number of communications go back and forth between several different components.

It is important to know which components are involved in the logon process, which configurations can improve logon speed, and which configurations have a negative effect on logon times.

♣ StoreFront Logon Process
♣ What can cause slow StoreFront authentication?
♣ NetScaler Logon Process
♣ NetScaler Logon Failure Reasons

Citrix Logon Process (internal StoreFront, no NetScaler)

  1. User contacts and authenticates to StoreFront
  2. StoreFront contacts Delivery Controller to enumerate resources for authenticated user
  3. Delivery Controller responds to StoreFront with list of resources
  4. StoreFront displays list of resources to user
  5. User clicks on desktop or application resource
  6. StoreFront contacts Delivery Controller to request an available/best available VDA to host the session
  7. Delivery Controller finds applicable host, and responds to StoreFront with hostname and IP address of VDA. If VDA is powered off (XenDesktop) it will be powered on at this stage
  8. StoreFront creates an ICA file containing information gathered from Delivery Controller and passes to end-user
  9. ICA file is downloaded by end-user and opened with Citrix Receiver
  10. Citrix Receiver checks that a connection can be made to the VDA and then makes ICA connection
  11. VDA communicates with RDS license server for license check-out
  12. Delivery Controller creates the user session and processes Citrix policies
  13. User Authentication takes place between AD and VDA
  14. User’s profile loads
  15. User GPOs process
  16. Application or desktop launch is complete and the resource is usable by end-user


What can cause each of these steps to perform “slowly”? – The 16 numbers below are mapped to the 16 steps above.

  1. Overloaded AD for authentication, overloaded StoreFront servers. AD Sites and Services incorrectly configured.
  2. Overloaded StoreFront, overloaded Delivery Controllers.
  3. Overloaded Delivery Controllers, overloaded StoreFront.
  4. Overloaded StoreFront.
  5. Slow PC, slow Internet connection.
  6. Overloaded StoreFront, overloaded Delivery Controller.
  7. Lack of available VDAs, overloaded StoreFront, overloaded Delivery Controller.
  8. Overloaded StoreFront, slow network connection.
  9. Slow network connection, slow PC.
  10. Slow PC, high-latency network connection.
  11. Overloaded license server.
  12. Overloaded Delivery Controller.
  13. Overloaded AD, Sites and Services incorrectly configured.
  14. Overloaded profile store, high-latency network connection between profile store and VDA.
  15. Overloaded VDA, overloaded AD servers. DNS misconfiguration, too many GPOs configured to be applied.
  16. VDA poor performance.

Obviously this isn’t a definitive list; however, it gives some ideas of what can cause slowness during the logon and resource launch process.
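If you have per-step timings to hand, the mapping above can be turned into a quick lookup. The following is only a rough sketch: the cause text is abbreviated from the list above, and the timings are hypothetical values that you would need to measure yourself (they do not come from any Citrix tool).

```python
# Likely causes per logon step, abbreviated from the list above.
STEP_CAUSES = {
    1: "Overloaded AD or StoreFront; AD Sites and Services misconfigured",
    2: "Overloaded StoreFront or Delivery Controllers",
    3: "Overloaded Delivery Controllers or StoreFront",
    4: "Overloaded StoreFront",
    5: "Slow PC or slow Internet connection",
    6: "Overloaded StoreFront or Delivery Controller",
    7: "Lack of available VDAs; overloaded StoreFront or Delivery Controller",
    8: "Overloaded StoreFront; slow network connection",
    9: "Slow network connection; slow PC",
    10: "Slow PC; high-latency network connection",
    11: "Overloaded license server",
    12: "Overloaded Delivery Controller",
    13: "Overloaded AD; Sites and Services misconfigured",
    14: "Overloaded profile store; latency between profile store and VDA",
    15: "Overloaded VDA or AD; DNS misconfiguration; too many GPOs",
    16: "Poor VDA performance",
}


def slowest_steps(durations, top=3):
    """Return the `top` slowest steps as (step, seconds, likely causes)."""
    ranked = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    return [(step, secs, STEP_CAUSES.get(step, "unknown")) for step, secs in ranked[:top]]


# Hypothetical timings in seconds, for illustration only.
measured = {1: 0.8, 13: 1.2, 14: 41.0, 15: 9.5}
for step, secs, causes in slowest_steps(measured, top=2):
    print(f"Step {step}: {secs:.1f}s -> {causes}")
```

Ranking by duration simply points you at the step worth investigating first; it does not prove which of the listed causes is to blame.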

What can cause all these steps to perform slowly?

  1. Constrained bandwidth, overloaded network, storage, server hardware or virtual machine resource deficiency. Infrastructure failure can also have an adverse effect.

How to overcome?

Use monitoring systems to monitor CPU, RAM, bandwidth, database, storage performance etc. and alert when thresholds are breached or failures occur.
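Most monitoring products handle this for you, but the idea of threshold alerting can be sketched in a few lines. The metric names and threshold values below are assumptions made for illustration; they are not recommendations, and should be tuned for your own environment.

```python
# Hypothetical thresholds; tune them for your own environment.
THRESHOLDS = {"cpu_percent": 85.0, "ram_percent": 90.0, "disk_latency_ms": 20.0}


def breached(sample, thresholds=THRESHOLDS):
    """Return an alert message for every metric in `sample` above its threshold."""
    alerts = []
    for metric, limit in thresholds.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            alerts.append(f"{metric}={value} exceeds {limit}")
    return alerts


# A made-up sample from a Delivery Controller during a logon storm.
print(breached({"cpu_percent": 97.0, "ram_percent": 60.0}))
```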

Make sure each infrastructure component is redundant from hardware up to clustering and virtual machines. Also make sure storage and network throughput can cope with the demand of users especially during peak periods where logins and resource launches are going to be high.

What other factors can cause a slow logon?

  • Large user profiles, roaming profile or using UPM etc.
    • New profile average load time using Citrix Profile Management (Always cache enabled, profile streaming disabled): [screenshot]
    • 7GB profile load average time using Citrix Profile Management (Always cache enabled, profile streaming disabled): [screenshot]
    • Notice that the reported profile load time is still virtually nothing. I don’t know why this is, because the profile load is what caused 99% of the 74-second logon duration.
    • Size on disk reporting the profile as 7.19GB on the VDA.
    • Now let’s enable Profile Streaming.
    • And disable Always cache.
    • The logon time is back down to 15-20 seconds and the size on disk reports 57.7MB for the profile because we are now streaming the profile on demand; nothing is cached just yet. This is a good feature within Citrix Profile Management that allows for fast logon times even with large, bulky profiles.
  • Many GPOs and GPO settings
  • Logon scripts when applied via GPO vs AD user level

Some of these recommendations have already been documented at https://jgspiers.com/citrix-tips-tricks-tweaks-suggestions/

How authentication can be slowed:

  1. Overloaded AD, AD failure, AD Sites and Services not correctly configured.

How to overcome?

Ensure AD is highly available and you have enough AD servers to cope with demand based on Microsoft recommendations.

Ensure Sites and Services is correctly configured so that users authenticate with Active Directory servers close by. Regions with a large number of users should have their own AD servers. Subnets for each office location/site VLAN should be defined within AD Sites and Services and assigned to the sites closest to them so that AD authentication takes the optimal route.
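To illustrate why the subnet definitions matter, here is a minimal sketch of matching a client IP address to a site by the most specific subnet that contains it. The subnets and site names are made up, and this is only an approximation of the idea, not how Active Directory itself is implemented.

```python
import ipaddress

# Hypothetical subnet-to-site assignments.
SITE_SUBNETS = {
    "10.1.0.0/16": "London",
    "10.1.50.0/24": "London-Branch",
    "10.2.0.0/16": "Belfast",
}


def site_for(ip, site_subnets=SITE_SUBNETS):
    """Return the site of the most specific subnet containing `ip`, or None."""
    address = ipaddress.ip_address(ip)
    best = None
    for cidr, site in site_subnets.items():
        network = ipaddress.ip_network(cidr)
        if address in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, site)
    return best[1] if best else None


print(site_for("10.1.50.7"))    # falls in both London subnets; the /24 wins
print(site_for("192.168.1.1"))  # no subnet defined for this address
```

An address that matches no defined subnet returns None here. That is the case to look out for, since a VDA or client in an undefined subnet has no site to steer it towards a nearby AD server.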

How profile load can be slowed:

The sheer size of profiles is the usual culprit behind slow loads. Profile bloat is one of the most common reasons why a logon may be slow for affected users. Other reasons can include a failed file server which hosts the roaming profile or network issues which prevent the profile from being fetched.

An overloaded or underperforming file server, too many users connecting to retrieve profiles, and insufficient storage resources are other failure points.

How to overcome?

Some solutions such as Citrix Profile Management include streaming and directory/file exclusion which help improve logon speeds. See https://jgspiers.com/citrix-profile-management-overview/ for more information on Citrix Profile Management.

If Citrix Profile Management takes a long time to process, you can enable logging using the Citrix Profile Management ADMX template.

Redirect as many folders as possible within a user’s profile. Exclude directories and files that simply are not needed from being redirected or roamed/cached to the VDA. Do test in a pre-production environment first before deploying any profile optimisations within a production environment.
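To decide which folders are worth redirecting or excluding, it helps to see where the bulk of a profile sits. The sketch below totals the size of each top-level entry in a profile folder, largest first. The path in the usage example is a placeholder, and the figures are logical file sizes, which can differ from the "size on disk" value Windows reports.

```python
import os


def folder_sizes(profile_root):
    """Return (top-level entry, bytes) pairs under profile_root, largest first."""
    sizes = {}
    for entry in os.scandir(profile_root):
        if entry.is_dir(follow_symlinks=False):
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.path.getsize(os.path.join(dirpath, name))
                    except OSError:
                        pass  # skip files that vanish or cannot be read
            sizes[entry.name] = total
        elif entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder path; point this at a copy of a real profile.
    profile_path = r"C:\Users\exampleuser"
    if os.path.isdir(profile_path):
        for name, size in folder_sizes(profile_path)[:10]:
            print(f"{size / 1024 / 1024:10.1f} MB  {name}")
```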

The comparison above showed large versus small profile logon times and how Citrix Profile Management can combat large profiles to ensure fast logons.

How GPOs and logon scripts slow down logon

  1. Several small GPOs containing a few settings rather than one or two larger GPOs.
  2. Scripts that take a long time to run.
  3. Numerous Group Policy drive maps or maps to locations that are inaccessible.
  4. Not disabling User Configuration or Computer Configuration sections of a policy when they are not in use.
  5. Many policies, not to mention many different Citrix policies, as these should also be taken into consideration at all times.
  6. Printer mapping via GPOs can cause slow logons when many printers are created. Citrix does have the policy setting Wait for printers to be created which is disabled by default and only applies to Server OS VDA. This allows a session to start without waiting for all printers to be mapped from client device (printer redirection). This does not help in situations where GPOs are mapping printers directly to a Citrix session.

Applications can slow down logon

When a user launches an application, they judge how quick the logon process was by when they first see the initial application landing screen. If applications make backend connections to database servers or file shares, as some do, you must ensure those connections are established in minimal time. To ensure this, make sure database and file servers are well configured and have enough resources to cope with demand under load. Application prelaunch can help achieve quicker launch times. Also, authentication should pass through to the application, eliminating any additional authentication steps so as not to affect the user experience.

To read up on Application Prelaunch see https://jgspiers.com/citrix-application-prelaunch/

NetScaler Logon Process with LDAP/RADIUS

To view authentication logging through the CLI see https://jgspiers.com/netscaler-authentication-failures-aaad-debug/

The authentication process is as follows:

User enters credentials -> NetScaler makes attempt to bind to LDAP -> LDAP search is performed for user’s sAMAccountName -> user’s group membership is extracted from LDAP -> user is successfully authenticated -> RADIUS authentication is attempted (if used) -> RADIUS groups extracted if any -> authentication is accepted.

  1. start_ldap_auth attempting to auth username @ LDAP IP (may be load balanced VIP).
  2. Connecting to: LDAPIP:389 or 636 if using Secure LDAP. https://jgspiers.com/configuring-ldaps-citrix-netscaler/
  3. receive_ldap_bind_event Bind OK (LDAP bind successful using NetScaler bind account defined in LDAP server on NetScaler).
  4. receive_ldap_bind_event User name: dirty = <username> sanitized = <username>
  5. ns_ldap_search Searching for <<(& (sAMAccountName=sAMAccountName) (objectClass=*))>> from base <<OU=Users,DC=JGSPIERS,DC=COM>> (the base DN that the LDAP search is performed on is based on what you have specified in your LDAP server on NetScaler).
  6. receive_ldap_user_search_event Received LDAP user search event.
  7. ns_ldap_check_result checking LDAP result. Expecting 101 (LDAP_RES_SEARCH_RESULT).
  8. ns_ldap_check_result ldap_result found expected result LDAP_RES_SEARCH_RESULT.
  9. receive_ldap_user_search_event received LDAP_OK (user was found).
  10. receive_ldap_user_search_event Binding user… 1 entries.
  11. receive_ldap_user_search_event User DN= full DN location of user account.
  12. receive_ldap_user_search_event built group string for username of: Group membership displayed.
  13. send_accept sending accept to kernel for : Username (user is successfully authenticated).

At this stage LDAP authentication is complete. If you have configured RADIUS authentication for 2FA, authentication will continue:

  1. continue_radius_auth attempting to auth username @ RADIUS IP (may be a load balanced VIP).
  2. process_radius Got RADIUS event.
  3. process_radius radius accepts : username.
  4. send_accept sending accept to kernel for : username.

If you notice slow authentication, use aaad.debug. All event entries are time stamped, so you can narrow the authentication process down to the part that is actually slow and diagnose from there.
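Once you have the time stamps, finding the slow part is a matter of measuring the gap between consecutive events. The sketch below shows the idea only. The line layout it parses ("HH:MM:SS.mmm event_name message") is an assumption made for illustration and is not the real aaad.debug format, so the parsing would need adapting to your actual output. The sample timings are invented.

```python
from datetime import datetime

# Assumed, simplified layout: "<HH:MM:SS.mmm> <event_name> <message>".
# Real aaad.debug output is laid out differently; adapt the parsing to match.
SAMPLE = """\
12:00:01.100 start_ldap_auth attempting to auth user1
12:00:01.150 receive_ldap_bind_event Bind OK
12:00:04.900 receive_ldap_user_search_event received LDAP_OK
12:00:05.000 send_accept sending accept to kernel for : user1
"""


def event_gaps(log_text):
    """Return (event_name, seconds since previous event) for each line after the first."""
    gaps = []
    previous = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        stamp, event, *_ = line.split(maxsplit=2)
        current = datetime.strptime(stamp, "%H:%M:%S.%f")
        if previous is not None:
            gaps.append((event, (current - previous).total_seconds()))
        previous = current
    return gaps


# The event preceded by the longest wait is where to start looking.
slowest = max(event_gaps(SAMPLE), key=lambda gap: gap[1])
print(slowest)
```

In this made-up sample the long wait sits before the user search result, which would point at the LDAP search (for example a very broad base DN or a slow domain controller) rather than the bind. Note that the sketch does not handle a log that crosses midnight.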

Other common authentication failure errors:

Invalid credentials entered by the user:

ns_ldap_check_result LDAP action failed (error 49): Invalid Credentials

send_reject_with_code Rejecting with error code 4001.

Wrong username entered (does not exist):

receive_ldap_user_search_event ldap_first_entry returned null, user not found.

send_reject_with_code Rejecting with error code 4009

Logon denied for a user’s account in Active Directory:

ns_ldap_check_result LDAP action failed (error 49) : Invalid Credentials

send_reject_with_code Rejecting with error code 4001.

User account disabled:

ns_ldap_check_result LDAP action failed (error 49) : Invalid Credentials

send_reject_with_code Rejecting with error code 4001.

Invalid BIND account username/password:

ns_ldap_check_result LDAP action failed (error 49) : Invalid credentials

receive_ldap_bind_event Got LDAP error

LDAP server unreachable:

ns_ldap_simple_bind ldap_simple_bind :Can’t contact LDAP server

send_reject_with_code Rejecting with error code 4001

LDAP bind credentials missing

In order to perform this operation a successful bind must be completed on the connection.

Make sure the credentials of the LDAP bind account on the LDAP profile are not missing.

LDAP EAGAIN returns etc.

If you get any of the below types of log text, and ultimately LDAP authentication is not working, recreate your LDAP server object on NetScaler and try again.

User complexity failure on password change:

If using LDAPS with Allow password change enabled, a user is prompted to change their password if it has expired or is set to change at first logon. The errors below appear when the new password does not meet the Active Directory password complexity requirements.

ns_ldap_check_result LDAP action failed (error 19): Constraint violation

receive_ldap_passwd_modify_event Password complexity violation

RADIUS authentication failure:

process_radius Received RAD_ACCESS_REJECT for: username.

process_radius Sending reject.

send_reject_with_code Rejecting with error code 4001.
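If you find yourself scanning aaad.debug for these messages often, the lookups above can be captured in a small helper. This is a sketch only: the substrings are taken from the messages quoted in this post, the reason text is my own summary, and the matching is a plain case-insensitive substring search rather than anything NetScaler provides.

```python
# Substrings from the aaad.debug messages quoted above, mapped to the
# failure reasons this post associates with them. Order matters: the
# first match wins.
PATTERNS = [
    ("ldap_first_entry returned null", "user not found"),
    ("contact LDAP server", "LDAP server unreachable"),
    ("Password complexity violation", "new password fails AD complexity rules"),
    ("error 19", "new password fails AD complexity rules"),
    ("RAD_ACCESS_REJECT", "RADIUS rejected the user"),
    ("error 49", "invalid credentials, logon denied, disabled account or bad bind account"),
]


def classify(line):
    """Return a likely failure reason for a log line, or None if nothing matches."""
    lowered = line.lower()
    for needle, reason in PATTERNS:
        if needle.lower() in lowered:
            return reason
    return None


print(classify("ns_ldap_check_result LDAP action failed (error 49): Invalid Credentials"))
```

As the examples above show, error 49 covers several different situations, so the helper can only narrow things down. You still need to check the account in Active Directory or the bind account on the LDAP server object.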

On NetScaler v11+ you can also navigate to Authentication -> Logs. This shows an output of all the authentication attempts including failure reasons. Data here is pulled from the ns.log file located in the /var/log/ directory.

Finally, NISC (NetScaler Insight Center) and NMAS (NetScaler Management and Analytics System) with Gateway Insight also record user authentication attempts and failures, including failure reasons. Navigate to Analytics -> Gateway Insight -> Authentication on NMAS.

Authentication attempts can be drilled down to a specific user by clicking on the user’s name, which makes troubleshooting that user’s authentication easier.

See https://jgspiers.com/ldap-load-balancing-citrix-netscaler/ for LDAP Load Balancing through NetScaler.

To enable Enhanced Authentication Feedback on NetScaler to provide more meaningful logon failures to users see https://jgspiers.com/netscaler-enhanced-authentication-feedback/


18 Comments

  • Prasanth

    April 18, 2018

    Hi,
    VDA never communicates directly with Citrix license server in Citrix XenApp 7.x environment. Please correct me if I’m wrong.

    • George Spiers

      April 18, 2018

      Correct.

  • Anonymous

    July 19, 2018

    Hi George,

    I am trying to add RADIUS server in Netscaler node 1 (primary), the test connection fails to reach Radius server, but when i add in NOde 2( secondary), the connection was successful.

    Thanks,

    • George Spiers

      July 20, 2018

      Odd. When you say test connection do you mean testing authentication or using the actual test authentication button on the RADIUS profile? Have you tried taking a traffic trace and reviewing with WireShark to see what is happening to that RADIUS traffic?

  • Anonymous

    July 20, 2018

    when configuring Authentication Radius server i.e after entering IP, Port and Secure key, i am hitting test connection button, then its saying , server is not reachable or either is not a valid server, or 1812 is not a valid radius port.

    the radius client agent is configured at radius server and the traceroute and telnet from NSIP to Radius server is working fine. from Node two everything is working fine.

    is there any command to trace, when i hit test connection button .. Thank you!

    • George Spiers

      July 20, 2018

      Not sure what RADIUS system you are using but if firewalls are not blocking the communication then either the passwords do not match between appliances, or the RADIUS server isn’t configured to expect any traffic from NetScaler.
      An nstrace running whilst pressing the test connection button may give better indication as to what is happening and if any traffic is being blocked. https://jgspiers.com/citrix-netscaler-traffic-capture-using-nstrace-nstcpdump/

  • Anonymous

    July 23, 2018

    Thanks George, i did nstrace from both netscalers, from Node 1 (primary) – the request source IP is showing Subnet IP and NSIP as well with ICMP protocol.

    from Node 2- the request source IP is only NSIP and as i said the connection is successful.

    so do we need to configure a Radius agent with subnetIP at radius server? as per Citrix NSIP will contact radius server.. bit confusing with my NODE1 🙁 any thoughts would be very helpful. Thank you!

    • George Spiers

      July 23, 2018

      It is NSIP by default, as you noticed Node2 was using NSIP and the connection is successful. If you allow Node1 NSIP to communicate with RADIUS then you should be good.
      Have a look inside the packets to see if there are any error messages. If you are not seeing any traffic returned from the RADIUS server e.g. acknowledgements etc. then it is likely that traffic is being blocked.

  • Anonymous

    July 23, 2018

    Hi George,
    Thank you!, as i mentioned, both the nodes allowed with RADIUS ( agent creation) . i could see SNIP is sending traffic to RADIUS when i nstrace from Node1 . from Node 2 only NSIP is sending traffic. is there any way i can attach screen shot or email .

    • George Spiers

      July 24, 2018

      Send me an email with the traces from Node1 and Node2. Type in the email your RADIUS IP, your NSIP and SNIP. I’ll take a look.

  • Anonymous

    July 24, 2018

    Hi George,

    I emailed the details. Thank you!

  • Anonymous

    July 31, 2018

    Hey George,

    Thanks for your help, i really appreciate it.

    As you know the netscalers are not reaching RADIUS, i created a agent with SNIP at RADIUS and the connection was successful.
    I know technically we have to create agent with NSIP . but here i dont know, the SNIP is sending the traffic. with SNIP created the entry .

    Thanks,
    Srinivas

  • Srini

    August 14, 2018

    Hi,

    in our infra, radius policy is bound as primary authentication to the VIP, and when i open the VIP url, i am entering LDAP creds first then its prompting for entrust token (Radius). how was this setup ? technically i should enter Radius code since its bound as primary auth.

    is the LDAP bound is hidden or something to the VIP.

    Thank you!

    • George Spiers

      August 15, 2018

      Maybe the RADIUS server is accepting the LDAP credentials. Some RADIUS systems do that, and authenticate to AD on your behalf before sending you a RADIUS challenge.
      If you only have one authentication policy which is RADIUS as primary, that is likely to be what is happening.

  • Harry

    December 9, 2020

    Hi George,

    I am experiencing issues while trying to “Change password” after successfully logging into the Gateway. If the user has “password change on next log-in” setting enabled, it let’s the user change the password successfully.
