Introduction


Microsoft® Lync Server™ 2010 communications software deployment issues are sometimes manifested as a problem with a particular workload. Perhaps Enterprise Voice calls are failing or federated users cannot connect.

Selecting the appropriate tools for troubleshooting is one of the most important steps in performing root cause analysis. There is a wide range of troubleshooting tools available to the Lync Server administrator. Many of those tools are discussed in Troubleshooting Basics; others are beyond the scope of this book. Troubleshooting Specific Workloads provides you, the Lync Server administrator, with the knowledge necessary to determine root cause by taking advantage of the appropriate troubleshooting tools.

Determining which workloads are affected helps to determine root cause. For details, see the Troubleshooting Basics chapter at http://www.microsoft.com/downloads/en/details.aspx?FamilyID=8c64a1e1-f0b3-479c-a265-e480875c61d8&displaylang=en. This chapter shows Lync Server 2010 administrators several specific examples of a problem in a fictitious Lync Server deployment, and then provides troubleshooting steps to help resolve the problem.

Troubleshooting IM and Presence Workload

Presence Updates are not Reflected for Several Seconds

Problem Description


Microsoft® Lync™ 2010 users at all Contoso, Ltd offices in the United States report that manual changes to their presence recently stopped taking effect immediately. Presence changes usually take effect immediately.

Troubleshooting Process


The Contoso, Ltd administrator checks the geographic region for the users who opened Help Desk service tickets. Contoso, Ltd has separated their Lync Server site topology by region. The cluster of users who are experiencing this update-delay issue indicates that something seems to be wrong with the Front End pool that is servicing the United States region.

He begins by capturing a .uccapilog (log) file trace on Lync 2010 for a user who reported this issue. The administrator navigates to %userprofile%\tracing, and then opens the .uccapilog file in Snooper after reproducing the problem. He confirms that Lync 2010 properly requests that Lync Server update the user’s presence by looking for an outbound SIP SERVICE request that has a content type of application/msrtc-category-publish+xml. He can now right-click the SERVICE request and then select Find Related to view all the requests and responses for the manual presence change. This makes the data less cluttered because the .uccapilog file can contain lots of information.

Note. When a user updates their presence, the request is sent to a Lync Server 2010, Front End Server. The Front End Server forwards the request to the back-end database, where the user’s updated presence is stored in the RTCDYN database. When the request is complete, a stored procedure is triggered, causing the Front End Server to inform the user’s contacts (watchers) of the change in presence.

The log file in figure 1 shows the initial SERVICE request sent by Lync 2010 receiving a 200 OK response from the Front End Server. However, the administrator sees that there is a significant delay—25 seconds—between the SERVICE request and the 200 OK response.



Figure 1. 25-second delay between the SERVICE request and the 200 OK response
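The delay in figure 1 is simply the difference between the timestamp of the SERVICE request and the timestamp of its 200 OK response. The following Python sketch illustrates that arithmetic. The timestamp strings, the HH:MM:SS.mmm format, and the one-second threshold are assumptions made for illustration; they are not taken from a real .uccapilog file, and the sketch does not handle a trace that crosses midnight.

```python
from datetime import datetime

# Assumed timestamp layout (HH:MM:SS.mmm); a real trace may differ.
TIME_FORMAT = "%H:%M:%S.%f"


def response_delay_seconds(request_time: str, response_time: str) -> float:
    """Return the seconds elapsed between a request and its response."""
    sent = datetime.strptime(request_time, TIME_FORMAT)
    answered = datetime.strptime(response_time, TIME_FORMAT)
    return (answered - sent).total_seconds()


def is_delayed(delay_seconds: float, threshold_seconds: float = 1.0) -> bool:
    """Flag a response as slow. The default threshold is an arbitrary assumption."""
    return delay_seconds > threshold_seconds


# Hypothetical timestamps for the SERVICE request and its 200 OK response.
delay = response_delay_seconds("10:15:02.120", "10:15:27.120")
print(delay, is_delayed(delay))
```

With these hypothetical values the computed delay matches the 25 seconds described in the scenario.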

At this point, the administrator can identify the problem, but needs to begin ruling out the possible root causes as follows:

  • One—a network problem between Lync 2010 and the Front End Servers

  • Two—a network problem between the Front End Servers and the back-end database

  • Three—resource contention issues on the Front End Servers or the back-end database

The administrator then rules out outcome one by running the Test-CsPresence synthetic transaction cmdlet. He logs on directly to one of the Front End Servers in the user’s Front End pool in the United States, and then runs the cmdlet from the Lync Server Management Shell. Because the test runs on the Front End Server itself, it bypasses the network path between Lync 2010 and the Front End Servers; if the cmdlet also performs poorly or fails, a network problem between Lync 2010 and the Front End Servers is unlikely. The cmdlet simulates one user (the publisher) logging on and publishing presence, and then a second user (the subscriber) polling for the publisher’s presence, so it requires two user accounts. The command and its output are shown in the following code block.

PS C:\>$cred1 = Get-Credential "contoso\sara"
PS C:\>$cred2 = Get-Credential "contoso\kerim"
PS C:\>Test-CsPresence -TargetFqdn ee-pool.contoso.net -SubscriberSipAddress "sip:sara@contoso.net" -SubscriberCredential $cred1 -PublisherSipAddress "sip:kerim@contoso.net" -PublisherCredential $cred2

TargetFqdn : ee-pool.contoso.net
Result     : Failure
Latency    : 00:00:00
Error      : This operation has timed out.
Diagnosis  :

The $cred1 and $cred2 variables store the credentials for the two test users. The administrator must also specify the Front End pool FQDN by using the TargetFqdn parameter, and supply the following:

  • Subscriber SIP URI

  • Publisher SIP URI

  • Credentials that the cmdlet uses to test the functionality of both updating presence and retrieving presence through these users


He can see that the test has timed out, which rules out a problem between Lync 2010 and the Front End Server. While the administrator doesn’t have enough data to support outcome two or three, he begins to collect Performance Monitor data from the Front End Server (shown in figure 2). This is typically less time-consuming than reading network traces. Initially, he collects and analyzes the following performance counters:

  • LS:USrv - 01 - DBStore\USrv - 002 - Queue Latency (msec)

  • Queue latency is the amount of time (in milliseconds) that it takes for a request to leave the Front End Server’s queue toward the back-end database. In a healthy environment, the Front End Server sustains a value that is less than 100 msec.

  • LS:USrv - 01 - DBStore\USrv - 004 - Sproc Latency (msec)

  • Sproc latency is the amount of time (in milliseconds) that it takes for the Microsoft® SQL Server® data management software database to process the request. This performance value is measured from the time the request leaves the Front End Server queue until the request returns. In a healthy environment, this performance counter sustains a value that is less than 100 msec.

Note. These two counters are key health indicators for your environment. If these counters are yielding poor performance metrics, it is likely that other workloads are also adversely affected.



Figure 2. Performance Monitor data

In figure 2, the top line represents sproc latency; the bottom line represents queue latency. The y-axis represents time in milliseconds. The administrator can see that both counters hold steady at a delay of between 5 and 6 seconds. Together, these latencies would account for a delay of at least 11 or 12 seconds for any database-bound request.

If queue latency is high and sproc latency is low, this typically indicates that there is a performance problem on the Front End Server. If sproc latency is high and queue latency is within specification, this typically indicates that there is a SQL Server database performance issue. (For details, see the Notes from the Field section, “Lync Server Performance Issues due to a SQL Server Back-End Database,” in this chapter. That section covers disk IO, which is the most common culprit.) He notices that both counters are following the same trend almost identically. It is unlikely that the Front End Server and the SQL Server database are independently experiencing identical delay patterns. He then looks at what they have in common: the network.
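The reasoning above can be sketched as a small decision rule. The following Python example is an illustrative simplification, not a Microsoft tool: it averages a set of counter samples, compares each average with the 100 msec healthy limit given earlier, and treats "both counters high" as a hint to examine what the two servers share. The function name, the labels, and the sample values are hypothetical.

```python
from statistics import mean

# From the text: a healthy counter sustains a value below 100 msec.
HEALTHY_LIMIT_MSEC = 100

# Hypothetical labels for the four possible conclusions.
HEALTHY = "healthy"
FRONT_END = "front end server performance"
SQL_SERVER = "sql server database performance"
SHARED = "shared dependency, such as the network"


def diagnose(queue_samples_msec, sproc_samples_msec):
    """Suggest where to look next, based on average counter values."""
    queue_high = mean(queue_samples_msec) >= HEALTHY_LIMIT_MSEC
    sproc_high = mean(sproc_samples_msec) >= HEALTHY_LIMIT_MSEC
    if queue_high and sproc_high:
        # Simplification: the text also requires the two trends to match.
        return SHARED
    if queue_high:
        return FRONT_END
    if sproc_high:
        return SQL_SERVER
    return HEALTHY


# Hypothetical samples resembling the scenario: both counters at 5 to 6 seconds.
print(diagnose([5500, 5800, 5600], [5700, 5900, 5600]))
```

A fuller check would also compare the shape of the two trends over time, as the administrator does by eye in figure 2, before concluding that a shared component is at fault.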

He performs a Network Monitor trace as shown in figure 3 from the Front End Server, and then filters traffic on the IP address of the SQL Server database to see if anything unusual is occurring.



Figure 3. Network Monitor trace

Looking at figure 3, the administrator focuses on one Transmission Control Protocol (TCP) conversation that is taking place between the Front End Server and SQL Server. He sees a large number of TCP retransmissions that are caused by packets not arriving at their destination in a timely fashion. As a result, the data must be re-sent because it has not been acknowledged. The retransmit numbers 1055 and 1054 show the requests going out. These requests have had no reply for 3 seconds. The request is then retransmitted.
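Network Monitor flags retransmissions itself, but the underlying idea is simple: a segment is a retransmission when its sequence number has already been sent in the same direction of the same conversation. The following Python sketch shows that idea on hypothetical data. The tuple layout (frame number, capture time in seconds, TCP sequence number) and the sample values are assumptions; a real analysis would also account for payload length, direction, and sequence-number wraparound.

```python
def find_retransmissions(segments):
    """Return (frame, sequence, seconds_since_first_send) for repeated segments.

    `segments` is an iterable of (frame_number, time_seconds, sequence_number)
    tuples from ONE direction of ONE TCP conversation (an assumed layout).
    """
    first_seen = {}          # sequence number -> time it was first sent
    retransmissions = []
    for frame, time_seconds, sequence in segments:
        if sequence in first_seen:
            gap = time_seconds - first_seen[sequence]
            retransmissions.append((frame, sequence, gap))
        else:
            first_seen[sequence] = time_seconds
    return retransmissions


# Hypothetical capture: sequence 1000 is sent, never acknowledged,
# and sent again three seconds later.
capture = [
    (1, 0.0, 1000),
    (2, 0.1, 2460),
    (3, 3.0, 1000),
]
print(find_retransmissions(capture))
```

A large number of entries with multi-second gaps, as in the scenario, points to packets being lost or delayed on the path between the two servers.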

Resolution


The administrator confirms with the networking team that there are delays between the Front End Servers and the SQL Server back-end database. The team determines that the switch to which the computers running SQL Server are connected is faulty.