Monthly Archives: September 2014

Lync and DCOM -1007781356 RollbackMoveAway Failures

Problem Overview

Like a few others out there, I’ve recently encountered the dreaded -1007781356 DCOM error.  It started when a client notified me that, after migrating 8000+ users from one pool to another, a small handful (about 150) wouldn’t move.  Most of these users would show the following error: “Distributed Component Object Model (DCOM) operation begin move away failed.” with “RollbackMoveAway failed -1007781356”.

[Image: DComNoMove]

We found that about 20 of those users had a slightly different error, DCOM -1007200250.  This error gave a bit of additional detail in the message: “Distributed Component Object Model (DCOM) operation begin move away failed because user was not found in database.”, which can be seen below.

[Image: UserNotFound]

After some additional investigation, other issues became apparent.  The Backup Service was reporting an error state.

[Image: BackupStateError]

Further, there were LS Backup Service 4073 errors showing “Microsoft Lync Server 2013, Backup Service user store backup module detected items having pool ownership conflict during import.”  The list of users shown included many of those who wouldn’t move.  This confirmed that we were dealing with a pool ownership conflict, where the user partially exists in multiple pool SQL databases.

[Image: Pool_Ownership_Conflict]

Export-CsUserData was also failing with the error: “Export-CsUserData : Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.  This failure occured while attempting to connect to the Principle server.”

Initial Solution Attempt

While I wasn’t entirely sure that all of these issues were related, the timeline in which they started manifesting lined up closely.

The first thing I did?  I’m not ashamed to admit that I went for the search engines; there’s no use reinventing the wheel if this is a common problem.  Unfortunately, I didn’t find much.  There were similar errors, but their resolutions didn’t seem to line up, as our issue was only affecting a handful of users.  Further, any approach had to be taken with extreme care, as this environment has tens of thousands of users.  The blog I encountered with the most helpful information was John Cook’s.

http://johnacook.wordpress.com/2014/05/08/pool-ownership-conflict-moving-users-between-lync-pools/

With an identical error and symptoms, he was able to contact Microsoft PSS, who had a tool that could resolve the issue.  Before we headed down that path, I wanted to see if there were any additional workarounds I could try in our environment.

Taking a cue from John’s blog, the VerifyUserDataReplication.exe tool from the Lync Resource Kit gave me an output of the identical user set found in the LS Backup Service 4073 errors, and it also lined up nicely with the users who refused to move.

We had a reasonably good backup of user data despite the Export-CsUserData timeouts, and located a user who hadn’t logged in for quite some time to use as a guinea pig.  Using our guinea pig account, we were able to move the user with the command:

Move-CsUser -Identity <Identity> -Target <OtherPool> -Force

The -Force switch is what got us there.  It ignores the user data, which in our case was what prevented us from moving the account.  After that, we were able to run Update-CsUserData to merge the contacts back in for this user from our backup.  The remaining users were scheduled for a forced move and restore that night.

As a side note, it was comforting to see sharp guys out there such as Flinchböt fighting the same issues at the same time and coming up with the same approach.  http://flinchbot.wordpress.com/2014/09/17/moving-immovable-users/

Almost, but Not Quite

The rest of the users moved successfully, accomplishing the initial goal.  However, the LS Backup Service 4073 errors and VerifyUserDataReplication.exe were still reporting the issue.  A sample of the VerifyUserDataReplication.exe output can be seen below.

Info: reading batches served by jpprdl3sql1.adms.lyncfix.com\lync13Tokyo from backup pool.
Info: 6247 batches are returned from deprdl3sql1.adms.lyncfix.com\lync13Berlin.
Info: 116161 items are returned from deprdl3sql1.adms.lyncfix.com\lync13Berlin.
Info: reading batches served by jpprdl3sql1.adms.lyncfix.com\lync13Tokyo from source pool.
Info: 6248 batches are returned from jpprdl3sql1.adms.lyncfix.com\lync13Tokyo.
Info: 116366 items are returned from jpprdl3sql1.adms.lyncfix.com\lync13Tokyo.
Info: comparing batches served by jpprdl3sql1.adms.lyncfix.com\lync13Tokyo in source pool and backup pool.
Error: batch bf36c405-0396-429e-bac3-001dd81d17b6 has item 6abec8cb-5857-4cc4-8c44-dd99d9e47206-urn:hcd:theresa.jerd@lyncfix.com whose partial version 3 in source pool is less than or equal to batch’s partial version 4 in backup pool. It cannot find same item in backup pool.
Error: batch bf36c405-0396-429e-bac3-001dd81d17b6 has item 6abec8cb-5857-4cc4-8c44-dd99d9e47206-urn:lcd:theresa.jerd@lyncfix.com whose partial version 3 in source pool is less than or equal to batch’s partial version 4 in backup pool. It cannot find same item in backup pool.
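
When the report runs to hundreds of lines, a short script can pull out the affected accounts so they can be compared against the users that refused to move.  The following is a minimal Python sketch and is not part of the Resource Kit.  The function name users_with_conflicts and the regular expression are hypothetical, and they assume every error line follows the “has item <GUID>-urn:<type>:<user> whose” shape of the sample above.

```python
import re

# Assumed line shape, taken from the sample output above:
#   "Error: batch <id> has item <36-char GUID>-urn:<type>:<user> whose ..."
ITEM_PATTERN = re.compile(r"has item [0-9a-fA-F-]{36}-urn:(\w+):(\S+) whose")


def users_with_conflicts(report_text):
    """Return the sorted, de-duplicated users named in 'Error:' lines."""
    users = set()
    for line in report_text.splitlines():
        # Informational lines carry no item identifiers, so skip them.
        if not line.startswith("Error:"):
            continue
        match = ITEM_PATTERN.search(line)
        if match:
            # group(2) is the address that follows "urn:<type>:".
            users.add(match.group(2).lower())
    return sorted(users)
```

Feeding it the saved tool output (for example, the text of a redirected report file) returns each user once, even when the same account appears under several item types such as hcd and lcd.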

Clearly, the move got us past the initial DCOM error by ignoring the database issue, but it didn’t clean up the offending database records.  The next approach was to completely disable a user and try again.  If we can’t resolve the issue with a forced move, surely removing the user from Lync would do the trick.  We figured we could always recreate the user later if this approach worked.  We went back to our guinea pig account and ran Disable-CsUser.  This command deletes all the attribute information related to Lync Server from an Active Directory user account.

Even with the account “removed” from Lync, the issue persisted.  The user still showed up in the LS Backup Service 4073 errors and the VerifyUserDataReplication.exe output.  We re-enabled the user and reimported the user data from our backup.  It makes sense that this approach didn’t work: it only removes the Active Directory attributes, and there is no backend database cleanup.  But if the forced move and the delete don’t do it, what will?

Final Answer

The final answer was simple.  Now that the DCOM error was resolved by the Move-CsUser -Force command, we could freely move the users back to the old pool, and then move them again to the final destination.  Success!  Using this method, the pool conflict error was resolved, the username was removed from the event log, and the VerifyUserDataReplication output no longer reported the account as an issue.

Moving the previously force-moved users back to their old pool, then back again to their new home, cleaned up the database enough that not only did our events disappear, but Export-CsUserData stopped experiencing its timeouts and the Backup Service returned from its error state to normal.  We’re now back to a healthy state.

A Special Thanks

I’d like to extend a special thanks to John A Cook and Flinchböt for sharing their experiences and letting me talk through mine.  Please feel free to reach out to me here or on twitter @CAnthonyCaragol if you’re experiencing issues of your own.

Lync and Automatic Off-Hook Dialing

Overview

There are times when you need a phone set up to automatically dial a number when the handset is picked up.  This may be a reception or kiosk phone, or it may be an emergency phone.  If you’re responding to a voice proposal, these emergency phones may go by many names: “Point of Emergency”, “SOS”, “Point of Rescue”, “Area of Rescue”.  All they typically do is dial the emergency services number as soon as someone grabs the handset.

Lync Phone Edition does not support this feature at this time.  In the past with Lync it needed to be handled by using an analog phone and IP gateway to perform the off hook dialing.  However, with all the Lync qualified phones out there, it is now easily done and I thought I’d take a moment to write up a post on accomplishing this with a few common models.  For each model, we’ll configure the phone to automatically dial extension 1804 when the handset is picked up with no buttons pressed.  1804 could easily be 911, or 999 or whatever emergency number is in use in your area.

Note: If you’re reading this, and you have a better method, please let me know in the comments or on Twitter at @CAnthonyCaragol

Polycom VVX

There are many ways to provision a Polycom VVX phone, and you may prefer a different one than presented here.  However, in many situations these configurations are handled on a one-off basis, so I’ve chosen to modify the config file directly from the web interface.  To do so, let’s log into the web interface of our VVX phone.  In the picture below I’m using the UC 5.1.2.1801 firmware with a VVX300, but this works with other versions of the firmware and other VVX phones such as the VVX500.  Navigate to Utilities -> Import & Export Configuration, select Web in the Export Configuration dropdown and click Export.

[Image: vvxb]

We’ve got two lines we’re going to add, call.autoOffHook.x.enabled and call.autoOffHook.x.contact.  Replace the “x” with the line number (with Lync usually 1) and insert into your downloaded config.  Below is an example of my modified configuration.  The 1804 you see is the number I wish to dial automatically.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- Application SIP Amazon 5.1.2.1801 20-Aug-14 14:36 -->
<!-- Created 17-09-2014 17:40 -->
<PHONE_CONFIG>
     <WEB
          call.autoOffHook.1.contact="1804"
          call.autoOffHook.1.enabled="1"
     />
</PHONE_CONFIG>

Save the configuration and import it back into the phone.
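
If you have more than a couple of phones to touch, the hand edit can be scripted.  What follows is a minimal Python sketch, assuming the exported file has the same shape as my example above (a PHONE_CONFIG root with a single WEB element).  The function name add_auto_off_hook is hypothetical; only the two call.autoOffHook parameters come from the configuration shown.

```python
import xml.etree.ElementTree as ET


def add_auto_off_hook(config_xml, number, line=1):
    """Return config_xml with auto off-hook dialing set for the given line.

    Assumes a <PHONE_CONFIG> root; a <WEB> element is created if missing.
    """
    root = ET.fromstring(config_xml)
    web = root.find("WEB")
    if web is None:
        web = ET.SubElement(root, "WEB")
    # The same two parameters that were added by hand in the example above.
    web.set(f"call.autoOffHook.{line}.enabled", "1")
    web.set(f"call.autoOffHook.{line}.contact", str(number))
    return ET.tostring(root, encoding="unicode")
```

Calling add_auto_off_hook(exported_text, "1804") returns the modified XML as a string.  Note that ElementTree drops the XML declaration and the comment lines from the export, so compare the result against a hand-edited file before importing it into a phone.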

[Image: vvx2b]

Your phone will reboot, and log back in.  Once back up, pick up the handset and it should be dialing.

snom UC edition

With snom, it’s a bit easier.  In this example, I’m running the SIP 8.8.2.21 firmware.  You’ll need to enable “Administrator Mode” to see these settings.  Log into the web interface and navigate to Action URL Settings.  In the “On Offhook:” area, enter https://127.0.0.1/command.htm?number=number_to_dial.  Click Apply.  You should be able to lift the handset and hear it dialing right away.  In our example below, you can see that I’ve replaced number_to_dial with 1804 again.
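
If the number you need contains characters such as + or #, it is worth building the Action URL programmatically so the query string is encoded properly.  This is a small Python sketch; the helper name snom_offhook_url is hypothetical, and the URL shape is simply the one entered in the “On Offhook:” field above.  I have not confirmed how the phone treats encoded characters, so test with your own number.

```python
from urllib.parse import urlencode


def snom_offhook_url(number, host="127.0.0.1"):
    """Build the On Offhook action URL for the given number to dial."""
    # urlencode percent-encodes characters that are unsafe in a query
    # string, e.g. "+" becomes "%2B".
    return f"https://{host}/command.htm?" + urlencode({"number": number})
```

For a plain extension such as 1804, the result is identical to the URL typed by hand above.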

[Image: snom]

There are often multiple ways to accomplish the same goal.  Another option with the snom is to go to Advanced -> Behavior and set the Auto Dial to “after 2 sec” and the Preselection Prefix to your number.  Click Apply and you’re good to go!

[Images: snom2, snom3]

AudioCodes 420HD, 430HD, 440HD

AudioCodes again makes it easy.  In this example, I’m using an AudioCodes 440HD running the UC 2.0.5.6 firmware, but this can be done with other models as well.  Connect to the web interface and navigate to Voice Over IP -> Dialing.  Set the Activate parameter to Enable to see additional parameters.  Now, set the timeout to 0 to have the number dialed immediately (or to a short delay if you want to ensure the handset is near the caller’s ear before the remote end answers), and enter the number to dial.  Click the Submit button and test!

[Image: AudioCodesb]

If you have a different or easier method, or find a mistake, please let me know. Thank you for reading!

AudioCodes Mediant Virtual Edition SBC Installation

Note: It has come to my attention that out of the box, the AudioCodes Mediant Virtual Edition comes with two ESBC licenses that will support two concurrent calls and is available as a download from their support site.  This makes it ideal for a Microsoft Lync 2010 or 2013 lab that would mimic a production environment without additional cost.  This article steps you through the basic installation of the SBC.

One of the most recent AudioCodes offerings is the Mediant Virtual Edition.  This software is aimed at firms who have chosen to virtualize their infrastructure as much as possible and strictly need an SBC (no PRI or other TDM modules).  It runs software that is very similar to the physical Mediant series you may be used to running but has the additional resiliency advantages of living within a virtual environment.  At the time of this writing, the latest maintenance release (we’re in 6.8 right now) does not support transcoding.  This means that if we’re going to send G.711 to Lync, we’re going to need to receive G.711.

Since these SBCs deal with real time communications, much like Lync, you’ll want to follow virtualization best practices.  The documentation from AudioCodes states that “Each vCPU must correspond to a physical CPU core fully reserved for the SBC VM.”  The supported hypervisors include VMware ESXi version 5.1 or later and Microsoft Hyper-V Server 2012 R2 or later.  For our purposes today, I’ll be showing you the install from Hyper-V.  We’ll skip the virtual machine setup and jump right into the SBC install.

                   | Low Capacity Specification | High Capacity Specification
Virtual CPU        | 1 virtual CPU              | 4 virtual CPUs
Memory             | 2 GB                       | 4 GB
Disk               | 10 GB                      | 10 GB
Network Interfaces | 2 vNICs are recommended, a third can be added for HA configurations | 2 vNICs are recommended, a third can be added for HA configurations

The software is available as an OVF for VMware, a prepackaged virtual machine for Hyper-V, or as an ISO.  I thought it would be most interesting to walk through the ISO installation today.

Booting the ISO from our Hyper-V box shows us this boot screen.  I love the colorful ASCII art; it’s rare that I get to see it these days.  The SBC actually runs on a CentOS installation, which you’ll see parts of as the installation occurs.  From the boot screen, hit the Enter key.

[Image: setup1]

You’ll see processes fly by… be patient.

[Image: setup2flyby]

You’ll then be dropped at a screen that prompts you to wipe the disk you created.  Hopefully this isn’t a shock or a big deal.  Navigate to the Re-initialize box and hit Enter.

[Image: step3_reinitiialize_centos]

The Linux packages will install, but it doesn’t take long.  It will feel almost instant if you’re used to watching SQL Express instances install for Lync.

[Image: step4]

Once our SBC installation is complete, make sure to eject your virtual DVD or else we’ll be taken right back to the top screen.  Once ejected, click the Reboot button.

[Image: step5reboot_ejectDVD]

After a quick boot, up pops our GRUB loader.  Hit Enter or let it boot the default selection.

[Image: Step6]

The next screen should be familiar if you’ve ever used a serial cable or the CLI of an AudioCodes Mediant before.  To log in, we’ll use the default user name Admin and password Admin.  Please note that I’ve capitalized the A intentionally as the username and password are both case sensitive.

[Image: step7login]

Once we’re in, we can dig in and look at the default IP address.  To do so, type in “show voip interface network”.  If you want to see what commands are available to you, you can download the CLI Reference Guide or type ?.  Many of the commands may look familiar if you’ve used the command line interface on popular routers and switches.

[Image: step8_configip1]

The default IP is 192.168.0.1 with a subnet mask of 255.255.255.0 on the default VLAN.  You can either set another machine up with an IP on this network and connect right away, or change the IP now through the CLI.  To change the IP from the CLI, we’ll need to enter enable mode by typing “enable”.  Now, we need to get into config mode.  Since the network settings are found in the VoIP section of a Mediant, we’ll type “configure voip”.  We’ll switch to our default network interface by typing “interface network-if 0”.  Now we can use the ip-address, prefix-length, and gateway commands to set our IP, subnet, and default route respectively.  This can be seen in the image below.  Once finished, type activate and hit Enter.  You’ll see a note that the configuration won’t take effect until we reset.  To reset, type “exit” to get out of config mode, then type “reload now”.
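
One small gotcha: the CLI wants a prefix length rather than a dotted subnet mask.  The Python sketch below converts the mask and lays out the command sequence in the order described above.  The function names are hypothetical, and the exact argument syntax (a command followed by its value) is an assumption on my part; check the CLI Reference Guide and the prompts on your own SBC before pasting anything.

```python
import ipaddress


def prefix_length(subnet_mask):
    """Convert a dotted mask such as 255.255.255.0 to its prefix length."""
    # IPv4Network accepts "address/netmask" and raises ValueError on a
    # mask that is not a valid contiguous netmask.
    return ipaddress.IPv4Network(f"0.0.0.0/{subnet_mask}").prefixlen


def cli_steps(ip, subnet_mask, gateway):
    """List the CLI commands in the order the paragraph above describes."""
    return [
        "enable",
        "configure voip",
        "interface network-if 0",
        f"ip-address {ip}",                            # assumed syntax
        f"prefix-length {prefix_length(subnet_mask)}",  # assumed syntax
        f"gateway {gateway}",                          # assumed syntax
        "activate",
        "exit",
        "reload now",
    ]
```

For example, prefix_length("255.255.255.0") gives 24, which is the value that matches the default subnet mask mentioned above.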

[Image: step11]

That’s it for getting it accessible from the network.  In the screenshot below you can see us using the web interface.  If you’re a partner you can use the AudioCodes SBC Wizard to configure it, or set it up from scratch.

[Image: step12]

Here’s the part I love for the lab: It comes pre-licensed with 2 SBC sessions.  That will allow two concurrent calls through the device without additional purchase or licensing.  You can then connect this to a trunking provider like Intelepeer or FlowRoute for inexpensive calling over the Internet into Lync!

[Image: step13b]

We’ll stop here because we have our SBC up and running.  The configuration is up to you, though if there’s interest I’ll do a basic walkthrough blog of that as well.  If you’re new to AudioCodes, they have remote implementation services available as well for a no-fear installation.  Please let me know if you have any comments, questions, or corrections and thanks for reading!

This Just In: Lync 2013 Client Update for September 2014

The Lync 2013 Client has an update available that was released today, September 9th.

Some of the issues resolved:

  • When you sign in to Lync 2013 by using an Office 365 account, Lync 2013 prompts you for Open Authorization (OAuth) credentials.
  • Bad password count is incremented when Lync 2013 VDI plug-in pairs with a Lync 2013 client

The download and more detail of the issues referenced can be found here:

Lync: Database Mirroring Has Been Disabled by the Administrator

A client sent a request out today letting me know that their Lync databases had failed over automatically to their mirror, but they couldn’t get them fully failed back.  They were receiving the error “Database mirroring has been disabled by the administrator for database rgsdyn”.  You can see this error below as I try to fail this single database back.

[Image: error1b]

Sure enough, connecting to SQL Manager proved that mirroring was disabled.  Oddly, this database was the only one that wasn’t failing over.  As you can see from the screenshot below, the mirror is marked as suspended.  A quick review of the SQL logs showed that a failure two weeks prior kept the primary down for a brief amount of time, and this outage caused the mirror to suspend itself.

[Image: error2b]

In our case the database and logs were healthy enough on both sides that we could restart the mirror.  I used the GUI to open the properties of rgsdyn on the primary copy (which is on our secondary SQL server now), navigated to Mirroring, and clicked the Resume button.

[Image: error3]

Mirroring went right back to a healthy state.

[Image: error4]

And I was able to re-run my command to bring all copies back to our main SQL server without error.

[Image: error5]

For those who aren’t so lucky, mirroring may need to be rebuilt for the database in question.  Of course the real issue at hand is why the virtualization environment isn’t more stable.