Migrating vCenter as a Virtual Machine

January 25th, 2010

The caveats of running vCenter within a VM have been written about to death. So I will mention only what I never found in those posts that is relevant to the environment within which I work:

greater virtual performance over physical

Not every environment is large enough to dedicate a whole blade with 8/12/40/96 GB of RAM to a vCenter. It’s simply a waste; the premise on which people dive into virtualisation is that green feeling of saving the environment, or saving the USD.
So businesses tend to ‘donate’ a physical box that will be the nucleus of their virtual offering. Yet this tends to be an underpowered doorstop that is not far from being decommissioned.
As a Virtual Machine, vCenter gets the benefit of prompt hardware resizing, and the ability to be easily placed on higher-powered machines.

That’s the benefit. But what REALLY happens when you decide to move your vCenter around?

vMotion Mission Statement:

… Storage VMotion lets you relocate virtual machine disk files between and across shared storage locations while maintaining continuous service availability and complete transaction integrity.

To me, that reads – “stuff will happen while we move your data”.

Unfortunately that isn’t always the case. Here’s the scenario.

Host
- DS1
- DS2
- DS3

vCenter was living on DS1 (slow-LUN), and I wanted to move it across to DS2. Since vCenter is very important, its continuous service availability is crucial. Read: {perfect candidate for svMotion}.

The message that never went away

What happened next ran counter to the credo. The vSphere client froze, and no new connections could be made to vCenter. If you monitor the datastores (kudos to VMware), you can see that the VM is actually being transferred.

Once the first migration completed, I had to reconnect the vSphere Client; I wasn’t happy with this outcome. So I attempted a DS2->DS3 move (both fast-LUNs). Not only did my connection to vCenter not drop this time, but the whole process was seamless and worked as advertised.

vCenter Migration

I ended up moving the VM in which I house SQL Server (vCenterDB) in the same fashion to the faster LUNs, and everything proceeded to operate as designed.

Lesson learnt: get decent storage.

Inconsistency in ESX consuming large LUNs

January 24th, 2010

… go with me here.

You are going to have black caviar (highly recommended). You were provided with 50 grams; you figure you can fit at most 45 grams onto that delicious piece of dark rye. So what do you do? Do you disregard, and thus throw out, the 5 grams? … Well it’s not important what YOU would do, it’s important what VMware does!

Mr. ESX tends to look at the excruciatingly expensive, sustenance-providing caviar, and throws out the majority that it can’t handle, and opts for the remains in the hard-to-reach crevices of the jar.

What does this all mean for the geek?

There’s a decrepit limit of 2TB minus 512 bytes for each LUN that you can present to ESX. Anything larger, it has no love for. So if you were to present it with a 4TB LUN, you would naïvely assume that you would get the bastardised version of 2TB and the rest would be lost in the ether. I guess that would be somewhat logical.

Let’s try it:

Capacity vs. Available Space

There you have it. Instead of actually using up as much as ESX’ly possible (~2TB) from a LUN that has been allocated, VMware chose to only pick up the left-overs (~500GB).
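
For what it’s worth, that odd-looking limit is exactly what you get if you assume a 32-bit count of 512-byte sectors. That assumption is mine and isn’t confirmed anywhere in this post, but the arithmetic is easy to check in bash:

```bash
#!/usr/bin/env bash
# Assumption (not from VMware docs): 512-byte sectors, counted with 32 bits.
sector_bytes=512
max_sectors=$(( (1 << 32) - 1 ))      # largest value a 32-bit counter holds

# Largest LUN size expressible under that assumption, in bytes.
lun_limit_bytes=$(( max_sectors * sector_bytes ))
echo "$lun_limit_bytes"

# The same figure expressed as "2TB minus 512 bytes".
two_tb_bytes=$(( 2 * 1024 * 1024 * 1024 * 1024 ))
echo $(( two_tb_bytes - sector_bytes ))
```

Both lines print the same number, which is at least consistent with the “2TB minus 512 bytes” wording.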

Re-Installing vCenter with new DSN credentials

January 13th, 2010

If you’re ever in a situation where you’ve moved the vCenter database, have actually changed the login details for the database that the DSN points to, and you’re using SQL Server, then upon re-installing vCenter you will be greeted with:

“Database job [Past day stats rollupvcenter] was created by another user. Please use the same user to setup your DSN or remove the job. ODBC Error: [Microsoft][SQL Native Client][SQL Server]The specified @job_name (‘Past Day stats rollupvcenter’) does not exist.”

When vCenter is first installed, it schedules jobs with the help of the system DB, MSDB. All that’s left to do is remove the jobs created by the previous dbo of your vCenter data. You achieve this by first listing the jobs that exist on the SQL Server.


SELECT [job_id],
[originating_server_id],
[name],
[enabled],
[description],
[start_step_id],
[category_id],
[owner_sid],
[notify_level_eventlog],
[notify_level_email],
[notify_level_netsend],
[notify_level_page],
[notify_email_operator_id],
[notify_netsend_operator_id],
[notify_page_operator_id],
[delete_level],
[date_created],
[date_modified],
[version_number]
FROM [msdb].[dbo].[sysjobs]

As you can see, four scheduled operations exist. Thankfully you don’t have to clear this table by hand, as there’s a stored procedure that comes with MSDB: sp_delete_job.

Run it for each of the jobs, and you’ll be ready to continue installing vCenter.
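
If you’d rather not type the calls out one by one, here is a minimal bash sketch that prints one sp_delete_job statement per job name. The helper name delete_job_sql is made up for illustration, and the only job name I use is the one from the error above; take the rest from your own sysjobs listing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one sp_delete_job call per job name argument.
delete_job_sql() {
  local name
  for name in "$@"; do
    # Double any single quotes so the name is a valid T-SQL string literal.
    name=$(printf '%s' "$name" | sed "s/'/''/g")
    printf "EXEC msdb.dbo.sp_delete_job @job_name = N'%s';\n" "$name"
  done
}

# Example: print the statement for the job named in the error message.
delete_job_sql 'Past Day stats rollupvcenter'
```

Review the output, then paste it into a query window, or hand it to sqlcmd with something like sqlcmd -S your_server -E -Q "$(delete_job_sql '...')", where your_server is a placeholder and -E assumes Windows authentication.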

SOAP wrapper for LabManager written in PHP

September 30th, 2009

LabManager is a great product for when you have to deal with provisioning of images in a hands-off fashion from the IT team. If you’re already a vSphere shop, users don’t need access to a vSphere client and can manage the lifecycle of their machines through the browser (Windows only though for the console).

For what I’m building, I needed access to the API provided by VMware, and unfortunately, it is only provided as a SOAP interface. Documentation is provided, although it is of quite a poor standard, which hurts when case-sensitivity is a must, not to mention correct parameters.

So how do we go about creating an easy-to-use PHP page which would consume the .NET WebService?

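<?php

// Namespace passed along with the LabManager authentication header.
$namespace = "http://vmware.com/labmanager";

// Credentials and context sent with every call (placeholder values).
$soap_dat = array();
$soap_dat["username"] = "user";
$soap_dat["password"] = "password";
$soap_dat["organizationname"] = "org";
$soap_dat["workspacename"] = "cake";

// Build the client from the WSDL published by the LabManager server.
$client = new SoapClient("https://server_name/LabManager/SOAP/LabManager.asmx?wsdl");

// Attach the AuthenticationHeader so each request is authenticated.
$header = new SoapHeader($namespace, 'AuthenticationHeader', $soap_dat);

$client->__setSoapHeaders($header);

// Parameter names are case-sensitive and must match the WSDL.
$config = array();
$config["configurationType"] = "1";

$result = $client->ListConfigurations($config);

print_r($result);

?>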

After using some of the methods, I decided to write a wrapper for LabManager’s API, so I could use it within other applications we develop in-house. You can also call it directly from the command line and integrate it with your bash/perl, without having to re-invent the wheel.

As such – I present to you phLabManager – “Simple, lightweight wrapper for LabManager 4.0 SOAP API” written in PHP.

Once you’ve grabbed the labmanager.php, you can write quite succinct calls directly to the methods, without worrying about the implementation. There’s an example on usage available.

I haven’t finished implementing all the methods, and will endeavour to do so in the coming days. Alternatively, you can do it and commit into the tree. It would be great to get some help in optimizing my rough implementation.

vSphere Client – Unexpected end of file has occurred

August 23rd, 2009

In the unfortunate scenario that the machine you use to run your vSphere client crashes (unfortunately VMware doesn’t make an OSX client), you may start the client back up and be greeted with the following when selecting the “Performance” tab:

Unexpected end of file has occurred. The following elements are not closed


This happens when the file which hosts all the chart settings (an XML-based key-value pair collection) becomes corrupt, and is actually cut off.

It’s quite easily found at: c:\Documents and Settings\user_name\Application Data\VMware\server_name-charts.xml

From here, you have several choices:

  • revert to a backup
  • delete it
  • clear it (just place <ChartSettings /> inside of it)
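
The last option can be sketched in a few lines of bash, assuming you have a bash-like shell (Cygwin, for instance) on the Windows box. The function name reset_charts is made up, and the path is whatever your own charts file turns out to be; the sketch keeps a copy of the broken file before clearing it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: back up a charts file, then replace it with an empty settings element.
reset_charts() {
  local charts_file=$1
  if [ ! -f "$charts_file" ]; then
    echo "no such file: $charts_file" >&2
    return 1
  fi
  # Keep the corrupt copy next to the original, just in case.
  cp -p "$charts_file" "$charts_file.bak" || return 1
  printf '<ChartSettings />\n' > "$charts_file"
}

# Usage (the path is a placeholder):
#   reset_charts "/path/to/server_name-charts.xml"
```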

That’s all there is to it. Happy vSphere’ing

CLI Wizard for WebSphere Portal CumulativeFix Management

August 8th, 2009

The unfortunate side-effect of any enterprise-grade software is that it is difficult to manage. There are at least two reasons that I see for this:

  1. Training + certification revenue
  2. Tools that are cobbled together over years of being driven by big-customer requests/demands, and not a defined architectural vision.

One such example is installing and uninstalling cumulative fixes for WebSphere. My first step was to download the PUI and extract it, and then I had to load the extremely cumbersome and slow Java GUI. To use it, I either forwarded the X11 session to my Mac, or, in the case of my unfortunate colleagues, had to start a VNC session first and then use the client. Stupid (not the colleagues).

So I came up with something that wraps updatePortal.sh in a purpose-built interface, accessible from the command line. I bring you the CLI Wizard for Portal CF’s.

Let’s disassemble the necessary steps first, and start with uninstalling.

Portal places all the CF definitions into PortalServer/version/PK{number}.efix; these are just XML files which describe the PK. What we need is the PK number and the description.

After a few lines of bash, on a server with several CF’s installed the output will look similar to this:

PK_list_installed

Not very useful though, yet. So we now allow for actually selecting one of the CF’s through the great bash select statement.

Having the PK number, we are easily able to cut/awk and pass this to the trusty updatePortal.sh to do its uninstalling deed.
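
The listing and selection steps can be sketched roughly as follows. This is not the script itself, just a minimal illustration: the function names are made up, the PK number is taken from the file name, and the description lookup is left out because it depends on the efix XML layout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the PK number of every efix file in a directory.
list_installed_fixes() {
  local version_dir=$1 efix
  for efix in "$version_dir"/PK*.efix; do
    [ -e "$efix" ] || continue        # the glob matched nothing
    basename "$efix" .efix            # PK12345.efix -> PK12345
  done
}

# Hypothetical helper: show a numbered menu and print the chosen PK number.
choose_fix() {
  local version_dir=$1 pk
  # select writes its menu and prompt to stderr, so stdout carries only the answer.
  select pk in $(list_installed_fixes "$version_dir"); do
    if [ -n "$pk" ]; then             # empty means an invalid choice, so ask again
      echo "$pk"
      break
    fi
  done
}

# Usage (the directory is a placeholder):
#   pk=$(choose_fix /path/to/PortalServer/version)
```

The value that comes back is what gets handed to updatePortal.sh for the uninstall.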

The installation is much the same, although you must first obtain the PK jar. In the office we have a build server that spits them out (in zips with the readme). So the script first of all gives a listing of available CF’s to install, and then when you choose the appropriate zip, it will download it and install.

You can download the complete script, and let me know if you have any issues. I thought of distributing this on Google Code, so others that deal with Portal can extend it, as Portal’s Java GUI leaves much to be desired.

How to virtualize your office – without their knowledge

July 25th, 2009

People complain that they don’t have enough machines, and that when they need servers, none are available. These were two of the problems prevalent in my workplace. I set out to rectify this by firstly measuring the utilization rate on some of our busiest servers.

Instead of installing and managing software that monitored CPU/Memory/HDD/Network utilization, I merely converted the whole machine to a virtual one – and offered it as a replacement under the same IP to all the users. This was done with a quick P2V conversion of the server into a VMware image pushed to a dedicated IBM Blade server running ESX 4.0 – which had far more grunt than the originating server. A few tweaks on the DHCP server to assign the right IP to the newborn, and we were away. NOTE: the usage shown is of the machine running on the Blade, not on its original hardware; I’ll come back to this later in the post.

CPU Usage

CPU Usage - 1 Machine

Disk Usage

Disk Usage - 1 Machine

Memory Usage

Memory Usage - 1 Machine

Before I continue, I’ll just tell you more about the software stack that is deployed on this specific machine. There is WebSphere + AppServer as well as a local instance of DB2. As you can see, it’s mostly Java-based, so memory usage is fairly straightforward: it all gets eaten up on app start, and the variations that we see are from DB2. (I could use a remote DB and eliminate the memory variation altogether, which is something I will be doing with all deployments from now on.)

Firstly, the CPU usage, throughout the day with constant use of the machine, averaged out to ~5%. I’m certain this isn’t unheard of for those that actually look at their data centres and the individual, un-optimized servers. Disk usage aligns quite nicely with the memory footprint, which once again points to DB2 doing its job.

So what can I improve here? I can do several things:

  1. remove local DB, and deploy into a contained VM cluster, or just point it at a dedicated machine (which will also be a VM)
  2. if I offload the DB, I can increase the heap size within WebSphere for faster startup and better overall performance
  3. put more machines onto this single Blade

Having realised the first two points only after looking at the graphs (to write this post), I still haven’t implemented them. I did, though, do the third.

CPU Usage - 3 Machines


Disk Usage - 3 Machines


Memory Usage - 3 Machines


Very quickly you realise that although there’s an abundance of processing power, there’s just not enough RAM on the host blade. The ratio of GHz to GB of RAM is about 3:1; it should be at the very least 1:1. Keep this in mind the next time you have to put in a purchase order for more machines.

At the beginning of the post, I mentioned that the current blade host is more powerful than the original home of this software stack. It is also far more scalable, and that is one of the reasons why I wanted to transfer and consolidate the numerous x345’s we have onto several blades. Most of the x345’s that were hosting servers are now turned off; with the remaining few, I’m building a storage cluster to act as an iSCSI target server for the ESX’ed Blades.

What was satisfying was some people commenting on how fast the servers responded; little did they know that they were virtualized. Lesson: don’t buy 20 cheap and nasty servers, but a blade centre and start populating it as your budget permits.

Is OSX going to be the first mainstream Cloud OS?

May 26th, 2009


The logical progression of the thin-client cloud movement is that it comes to the masses. We’ve seen numerous advances in virtualisation technology, from the Citrix mob’s purchase of Xen, to VMware’s long pedigree, to the new entrant, Sun’s VirtualBox. All these are great, but they miss the point of mainstream adoption outside the enterprise. This can only occur when virtualization is no longer about just server consolidation and cycle saving.

Apple is building a server farm. Can this be a prelude to, and the foundation of, what is to become the delivery mechanism for the VOS (Virtual OS)? My prediction is that within 2 years, not only will there be a smaller device that you are able to take around in the form of a tablet, but more importantly, it will be integrated with your persistent presence.

A simple scenario is you working on a document or watching a movie on your Mac at home, after which you must leave. Without turning anything off, you merely take your tablet/light-weight computing unit, and proceed on your trip. Once on a bus, you will be able to resume your document editing or movie watching exactly where you left off.

Apple’s current core strengths do not include OS abstraction and virtualization (I’m not counting Rosetta, as that wasn’t developed in-house), so Apple’s next purchase should be a player in virtualization delivery, or we should at least see a partnership emerge – Citrix – wink*wink*nudge*

VMWare Converter fails to publish a split-sparse image to ESX

April 20th, 2009

“FAILED: The object or item referred to could not be found” is the extremely helpful message that VMware Converter displays when it fails.

Digging deeper, within the logs we can see that there are multiple instances of

“Warning: failed to create directory” and “Warning: failed to clone directory tree”.

The simple work-around is to convert the vmdk disk to a monolithic-sparse disk.

You can do this by issuing:

$ vmware-vdiskmanager -r original.vmdk -t 0 destination.vmdk

This will clone the disk image and change it from being split into 2GB files to a single vmdk, referred to as ‘monolithic-sparse’ (merely referring to the fact that it will increase in size automatically to encompass the VM partition).

After completing the cloning process, you should have no problems in restarting the conversion process, and it should complete as advertised.
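
If you want to confirm which layout a disk has before or after the clone, a rough check is to read the createType line from the VMDK descriptor. This is a sketch under my own assumptions: that the descriptor carries such a line, and that the values to expect are twoGbMaxExtentSparse for the split layout and monolithicSparse for the single-file one. The function name vmdk_create_type is made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the createType recorded in a VMDK descriptor (empty if none is found).
vmdk_create_type() {
  # -a treats the file as text, since a single-file disk embeds its descriptor in binary data.
  grep -a -m1 '^createType=' "$1" | cut -d'"' -f2 | tr -d '\r'
}

# Usage (file names as in the command above):
#   vmdk_create_type original.vmdk
#   vmdk_create_type destination.vmdk
```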

Empty iPhone Emails – Solved

August 25th, 2008

NOTE: iPhone firmware update 2.1 seems to have gotten rid of the problem. Now if only Apple would fix the damn calendar bug.

Have you recently sent an email from your beloved iPhone and had it delivered – empty? Then when you look at the sent folder on your phone you get the lovely: “This message has no content”.