Very quick post for this morning
Show your appreciation for SysAdmin - http://www.sysadminday.com
Lots more here
And so this started today with a Twitter post. Whether you know it or not, I am a big enthusiast of trying to install ESX on all kinds of hardware - especially whiteboxes that are not on the HCL. I have tested it on a number of HP, Dell and IBM desktops. The great thing about this is that it mostly works. It is completely unsupported, but a lot of fun to do. But lugging around a desktop to present demos is not always the most convenient thing in the world - to put it mildly.
What if (or as you would say it in PowerShell, -WhatIf) you could create a system that has a full demonstration environment of multiple VM's - and all of this on your LAPTOP!
So I looked around on the web and found a few mentions of people who have done this, what kind of problems they ran into, and what was possible or not. The consensus is to install a base OS, Workstation on top of that, ESX as a VM, and then VM's onto that ESX VM. The consensus was also: "IT IS AS SLOW AS A TORTOISE!" ESX on bare metal should be much, much faster.
So my adventure started with this:
The laptop:
Started out in the BIOS, enabled Intel VT
SATA was set as AHCI
And off we go
Install Screen
Recognizes the Disk
And 4 minutes later
All hardware detected out of the box - Network card included
Next was to connect to the laptop with the VI client.
Now all I have to do is find out why I cannot power on a machine. Every time I started a VM the laptop froze - completely! Hard reboot and the machine came back up OK but the VM was no longer registered.
Have to look into that further
Hope you enjoyed the ride.

Yes, disconnected environments do exist! I mean completely and totally disconnected.
NO INTERNET!!!
Well I had one of those today. My customer has a network which is completely and physically disconnected from the corporate LAN, and therefore also not connected to the internet. This is because of the nature of the information on this secluded network - there is no option for anything to go in or out over the wire.
All fine and dandy! Installed a new ESXi 4 machine there today. I then wanted to install the new VI client on the user's PC. Pretty straightforward - or so you would think..
Opened up the web browser and pointed it to https://ESX-HOST/client/VMware-viclient.exe and ran the exe file.
Next -> Next -> Next -> skipped the host Update utility, Waited, waited, waited
and then ………… BOINK!!!!
Installation failed …….. returned error code 1603. And of course no VI client.
Hmmm. Maybe something was wrong with the .NET Framework on the machine - checked it and all seemed to be kosher.
Tried the installation again, and guess what? Same story! Tried it on another machine - you guessed right - Same story!
I love a challenge and solving puzzles - so this was one for me :)
I unpacked the VMware-viclient.exe and received this
So you would think that the package has all the goodies it needs in order to install. Nope..
Looking into the netfx.log, which was located in the %TEMP% directory, I noticed that during the installation of the .NET Framework the installer was looking for a file on the internet and in a local path. Internet of course would not work here - remember? Disconnected network! - and the local path did not have the file either.
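Since the useful evidence was buried in netfx.log, pulling out the suspect lines can be scripted. This is a hypothetical sketch - the helper name and the sample log lines below are mine for illustration, not taken from the actual log:

```python
# Hypothetical sketch: scan a .NET setup log (e.g. %TEMP%\netfx.log) for
# lines that show the installer trying to fetch components from the
# internet or a missing local source - the telltale signs on a
# disconnected network.
import re

def find_download_attempts(log_text):
    """Return log lines that reference a URL or an unreachable source path."""
    suspects = []
    for line in log_text.splitlines():
        if re.search(r"https?://|download|source file not found", line, re.IGNORECASE):
            suspects.append(line.strip())
    return suspects

# Invented sample lines, shaped like typical installer log output.
sample = """\
Verifying local payload...
Attempting download from http://download.microsoft.com/netfx30/netfx30a_x86.msi
Source file not found: D:\\setup\\netfx30a_x86.msi
Installation aborted, error 1603
"""
for line in find_download_attempts(sample):
    print(line)
```

On a disconnected network, any hit on a URL is the smoking gun.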
Looked again at the folder sizes - 2.4MB seems a bit small, don't you think? Microsoft offers Microsoft .NET Framework 3.0 Service Pack 1 here - but that is again a 2.4MB file. So I gathered I needed the redistributable package (which should include all that I need) - Microsoft .NET Framework 3.0 Redistributable Package here (this time a 50MB file). Got both files, moved them to my USB key, and tried the installation of the .NET Framework - and guess what?
Exactly the same story!!! Failed installation - and looking in the logs I could see that it still wanted something from the internet.
Got fed up with this and downloaded the redistributable package of Microsoft .NET Framework 3.5 Service Pack 1 from here (this time a full package of 230MB), and moved it also onto my USB key.
Fired up the installer - click click, next next - and went to get myself a cup of water (sorry, I don't like coffee)
5 minutes (or so) later .Net was installed.
Started the VMware-viclient.exe.
Next -> Next -> Next -> skipped the Host Update utility, waited, and then ………… it went on to the next stage of installing Visual J# 2.0, and then waited, waited, waited - and it completed the installation!! (which made me very happy :) )
Lessons learned from this episode:
Hope you enjoyed the ride!
Well ok.. This could be taken the wrong way (and all of you with dirty minds should be ashamed of yourselves - ha ha). In one of my previous posts - How Much Ram per Host - a.k.a Lego - I gave a hypothetical scenario of 40 1-vCPU VM's on a single host as opposed to 80 VM's on one host. There was one thing I neglected to mention, and because of an issue with a client this week, I feel it is important to point it out.
CPU contention. For those of you who do not know what the issue is about, a brief explanation: if you have too many VM's competing for CPU resources, then your VM's will stop behaving and start to crawl.
So here was the story - a client called me with an issue, all his VM's had started to crawl - EVERYTHING was running slowly!
Troubleshooting walkthrough:
14:21:34 up 2 days, 20:57, 1 user, load average: 1.06, 0.92, 0.75
286 processes: 284 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 0.9% 0.0% 0.0% 0.0% 0.0% 45.8% 94.1%
Mem: 268548k av, 256560k used, 11988k free, 0k shrd, 21432k buff
189028k actv, 29240k in_d, 3232k in_c
Swap: 1638620k av, 251022k used, 1541988k free 74068k cached
If you notice on the last line
Swap: 1638620k av, 251022k used, 1541988k free 74068k cached
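For what it's worth, this check can be scripted. A minimal sketch (my own helper, not a VMware tool) that parses the Swap: line from the service console's top output:

```python
# Sketch: parse the "Swap:" line from the service console's `top` output
# and flag a host that is actively swapping. Field positions match the
# output quoted above.
def parse_swap_line(line):
    """Return (available, used, free) swap in kB from a top Swap: line."""
    fields = line.split()
    avail = int(fields[1].rstrip("k"))   # "1638620k av,"
    used = int(fields[3].rstrip("k"))    # "251022k used,"
    free = int(fields[5].rstrip("k"))    # "1541988k free"
    return avail, used, free

avail, used, free = parse_swap_line(
    "Swap: 1638620k av, 251022k used, 1541988k free 74068k cached")
if used > 0:
    print(f"swapping: {used}k of {avail}k in use")
```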
Why was it swapping? That is not normal. A quick check in the VI client showed how much RAM was allocated:
So there was only 272MB (the default) allocated. Someone had done the proper work of creating the swap of 1600MB (double the max of 800MB) - well done! - but had not restarted the host! So effectively the host was still set to 272MB. Now of course the load on the machine was high enough to cause the host to run out of RAM, and anything that was done on the host was working slowly.
vMotioned the VM's off and restarted the host, which came back with the full amount this time
Swap: 1638588k av, 0k used, 1638588k free 155924k cached
Ahh, much better - no more swapping. vMotioned the machines back, and at a certain stage all the VM's started to crawl again.
Looked into top again
CPU states: cpu user nice system irq softirq iowait idle
total 0.3% 0.0% 0.0% 0.0% 0.9% 38.9% 59.6%
Whoa! That is also extremely high!
ID GID NAME NWLD %USED %RUN %SYS %WAIT %RDY %IDLE %OVRLP %CSTP %MLMTD
21 21 MEM_ABC_STB_ 5 49.84 50.08 0.04 393.77 54.77 0.00 0.34 0.00 51.17
There were something like 10 VM's with %RDY times of over 10% constantly.
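Scanning esxtop output for these offenders by hand gets old quickly. Here is a small sketch of my own (the helper name is mine, and 10% %RDY is a rule of thumb, not a hard VMware limit) that flags worlds with high ready time:

```python
# Sketch: pick out VMs with a high %RDY from esxtop-style output.
# Column positions are taken from the header line itself, so this follows
# the layout shown above. The 10% threshold is a rule of thumb.
RDY_THRESHOLD = 10.0

def high_ready_vms(esxtop_lines):
    """Return (name, %RDY) pairs for worlds whose ready time is too high."""
    header = esxtop_lines[0].split()
    rdy_col = header.index("%RDY")
    name_col = header.index("NAME")
    offenders = []
    for line in esxtop_lines[1:]:
        fields = line.split()
        rdy = float(fields[rdy_col])
        if rdy > RDY_THRESHOLD:
            offenders.append((fields[name_col], rdy))
    return offenders

sample = [
    "ID GID NAME NWLD %USED %RUN %SYS %WAIT %RDY %IDLE %OVRLP %CSTP %MLMTD",
    "21 21 MEM_ABC_STB_ 5 49.84 50.08 0.04 393.77 54.77 0.00 0.34 0.00 51.17",
]
print(high_ready_vms(sample))
```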
Here you have a perfect case of CPU contention. The host is a dual quad-core X5320, and it was running 44 VM's.
The ratio of VM's per core is high - but achievable. I then looked to see how many vCPU's there were on the host: approximately 10 VM's had 2 or more vCPU's.
This brought the ratio of vCPU's per core to 6.75 vCPU's per core. And this is what was killing the host.
Even though the ratio of vm:core was 5.5:1 the vCPU:core ratio was much higher and therefore causing the contention throughout the server.
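The arithmetic is worth spelling out, since the two ratios tell very different stories. A quick sketch of the numbers from this case:

```python
# The numbers from this case: 44 VMs on a dual quad-core X5320 host,
# roughly 10 of them configured with 2 vCPUs each.
cores = 2 * 4                    # two quad-core sockets
vms = 44
dual_vcpu_vms = 10
total_vcpus = (vms - dual_vcpu_vms) + dual_vcpu_vms * 2

vm_per_core = vms / cores            # 5.5:1 - high but achievable
vcpu_per_core = total_vcpus / cores  # 6.75:1 - this is what killed the host
print(f"vm:core = {vm_per_core}, vCPU:core = {vcpu_per_core}")
```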
Of course the client did not understand why any of these VM's should be configured with anything less than 2 vCPU's - "because that is what you get with any desktop computer.."
It took an incident like this for the client to understand that there is no reason to configure the machine with more than 1 vCPU unless it really needs (and knows how) to use it.
We brought all the machines back down to 1 vCPU and
ID GID NAME NWLD %USED %RUN %SYS %WAIT %RDY
128 128 RPM_Tester_Arie 5 4.70 4.70 0.03 487.45 7.35
118 118 RHEL4_6 5 5.33 5.37 0.00 488.44 5.67
108 108 STBinteg3 5 1.68 1.70 0.00 495.00 2.78
112 112 STBinteg2 5 11.20 11.25 0.00 485.50 2.74
21 21 MEM_ABC_STB_ 5 6.90 6.92 0.02 490.19 2.35
And all was back to normal!
Lessons learned from this episode:
Invaluable resources for troubleshooting performance:
Checking for resource starvation of the ESX Server service console
CPU Performance Analysis and Monitoring
Hope you enjoyed the ride..
No, I am not offering one - VMware is.
Hot off the press from Twitter - starts now until Midnight - July 24, 2009.
The idea is for you to register for the conference and therefore become eligible to win the pass for free.
If you read the fine print though on the Terms and Conditions you will find that there is a shorter route.
Good luck!!
One of the little-known features - and, at least I think so, one of the coolest gems - in the last few versions of VMware Workstation is Unity mode.
Access applications within virtual machines as if they were part of the host operating system desktop with “Unity” view
Two perfect use cases:
A short demo about Unity
And yeah yeah, I know. Time to change the Messenger client, I hear ya!
I was invited (amongst a good number of others that were in the Beta) to sit the Beta Exam. I have decided not to take the opportunity. Only two days left by the way.
Why you should?
Why you should not?
Personally, none of the cons mentioned above were the reason for my decision. I will not be taking it because the only VUE testing center in Israel where I could schedule the exam was available on only one date, a three-hour drive away from where I live/work, and the slot was at 08:30 in the morning. So I will pass. Pity, but when the exam becomes available, I will definitely book a more suitable slot.
Thank you anyway, VMware, for giving me the opportunity.
I was waiting for these to come in, they have now arrived, and I think that you all could benefit from these presentations.
vSphere, What's New? - Technical Overview - Ofir Zamir (Team Leader SEs, VMware Israel)
and
vSphere Upgrade and Best Practices - Ben Hagai (VMUG Leader) and Yaniv Weinberg (Senior Consultant at VMware)
Good presentations from all three of them. Enjoy!
I started to read the sample chapters that Scott Lowe released from his upcoming book, and one of the parts was about the subject of scaling up vs. scaling out.
A bit more of an explanation as to what I mean by this: should I buy bigger, more monstrous servers, or a greater number of smaller servers?
Let us take a sample case study. We have an environment that has sized the following:
On this hardware an organization has sized their server's capacity as:
The estimate of 40 virtual machines per host is pretty conservative, but for argument's sake let's say those are the requirements that came from the client. The projected number of VM's: up to 200.
Which hardware should be used to host these virtual machines? I am not talking about whether it should be a blade or a rack mount, and also not about which vendor - IBM, HP, Dell or other. I am talking more about what should go into the hardware for each server, and in particular for this post, what would be the best amount of RAM per server.
From my experience with the current environment that I manage, the bottleneck we hit first is always RAM. Our servers are performing at 60%-70% utilization of RAM, but only 30%-40% CPU utilization per server. And from what I have been hearing from the virtualization community, the feeling is generally the same. I wanted to compare what would be the optimal configuration for a server. Each server was a 2U IBM x3650 with two 72GB hard disks (for the ESX OS), two power supplies, and two Intel PRO/1000T dual NIC adapters. Shared storage is the same for both servers, so that is not something I take into the equation here. The only difference between them was the amount of RAM in the servers. All the prices and part numbers are up to date from IBM, done with a tool called the IBM Standalone Solutions Configuration Tool (SSCT). The tool is updated once or twice a month and is extremely useful for configuring and pricing my servers.
Now the first thing that hit me was the sheer difference in server price. I mean, I added 100% more RAM to the server, but the price of the server went up by almost 300%. That is because the 8GB chips are so expensive. Now I took building blocks of 40 VM's. To each block I added an ESX Ent. Plus license - an additional cost of $7,000. I assumed that vCenter was already in place, so this was not a factor in my calculations. The table below compares the two servers in blocks of 40 VM's.
Now you can always claim that a server with 80 VM's uses a lot more CPU than a server with only 40. But if you were paying attention at the beginning of the post, the load on a server with 40 VM's was going to be 30-40%, and therefore doubling it would bring the load up to 60-80%, which is well within acceptable limits. As you can see from the table above, the 128GB server came out cheaper on every level that could be compared to the 64GB server. And of course I am not even mentioning the savings from the reduction in physical hardware, rack space, electricity and cooling - we all know the benefits.
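If you want to rerun this kind of comparison with your own quotes, the calculation is simple to script. The hardware prices below are placeholders of mine, not the real SSCT quotes; only the $7,000 license per 40-VM block comes from this post:

```python
# Sketch of the cost-per-VM comparison. The server prices are placeholders,
# NOT the real SSCT quotes; the $7,000 ESX Ent. Plus license per block of
# 40 VMs is taken from the post.
def cost_per_vm(server_price, vm_capacity, license_per_block=7000, block_size=40):
    """Amortize hardware plus per-block licensing over the VMs hosted."""
    blocks = -(-vm_capacity // block_size)   # ceiling division: licenses needed
    total = server_price + blocks * license_per_block
    return total / vm_capacity

# Placeholder prices - substitute your own quotes for the 64GB and 128GB boxes.
print(cost_per_vm(server_price=12000, vm_capacity=40))
print(cost_per_vm(server_price=34000, vm_capacity=80))
```

Plug in the real quotes from your configurator and the cheaper block becomes obvious at a glance.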
So what did I learn from this exercise?
Now of course you can project this kind of calculation onto any kind of configuration, be it a difference in RAM, HBA's, NIC's, etc.
Your thoughts and comments are always welcome.
In the current series of posts I am writing on running a vSphere lab on ESX (parts 1, 2 and 3), I wanted to set up NFS shared storage between my 2 ESX hosts to test vMotion.
I ran into an interesting issue which I could hardly find any mention of on the web.
We all know that there are countless posts about vMotion failing at 10% or failing at 90%, but not anything about 78%. Well, I hate to be picky, but this one was baffling me a bit. I found only one mention of this on the communities, but nothing else.
A bit more detail. I had connected two ESX hosts to an NFS share from Openfiler. There was no problem at all. Both hosts saw the storage. Created machines without any issues on both hosts. Only vMotion would fail – with a very ambiguous error.
Every single time at 78%. At first I thought it was because promiscuous mode was not enabled on the NIC and on the vSwitch, so I enabled it.
Did not help.
I tried to get information out of the VMware.log file of the VM, but the only thing I could see was this:
Jun 30 13:34:04.615: vmx| Running VMware ESX in a virtual machine or with some other virtualization products is not supported and may result in unpredictable behavior. Do you want to continue?
So maybe that was the issue? I asked hany_michael and the_crooked_toe if they had had any issues with vMotion like this, but they had not, even though they were running a similar environment to mine. The line above appeared because I was running ESX as a VM - I would get it as well when powering on a VM, but the power-on would succeed.
I tried to go through the logs of the VM's and was not getting any more information from them either, besides that it could not find the file on the new host.
Turned on verbose logging on the vCenter
Did not get much either.
[2009-06-30 13:56:47.886 03756 error 'App'] [MIGRATE] (1246359385573990) error while tracking VMotion progress (RuntimeFault)
Since this was NFS I started to dive into the vmkernel logs of the ESX hosts at /var/log/vmkernel and found this:
ESX4-1
Jun 30 12:24:40 esx4-2 vmkernel: 0:12:18:19.207 cpu1:4396)WARNING: Swap: vm 4396: 2457: Failed to open swap file '/volumes/c31eba3f-9dca625f/win2k3/win2k3-4aed76bf.vswp': Not found
Jun 30 13:34:05 esx4-2 vmkernel: 0:13:27:43.732 cpu1:4433)WARNING: VMotion: 3414: 1246358033497547 D: Failed to reopen swap on destination: Not found
ESX4-2
Jun 30 13:14:55 esx4-1 vmkernel: 0:14:45:00.526 cpu1:4462)WARNING: Swap: vm 4462: 2457: Failed to open swap file '/volumes/c861a58d-45816333/win2k3_b/win2k3_b-65841149.vswp': Not found
Jun 30 13:14:55 esx4-1 vmkernel: 0:14:45:00.526 cpu1:4462)WARNING: VMotion: 3414: 1246356880465431 D: Failed to reopen swap on destination: Not found
Jun 30 13:14:55 esx4-1 vmkernel: 0:14:45:00.526 cpu1:4462)WARNING: Migrate: 295: 1246356880465431 D: Failed: Not found (0xbad0003)@0x41800da0e0d5
Now why would it not find the swap file? I mean both of the hosts are connected to the same storage.
Or were they??
Look at the log again
ESX4-1 - Failed to open swap file '/volumes/c861a58d-45816333/win2k3/win2k3-4aed76bf.vswp'
ESX4-2 - Failed to open swap file '/volumes/c31eba3f-9dca625f/win2k3/win2k3-4aed76bf.vswp'
See the difference? But how could that be? I remembered that I had run into this issue once before. Let me explain what was happening here. During vMotion the memory state of the VM is transferred from one ESX host to the other. In the vmx file of the VM there is a configuration setting for where the swap is located:
sched.swap.derivedName = "/vmfs/volumes/c31eba3f-9dca625f/win2k3/win2k3-4aed76bf.vswp"
When the receiving host is ready to finalize the transfer, it has to take this file to read the swap memory of the VM. This is the only hard-coded path in a VM configuration file, and since the hosts were not seeing the same path, the machine would not migrate.
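A check like the following would have caught the mismatch immediately. It is my own sketch, using the two swap paths from the vmkernel logs above:

```python
# Sketch: extract the datastore UUID from each host's view of the VM's
# swap path and compare them. If the UUIDs differ, the hosts mounted the
# "same" NFS export under different identities and vMotion will fail.
import re

def datastore_uuid(swap_path):
    """Pull the datastore UUID out of a /vmfs/volumes/<uuid>/... path."""
    m = re.search(r"volumes/([0-9a-f-]+)/", swap_path)
    return m.group(1) if m else None

# The two paths as seen in the vmkernel logs above.
esx1 = "/vmfs/volumes/c31eba3f-9dca625f/win2k3/win2k3-4aed76bf.vswp"
esx2 = "/vmfs/volumes/c861a58d-45816333/win2k3/win2k3-4aed76bf.vswp"

if datastore_uuid(esx1) != datastore_uuid(esx2):
    print("Hosts mount the 'same' NFS export under different UUIDs")
```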
How did this happen?
When creating the datastores I did one from the GUI,
and one from the command line.
A subtle difference of a trailing /, but that is what made all the difference.
I removed the volume from one ESX server, created it again, and now the output from both hosts:
[root@esx4-1 ~]# ls -la /vmfs/volumes/
total 1028
drwxr-xr-x 1 root root 512 Jul 1 16:27 .
drwxrwxrwt 1 root root 512 Jun 30 14:16 ..
drwxr-xr-t 1 root root 1120 Jun 21 12:16 4a3dfa4c-17137398-672b-000c299e8aed
drwxrwsrwx 1 96 96 4096 Jul 1 00:16 c31eba3f-9dca625f
lrwxr-xr-x 1 root root 35 Jul 1 16:27 Local-ESX4-1 -> 4a3dfa4c-17137398-672b-000c299e8aed
lrwxr-xr-x 1 root root 17 Jul 1 16:27 nfs_fs1 -> c31eba3f-9dca625f
[root@esx4-2 win2k3]# ls -la /vmfs/volumes/
total 1028
drwxr-xr-x 1 root root 512 Jul 1 16:27 .
drwxrwxrwt 1 root root 512 Jun 30 00:07 ..
drwxr-xr-t 1 root root 1120 Jun 21 14:05 4a3e1408-450c30b1-ab94-000c293f26d7
drwxrwsrwx 1 96 96 4096 Jul 1 00:16 c31eba3f-9dca625f
lrwxr-xr-x 1 root root 35 Jul 1 16:27 Local-ESX4-2 -> 4a3e1408-450c30b1-ab94-000c293f26d7
lrwxr-xr-x 1 root root 17 Jul 1 16:27 nfs_fs1 -> c31eba3f-9dca625f
2 lessons I learned from this.
Hope you enjoyed the ride!