VMworld Reveals: VMware Cloud Foundation (#HBI1432BUR)

At VMworld, various cool new technologies were previewed. In this series of articles, I will write about some of those previewed technologies. Unfortunately, I can’t cover them all as there are simply too many. This article is about VMware Cloud Foundation, which was session HBI1432BUR. For those who want to see the session, you can find it here. This session was presented...

Source

Posted in cloud, cloud foundation, Server, Software Defined, Storage, Various, vcf, Virtual SAN, VMware Cloud Foundation, vmworld, vmworld reveals | Comments Off on VMworld Reveals: VMware Cloud Foundation (#HBI1432BUR)

Top 20 articles for vSAN, July 2019

  • Status of TLSv1.1/1.2 Enablement and TLSv1.0 Disablement across VMware products
  • Thick-provisioned VMs on vSAN detected on vSAN-health check
  • “Unexpected VMware Update Manager (VUM) baseline creation failure” error in vSAN Build Recommendation Engine Health
  • Virtual Machines running on VMware vSAN 6.6 and later report guest data consistency concerns following a disk extend operation
  • “Host cannot communicate

The post Top 20 articles for vSAN, July 2019 appeared first on VMware Support Insider.

Posted in KB Digest, Top 20 | Comments Off on Top 20 articles for vSAN, July 2019

VMworld Reveals: Armed and Ready (ESXi on ARM, #OCTO2944BU)

At VMworld, various cool new technologies were previewed. In this series of articles, I will write about some of those previewed technologies. Unfortunately, I can’t cover them all as there are simply too many. This article is about ESXi on ARM, which was session OCTO2944BU. For those who want to see the session, you can find it here. This session was presented by Andrei...

Source

Posted in ESXi, reveals, Server, tech preview, vmworld, vmworld reveals, vsan, vSphere | Comments Off on VMworld Reveals: Armed and Ready (ESXi on ARM, #OCTO2944BU)

VMworld Reveals: HCI Present and Futures (#HCI2733BU)

At VMworld, various cool new technologies were previewed. In this series of articles, I will write about some of those previewed technologies. Unfortunately, I can’t cover them all as there are simply too many. This article is about HCI / vSAN futures, which was session HCI2733BU. For those who want to see the session, you can find it here. This session was presented...

Source

Posted in cloud foundation, Server, Software Defined, Storage, vcf, Virtual SAN, VMware Cloud Foundation, vmworld, vmworld reveals, vsan | Comments Off on VMworld Reveals: HCI Present and Futures (#HCI2733BU)

VMworld Reveals: Disaster Recovery / Business Continuity enhancements! (#HCI2894BU and #HBI3109BU)

At VMworld, various cool new technologies were previewed. In this series of articles, I will write about some of those previewed technologies. Unfortunately, I can’t cover them all as there are simply too many. This article is about enhancements in the business continuity/disaster recovery space. There were two sessions where futures were discussed, namely HCI2894BU and HBI3109BU. Please note that this is a brief summary of those sessions, which discuss a Technical Preview: these features/products may never be released, the previews do not represent a commitment of any kind, and the features (or their functionality) are subject to change. Now let’s dive into it: what can you expect for disaster recovery in the future?

The first session I watched was HCI2894BU, which was all about Site Recovery Manager. I think the most interesting part is the future support for Virtual Volumes (vVols) in Site Recovery Manager. It may sound like something simple, but it isn’t. When the version of SRM that supports vVols ships, keep in mind that your vVols-capable storage system also needs to support it. On day one HPE Nimble, HPE 3PAR and Pure Storage will support it, and Dell EMC and NetApp are actively working on support. The requirement is that the storage system is vVols 2.0 compliant and supports VASA 3.0. Before they dove into the vVols implementation, some history and the current implementation were shared. I found it interesting to learn that SRM has over 25,000 customers and has protected more than 3,000,000 workloads over the last decade.

First of all, why is this important? Well, vVols already has support for replication, but orchestration via SRM was not supported. Of course, for customers who prefer to use SRM over a self-written solution, support for vVols is crucial. vVols is a completely different beast than traditional storage. For vVols, the communication from SRM does not happen via a so-called Storage Replication Adapter (SRA) but through the VASA (vSphere APIs for Storage Awareness) provider via SMS (Storage Management Service). VASA is also what vSphere uses to communicate with the vVols storage system, which is great if you ask me, as it reduces complexity!

So how does this work? Well, when SRM starts and the VASA provider is discovered, SRM sends a discovery request and receives all the data needed to create a protection group and recovery plan (see the sketch after the list):

  • SRM gets all datastores from vCenter
  • SRM gets all fault domains and storage containers
  • SRM gets all replication groups and peers
  • SRM gets all related VMs
  • SRM gets all storage profiles of all replicated VMs

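The data gathered in that discovery is essentially what SRM needs to assemble protection groups and a recovery plan. The sketch below is purely conceptual plain Python with invented names (it is not the SRM or VASA API); it only illustrates how those pieces fit together:

```python
# Conceptual sketch (invented names, not the SRM/VASA API): the discovery data
# SRM needs before it can build protection groups and a recovery plan.
def build_recovery_inventory(vcenter_data: dict, vasa_data: dict) -> dict:
    inventory = {
        "datastores": vcenter_data["datastores"],            # from vCenter
        "fault_domains": vasa_data["fault_domains"],         # from the VASA provider
        "storage_containers": vasa_data["storage_containers"],
        "replication_groups": vasa_data["replication_groups"],
        "vms": vcenter_data["replicated_vms"],
    }
    # Storage profiles tie each replicated VM back to its replication group.
    inventory["storage_profiles"] = {
        vm: vcenter_data["storage_profiles"][vm] for vm in inventory["vms"]
    }
    return inventory

vcenter_data = {
    "datastores": ["vvol-ds-01"],
    "replicated_vms": ["app-vm-01"],
    "storage_profiles": {"app-vm-01": "gold-replicated"},
}
vasa_data = {
    "fault_domains": ["site-a", "site-b"],
    "storage_containers": ["container-01"],
    "replication_groups": ["rg-01"],
}
print(build_recovery_inventory(vcenter_data, vasa_data))
```
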
Of course, this will happen in both locations so a comprehensive recovery plan can be created. When a recovery needs to occur, the steps are very similar to those for a traditional storage system. (See screenshot below.) What I think is important to know is that public API support for SRM and vVols will also be provided in the upcoming release, and new PowerCLI cmdlets will be delivered in a subsequent PowerCLI release.

Next, a demo was shown of vVols and SRM, followed by both HPE and Pure Storage explaining their vVols and SRM capabilities. That was it for HCI2894BU; let’s switch to HBI3109BU. There was just one announcement in this session, and I am going to limit myself to that, as I want to avoid duplicating content and explaining basic offerings like VMware Site Recovery (our DR as a Service solution). In short, the problem for some customers with our current DR as a Service offering is the fact that it requires you to have a VMware Cloud on AWS SDDC up and running. In some cases that SDDC may be idle and only used as a replication target. For Tier 1 workloads that may be acceptable, as they require a low RPO and low RTO. But what about Tier 2 and Tier 3 workloads?

For Tier 2 and Tier 3 workloads, having an SDDC up and running may simply be too expensive or overkill. For that reason, VMware is investing in a Cost Optimized DR as a Service solution. With the cost-optimized solution, VMs would be replicated to lower-cost cloud storage, without the need for an active SDDC. Note that this is not intended to replace the current DRaaS solution; it should complement it. Tier 1 and business-critical apps should use the “Performance Optimized” option, and Tier 2 and Tier 3 workloads can potentially use the Cost Optimized option in the near future. No dates around when (and if) this will be released were provided.

And that was it for VMware disaster recovery and business continuity reveals during VMworld 2019. Hope you find this series useful so far!

The post VMworld Reveals: Disaster Recovery / Business Continuity enhancements! (#HCI2894BU and #HBI3109BU) appeared first on Yellow Bricks.

Posted in BC-DR, Disaster Recovery, replication, Server, site recovery manager, srm, vmworld, vmworld reveals, vSphere, vsphere replication | Comments Off on VMworld Reveals: Disaster Recovery / Business Continuity enhancements! (#HCI2894BU and #HBI3109BU)

New KB articles published for the week ending 31st August, 2019

Customer Purchasing Program
  • Customer Purchasing Program – Program Definitions and Policies (Published Date: 8/26/2019)
  • Customer Purchasing Program FAQ – Online Tools and Migration (Published Date: 8/26/2019)
  • Customer Purchasing Program FAQ – Points and Discounts (Published Date: 8/26/2019)
  • Customer Purchasing Program FAQ – Customer Specific Questions (Published Date: 8/26/2019)
  • Customer Purchasing Program FAQ – General Information

The post New KB articles published for the week ending 31st August, 2019 appeared first on VMware Support Insider.

Posted in KB Digest, Knowledge Base | Comments Off on New KB articles published for the week ending 31st August, 2019

VMworld Reveals: vMotion innovations

At VMworld, various cool new technologies were previewed. In this series of articles, I will write about some of those previewed technologies. Unfortunately, I can’t cover them all as there are simply too many. This article is about enhancements that will be introduced to vMotion in the future, which was session HBI1421BU. For those who want to see the session, you can find it here. This session was presented by Arunachalam Ramanathan and Sreekanth Setty. Please note that this is a summary of a session discussing a Technical Preview: this feature/product may never be released, the preview does not represent a commitment of any kind, and the feature (or its functionality) is subject to change. Now let’s dive into it: what can you expect for vMotion in the future?

The session starts with a brief history of vMotion and how today we are capable of vMotioning VMs with 128 vCPUs and 6 TB of memory. The expectation, though, is that vSphere will support 768 vCPUs and 24 TB of memory in the future. A crazy configuration if you ask me; that is a proper Monster VM.

You can imagine that this also introduces a lot of challenges for vMotion, as a VM like this will need to move without any kind of interruption, especially as apps like SAP HANA or Oracle typically run inside these VMs. Of course, customers expect this VM to behave the same as a 1 vCPU / 1 GB VM. But you can imagine this is very difficult, as things like iterative copies can take a lot longer, and the same goes for the memory pre-copy and the switch-over. It was then explained how page tracing works today, and more importantly what the impact of this technique is on VMs. For a VM with 72 vCPUs and 512GB of memory, enabling page tracing means the VM is not running for a total of 53 seconds, while the total vMotion time is 102 seconds; in other words, more than half of the time the VM is not actively running. And although the user typically does not notice it, as the stops are very granular, there definitely is an impact for larger VMs. (The bigger the VM, the bigger the impact.) How can we reduce this cost?
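
As a quick sanity check on those numbers (using only the 53-second and 102-second figures quoted in the session):

```python
# Fraction of the vMotion during which the vCPUs are stopped for trace installs,
# using the figures quoted in the session for a 72 vCPU / 512 GB VM.
stopped_seconds = 53
total_vmotion_seconds = 102

fraction_stopped = stopped_seconds / total_vmotion_seconds
print(f"VM not running for {fraction_stopped:.0%} of the vMotion")  # ~52%
```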

Well, what if we could enable traces without having to stop all vCPUs? This is where “loose page traces” come in. The solution is rather simple: only a single vCPU is stopped, and that single vCPU takes care of enabling the trace (the “install” as they call it), after which the flush happens on all vCPUs individually. Since only a single vCPU now has to be stopped, there is a huge performance benefit. Another optimization that is introduced is Large Trace Installs, which refers to the size of the memory page being traced. Instead of tracing at a 4KB granularity, 1GB pages are now traced. This reduces the number of pages that need to be set to read-only and again improves the performance of the guest and the vMotion process. The 53 seconds in the previous example is now reduced to 3 seconds, and the vMotion time is reduced from 102 seconds down to 67 seconds. This is HUGE. On top of that, the performance hit decreases from 46% to only 8%. I can’t wait for this to ship!
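
To get a feel for why the larger trace granularity helps so much, here is a back-of-the-envelope count of my own (not a figure from the session) of how many pages have to be set read-only for a 512GB VM:

```python
# Number of trace installs needed to cover 512 GB of memory at 4 KB
# granularity versus 1 GB granularity (illustrative arithmetic only).
GiB = 1024 ** 3
memory_bytes = 512 * GiB

traces_4kb = memory_bytes // (4 * 1024)  # pages set read-only at 4 KB granularity
traces_1gb = memory_bytes // GiB         # pages set read-only at 1 GB granularity

print(f"4 KB granularity: {traces_4kb:,} trace installs")  # 134,217,728
print(f"1 GB granularity: {traces_1gb:,} trace installs")  # 512
```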

The second major change is around what they call Trace Fires: what happens when a guest tries to write to a memory page which is traced? What is the cost today, and how can it be optimized? Today, when a page fault occurs for dirty page tracking, the vCPU again temporarily stops executing guest instructions. It takes the actions required to inform vMotion that the page is dirtied, which means vMotion now knows it needs to resend the page to the destination. All of this costs a few thousand CPU cycles, and especially the fact that the vCPU temporarily cannot execute guest instructions hurts performance. With a larger VM, this is particularly painful. This whole process, primarily the cost of locking, has been optimized. This also resulted in a huge performance benefit: for a 72 vCPU VM with 512GB of memory it results in a 35% improvement in vMotion time, a 90% reduction in vCPU tracing time and a guest performance improvement of 70% compared to the “old” mechanism. Again, huge improvements.
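
Conceptually, the trace-fire path looks something like the toy sketch below: a write to a traced page marks it in a dirty bitmap so vMotion knows it must resend that page. This is purely illustrative plain Python (class and method names are mine, not hypervisor code), and it leaves out the page fault and locking that make the real path expensive:

```python
# Toy model of dirty page tracking during vMotion pre-copy: a guest write to a
# traced page "fires" the trace and marks the page so vMotion will resend it.
PAGE_SIZE = 4 * 1024

class DirtyTracker:
    def __init__(self, memory_bytes: int):
        self.num_pages = memory_bytes // PAGE_SIZE
        self.dirty = bytearray(self.num_pages)  # one flag per page, for simplicity

    def on_guest_write(self, address: int) -> None:
        # On real hardware this is the costly part: a page fault, locking,
        # and notifying vMotion, all while the vCPU is not running guest code.
        self.dirty[address // PAGE_SIZE] = 1

    def pages_to_resend(self):
        return [page for page, flag in enumerate(self.dirty) if flag]

tracker = DirtyTracker(memory_bytes=64 * 1024 * 1024)  # 64 MB toy VM
tracker.on_guest_write(address=5 * PAGE_SIZE + 100)
print(tracker.pages_to_resend())  # [5]
```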

In the second half of this section, all the different performance tests were shared. What is clear is that the improvements mainly apply to Monster VMs, except for the page trace (install) changes, which make a big difference for VMs of all sizes. As shown in the screenshot below, not only does vMotion benefit from it, but the guest benefits as well.

Next, the switch-over process was discussed and what has been optimized in this space to improve vMotion. Typically the switch-over should happen within a second. The challenge with the switch-over lies in the Transfer Memory Changed Bitmap and the Transfer Swap Bitmap that need to be sent. These bitmaps are used to track the changes during the switch-over, which could be a significant number of pages. The larger (and more active) the VM, the larger the bitmap: a 1GB VM would have a 32KB bitmap, whereas a 24TB VM would have a 768MB bitmap. A huge difference, and as you can imagine potentially a problem. The optimization was simple: as the majority of the bitmap is sparse, it is simply compacted before being transmitted. For a 24TB VM, this brought the transfer down from 2 seconds to 175 milliseconds, which is huge. You can imagine that for a VM with hundreds of TBs of memory, this would make an even bigger difference.
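
Those bitmap sizes follow directly from tracking one bit per 4KB page; a minimal sketch of the arithmetic (my own, but it reproduces the numbers quoted in the session):

```python
# Bitmap size as a function of VM memory, assuming one bit per 4 KB page.
KiB, MiB, GiB, TiB = 1024, 1024 ** 2, 1024 ** 3, 1024 ** 4
PAGE = 4 * KiB

def bitmap_bytes(memory_bytes: int) -> int:
    pages = memory_bytes // PAGE
    return pages // 8  # 8 pages tracked per byte

print(bitmap_bytes(1 * GiB) // KiB, "KB")   # 32 KB bitmap for a 1 GB VM
print(bitmap_bytes(24 * TiB) // MiB, "MB")  # 768 MB bitmap for a 24 TB VM
```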

Then, last but not least, Fast Suspend and Resume (FSR) was discussed. This is one of the key features used by Storage vMotion, for instance (hot-adding virtual devices also uses it). What is FSR? Well, in short, it is the mechanism which allows a Storage vMotion to occur on the same host. Basically, it transfers the memory metadata to the new (shadow) VM so that the SvMotion can complete, as we end up doing a compute migration on the same host. You would expect this process to have little impact on the workload or the SvMotion process, as it is happening on the same host. Unfortunately it does, as the VM’s vCPU 0 is used to transfer the metadata, and depending on the size of the VM the impact can be significant, especially as there is no parallelization. This is what will change in the future: all vCPUs will help with copying the metadata, greatly reducing the switch-over time during an SvMotion. For a 1TB VM with 48 vCPUs, the switch-over time went from 7.7 seconds to 0.5 seconds. Of course, various performance experiments were discussed and demonstrated next.
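
The idea of spreading the metadata copy across all vCPUs instead of leaving it to vCPU 0 can be illustrated with a small conceptual sketch (plain Python threads standing in for vCPUs; none of this is actual hypervisor code):

```python
# Conceptual illustration of Fast Suspend and Resume metadata copying:
# serially on a single "vCPU" (old behaviour) versus spread across a pool
# of workers standing in for all 48 vCPUs (new behaviour).
from concurrent.futures import ThreadPoolExecutor

def copy_chunk(chunk: bytearray) -> bytes:
    # Placeholder for copying one chunk of page metadata to the shadow VM.
    return bytes(chunk)

metadata_chunks = [bytearray(1024) for _ in range(48)]

# Old: vCPU 0 copies every chunk by itself.
serial_copy = [copy_chunk(chunk) for chunk in metadata_chunks]

# New: all vCPUs participate in the copy.
with ThreadPoolExecutor(max_workers=48) as pool:
    parallel_copy = list(pool.map(copy_chunk, metadata_chunks))

assert serial_copy == parallel_copy  # same result; the work is just divided up
```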

I was surprised to see how much was shared during this session; it goes deep fast. A very good session, which I would highly recommend watching; you can find it here.

The post VMworld Reveals: vMotion innovations appeared first on Yellow Bricks.

Posted in innovation, Server, Various, vmotion, VMware, vmworld, vmworld reveals, vSphere | Comments Off on VMworld Reveals: vMotion innovations

New KB articles published for the week ending 24th August, 2019

  • VMware ESXi “Module CheckpointLate power on failed” error when provisioning a Horizon Instant Clone pool (Date Published: 22-Aug-19)
  • Change to default boot options when creating a Windows 10 and Windows 2016 server and later in vSphere 6.7 (Date Published: 21-Aug-19)
  • ESXi host experiences PSOD with references to the FCOR module (qfle3f) in the backtrace. Date

The post New KB articles published for the week ending 24th August, 2019 appeared first on VMware Support Insider.

Posted in KB Digest, Knowledge Base | Comments Off on New KB articles published for the week ending 24th August, 2019

VMworld Reveals: DRS 2.0 (#HBI2880BY)

At VMworld, various cool new technologies were previewed. In this series of articles, I will write about some of those previewed technologies. Unfortunately, I can’t cover them all as there are simply too many. This article is about DRS 2.0, which was session HBI2880BY. For those who want to see the session, you can find it here. This session was presented by Adarsh Jagadeeshwaran and Sai Inabattini. Please note that this is a summary of a session discussing a Technical Preview: this feature/product may never be released, the preview does not represent a commitment of any kind, and the feature (or its functionality) is subject to change. Now let’s dive into it: what is DRS 2.0 all about?

The session started with an intro: DRS was first introduced in 2006. Since then, datacenters and workloads (cloud-native architectures) have changed a lot. DRS, however, has remained largely the same over the past 10 years. What we need is a resource management engine that is more workload-centric than cluster-centric, and that is why we are planning on introducing DRS 2.0.

What has changed? In general, the changes can be placed in 3 categories:

  • New cost-benefit model
  • Support for new resources and devices
  • Faster and scalable

Let’s start with the cost-benefit model. First of all, it is essential to understand that the algorithm and the DRS logic have changed. If you look at the slides below, hopefully it becomes clear that DRS 2.0 is VM-centric: where 1.0 looks at the cluster state first, 2.0 focuses on the VM immediately. This makes a big difference because it can now potentially improve VM resource happiness sooner.

So now that we have mentioned VM Happiness, what is this exactly? When computing the VM Happiness score, DRS looks at roughly 10 to 15 metrics; the core metrics, however, are Host CPU Cache Cost, VM CPU Ready Time, VM Memory Swapped and Workload Burstiness. Going forward, the score will also be shown in the UI on a per-VM basis. If a migration is initiated, the key is to improve the VM Happiness score for that workload. Of course, you can also still see the aggregated score for the cluster. The VM Happiness score enables us to achieve a better balance while keeping the workloads happy.
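
Purely as an illustration (the actual metrics, weights and scale were not disclosed in the session; everything below is an invented toy model), a per-VM happiness score could be thought of as a weighted penalty over those core metrics:

```python
# Toy VM "happiness" score: 100 minus weighted penalties for the core metrics
# named in the session. The weights and scaling are invented for illustration.
def happiness_score(cpu_ready_pct: float, mem_swapped_pct: float,
                    cache_cost_pct: float, burstiness_pct: float) -> float:
    penalty = (2.0 * cpu_ready_pct      # CPU ready time weighted heaviest in this toy model
               + 1.5 * mem_swapped_pct
               + 1.0 * cache_cost_pct
               + 0.5 * burstiness_pct)
    return max(0.0, 100.0 - penalty)

print(happiness_score(cpu_ready_pct=5, mem_swapped_pct=0,
                      cache_cost_pct=2, burstiness_pct=10))  # 83.0
```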

Another improvement is the frequency at which DRS runs. DRS 2.0 will run every minute instead of every 5 minutes, which means that if you have very dynamic workloads, DRS can simply respond faster. This has been made possible by removing the “cluster snapshot” mechanism that DRS 1.0 used, which was basically the limiting factor for DRS. Removing the cluster snapshot mechanism also means DRS 2.0 will use less memory and can work with a higher number of objects.

What I also found very interesting is that with DRS 2.0 you now also have the ability to change the VM demand interval, meaning you can specify that the default interval of 15 minutes should be, for instance, 40 minutes. The benefit of increasing it from 15 to 40 is that spikes are averaged out. You can also decrease it down to 5 minutes if you want DRS to respond to spiky workloads. Pretty smart.
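
The effect of a longer demand interval is simply that of averaging over a larger window. A minimal sketch with synthetic numbers (my own, just to show why a longer window flattens a spike):

```python
# Averaging per-minute CPU demand samples over different window lengths:
# a short window reacts to a spike, a long window flattens it.
samples = [20] * 35 + [90] * 5  # 40 minutes of demand (%), with a 5-minute spike at the end

def windowed_demand(samples, window_minutes):
    return sum(samples[-window_minutes:]) / window_minutes

print(windowed_demand(samples, 5))   # 90.0  -> spike fully visible
print(windowed_demand(samples, 15))  # ~43.3 -> spike partially visible
print(windowed_demand(samples, 40))  # ~28.8 -> spike mostly averaged out
```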

Another feature introduced in DRS 2.0 is the ability to do proper network load balancing. Yes, we had an option in the past to load balance on network load, but it was never a first-class citizen. It was a secondary metric that was only looked at after CPU and memory were considered, so if there was no need for balancing from a CPU or memory point of view, the network utilization would not even be looked at. With 2.0, network is a primary metric, which means that VMs can be (and will be) migrated to balance network utilization, as shown in the screenshot below.

The cost-benefit model has also been enhanced to include network and PMEM. DRS 2.0 is also hardware-aware, meaning that if you use vGPUs, DRS 2.0 will take that into consideration for placement and balancing. On top of that, the cost-benefit model will take workload behavior into account (stable/unstable resource consumption), simply to avoid ping-ponging of VMs. Another useful addition is that the cost-benefit model will consider how long the benefit of a migration will be sustained. Also, something Frank has discussed during various sessions when it comes to memory: DRS 2.0 no longer takes active memory into account, as the world has changed. Most customers do not over-commit on memory as they used to in the past, which is why DRS 2.0 takes “granted memory” into account. This will result in significantly fewer vMotions.
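
A toy version of such a cost-benefit check could look like the sketch below (entirely my own construction; the real model, its inputs and its thresholds were not disclosed):

```python
# Toy migration decision: only move a VM if the expected happiness gain,
# sustained over some horizon, outweighs the one-off cost of the vMotion.
def should_migrate(current_score: float, predicted_score: float,
                   sustain_minutes: float, migration_cost: float) -> bool:
    sustained_benefit = (predicted_score - current_score) * sustain_minutes
    return sustained_benefit > migration_cost

# A small gain that only lasts a couple of minutes is not worth a vMotion...
print(should_migrate(current_score=70, predicted_score=75,
                     sustain_minutes=2, migration_cost=60))   # False
# ...but the same gain sustained for half an hour is.
print(should_migrate(current_score=70, predicted_score=75,
                     sustain_minutes=30, migration_cost=60))  # True
```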

The last thing Sai discussed was the new Migration Threshold option in DRS 2.0. Again, this is workload-focused rather than cluster-centric:

  • Level 1 – No load balancing, only mandatory moves when rules are violated
  • Level 2 – Highly stable workloads
  • Level 3 – Stable workloads, focus on improving VM happiness (Default level)
  • Level 4 – Bursty workloads
  • Level 5 – Dynamic workloads

Adarsh came up next and started by showing the performance benefits of DRS 2.0 using TPCx-HCI. What was clear is that performance with DRS 2.0 improved compared to 1.0, resulting in a 5-10% performance increase for VMs on average, which I feel is huge. Next, Adarsh discussed troubleshooting and how using VM-level automation could result in different behavior than before. In the past, DRS would consider all VMs when load balancing; in 2.0, DRS will only consider VMs with the same automation level (or higher). DRS 2.0 may also cause an increase in migrations, but this can be controlled by changing the migration threshold. Also, DRS will only do one migration between a specific source-destination pair.

When will it be available? Well, it already is available in VMware Cloud on AWS, where it has been enabled for over a year without any issues. It will become available in a future release; dates and version numbers, of course, were not discussed. If you would like to move back to the old behavior, there will be an advanced setting (FastLoadBalance=0) to switch back. If you are interested in learning more, make sure to watch the recording; there’s a lot of good stuff in there.

The post VMworld Reveals: DRS 2.0 (#HBI2880BY) appeared first on Yellow Bricks.

Posted in 2.0, drs, innovate, innovation, Server, tech preview, vmworld, vmworld reveals | Comments Off on VMworld Reveals: DRS 2.0 (#HBI2880BY)