VMware vCloud Director 9.5 – Cross-VDC Networking Blog Series – Design Considerations and Conclusion

Design Considerations

Let's discuss some of the design considerations for Cross-VDC Networking inside of vCD. It is important to note that although Native NSX supports up to 16 sites (or 16 vCenters), vCD 9.5 currently supports a maximum of four sites.

Below are applicable considerations pulled from the NSX Cross-VC Design Guide.

When deploying a Cross-VC NSX solution across sites, the requirements for interconnectivity between two sites are:

  1. IP Connectivity (Layer 3 is acceptable)
  2. 1600+ MTU for the VXLAN overlay
  3. < 150 ms RTT latency
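
As a quick sanity check on requirement 2: VXLAN wraps each frame in roughly 50 bytes of outer headers, which is where the 1600-byte figure comes from. A back-of-the-envelope calculation (standard VXLAN header sizes, shown here in Python):

    # VXLAN encapsulation overhead, in bytes
    inner_frame = 1500 + 14 + 4     # guest payload + inner Ethernet header + 802.1Q tag = 1518
    overhead = 14 + 20 + 8 + 8      # outer Ethernet + outer IPv4 + UDP + VXLAN = 50
    print(inner_frame + overhead)   # 1568 -> hence the "1600 or greater" MTU guidance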

In addition, it's important to note that, since logical networking spans multiple vCenter domains, there must be a common administrative domain for both vCenter domains/sites.

The physical network can be any L2/L3 fabric supporting a 1600-byte MTU or greater. The physical network becomes the underlay transport for logical networking and forwards packets between VTEP endpoints. The physical environment is unaware of the logical networks or the VXLAN encapsulation. Encapsulation/de-encapsulation of the VXLAN header is done by the VTEPs on the respective ESXi hosts, but the physical network must support the 1600-byte MTU to be able to transport the VXLAN-encapsulated frames.

Typically, L2/L3 connectivity over dedicated fiber or a shared medium such as an MPLS service from an ISP is used between sites. L3 connectivity is preferred for scalability and to avoid common Layer 2 issues such as propagation of broadcast traffic over the DCI (data center interconnect) link or STP (spanning tree protocol) convergence problems.

Once the NSX Manager at Site-A is deployed via the standard NSX Manager installation procedure (NSX Manager is deployed as an OVF), it can be promoted to the primary role.

Once the primary NSX Manager is configured, the Universal Control Cluster (UCC) can be deployed from the primary NSX Manager. In line with standard design guide recommendations for resiliency, the NSX controllers should be deployed on separate physical hosts; anti-affinity rules can be leveraged to ensure multiple NSX controllers don't end up on the same physical host. If NSX controllers are deployed on the same host, resiliency is lost: a physical host failure can bring down more than one controller, or even the entire controller cluster if all controllers are on that host.
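
The anti-affinity rule itself is ordinary vSphere DRS configuration. As an illustration only, here is a minimal pyVmomi sketch that keeps three controller VMs on separate hosts; the vCenter hostname, credentials, cluster name, and VM names are hypothetical placeholders, not values from this series:

    # Sketch: create a DRS anti-affinity rule so the NSX controllers never share a host.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter-a.example.com", user="administrator@vsphere.local",
                      pwd="***", sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    def find(vimtype, name):
        # Walk the inventory and return the first object whose name matches
        view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
        try:
            return next(obj for obj in view.view if obj.name == name)
        finally:
            view.Destroy()

    cluster = find(vim.ClusterComputeResource, "Mgmt-Edge-Cluster")
    controllers = [find(vim.VirtualMachine, n) for n in
                   ("NSX_Controller_1", "NSX_Controller_2", "NSX_Controller_3")]

    rule = vim.cluster.AntiAffinityRuleSpec(vm=controllers, enabled=True,
                                            mandatory=True, name="ucc-separate-hosts")
    spec = vim.cluster.ConfigSpecEx(rulesSpec=[vim.cluster.RuleSpec(operation="add", info=rule)])
    cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
    Disconnect(si)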

The controllers distribute the forwarding paths to the vSphere hosts and are completely separated from the data plane. If one controller is lost, the UCC keeps functioning normally. If two controllers are lost, the one remaining controller goes into read-only mode: new control plane information will not be learned, but data keeps forwarding.

If the entire controller cluster is lost, again, the data plane keeps functioning. Forwarding path information on the vSphere hosts does not expire; however, no new information can be learned until at least two controllers are recovered.

We can work around this by enabling Controller Disconnected Operation (CDO) mode, which ensures that data plane connectivity is unaffected in a multi-site environment when the primary site loses connectivity. You can enable CDO mode on the secondary site to avoid temporary data plane connectivity issues when the primary site is down or unreachable, and on the primary site to protect against a control plane failure.

CDO mode avoids connectivity issues during the following failure scenarios:

  1. The complete primary site of a cross-vCenter NSX environment is down
  2. The WAN is down
  3. Control plane failure

CDO mode is disabled by default.

When CDO mode is enabled and a host detects a control plane failure, the host waits for a configurable time period and then enters CDO mode. By default, this wait time is five minutes.

NSX Manager creates a special CDO logical switch (4999) on the controller. The VXLAN Network Identifier (VNI) of this CDO logical switch is distinct from those of all other logical switches.

When CDO mode is enabled, one controller in the cluster is responsible for collecting the VTEP information reported from all transport nodes and replicating the updated VTEP information to all other transport nodes. After detecting CDO mode, broadcast packets such as ARP, GARP, and RARP are sent to the global VTEP list. This allows VMs to be vMotioned across vCenter Servers without any data plane connectivity issues.

Universal Control VM Deployment and Placement

The Universal Control VM is the control plane for the UDLR. Similar to the DLR Control VM in non-Cross-VC NSX deployments, the Universal Control VM is deployed on the Edge cluster and peers with the NSX Edge appliances. Since Universal Control VMs are local to the vCenter inventory, NSX Control VM HA does not occur across vCenter domains. If deployed in HA mode, the active and standby Control VMs must be deployed within the same vCenter domain. There is no failover or vMotion of Universal Control VMs to another vCenter domain; the Control VMs are local to their respective vCenter domain.

A deployment that does not have Local Egress enabled will have only one Universal Control VM for a UDLR. If there are multiple NSX Manager domains/sites, the Control VM sits only at one site – the primary site – and peers with all ESGs across all sites.

In an Active/Standby vCD deployment (the tenant layer in our case), upon failure of the active site, the provider will need to manually redeploy the tenant UDLR Control VM on the standby (now active) site. Promoting the secondary site to primary is a prerequisite the provider must complete upon total primary site failure.

A multi-site, multi-vCenter deployment that has Local Egress enabled (in our case, the provider layer) will have multiple Universal Control VMs for a UDLR – one for each respective NSX Manager domain/site; this enables site-specific North/South egress. If there are multiple NSX Manager domains/sites, there will be a Control VM at each site, and each Control VM connects to a different transit logical network, peering only with the ESGs local to its site. Upon site failure, no Control VM needs to be manually redeployed at a new primary site, because each site already has a Control VM deployed.

Stateful Services

In an Active/Passive North/South deployment model across two sites, it's possible to deploy the ESG in HA mode within one site, where the ESG runs stateful services such as firewall and load balancer. However, HA is not deployed across sites.

An important consideration is that the stateful services need to be manually replicated at each site. This can be automated via custom scripts leveraging the NSX REST API. The network services are local to each site in both the Active/Passive and the Active/Active North/South egress models.
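
As an illustration of such a script, the sketch below copies an Edge's NAT configuration from Site-A to Site-B using the NSX for vSphere REST API with Python requests. The manager hostnames, credentials, and edge IDs are placeholders, and the endpoint path should be verified against the NSX API guide for your version:

    # Sketch: replicate NAT rules from a Site-A ESG to a Site-B ESG.
    import requests

    SRC = ("https://nsxmgr-a.example.com", "edge-1")   # Site-A manager, edge ID (hypothetical)
    DST = ("https://nsxmgr-b.example.com", "edge-7")   # Site-B manager, edge ID (hypothetical)
    AUTH = ("admin", "***")

    # Pull the full NAT config (XML) from the Site-A edge ...
    nat_xml = requests.get(f"{SRC[0]}/api/4.0/edges/{SRC[1]}/nat/config",
                           auth=AUTH, verify=False).text

    # ... and push it to the Site-B edge. In practice you would first rewrite
    # any site-specific values (e.g. uplink IP addresses) in the XML.
    resp = requests.put(f"{DST[0]}/api/4.0/edges/{DST[1]}/nat/config",
                        auth=AUTH, verify=False,
                        headers={"Content-Type": "application/xml"}, data=nat_xml)
    resp.raise_for_status()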

Graceful Restart

One item to note is that Graceful Restart is enabled on ESGs by default during deployment. In a multi-site environment using ESGs in ECMP mode, this should typically be disabled.

If it is left at the default and aggressive BGP timers are set, the ESG will see traffic loss on failover in an ECMP environment, because Graceful Restart preserves forwarding state. In this case, even if the BGP keepalive and hold timers are set to 1 and 3 seconds, failover can take longer. The only scenario where Graceful Restart may be desired on an ESG in an ECMP environment is when the ESG needs to act as a GR helper for a physical top-of-rack (ToR) switch that is Graceful Restart capable. Graceful Restart is utilized more in chassis architectures with dual route processor modules and less so on ToR switches.
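
Disabling it can likewise be scripted rather than clicked through per Edge. A minimal sketch against the NSX for vSphere routing API follows; the manager hostname, credentials, and edge ID are placeholders, and the exact XML schema should be checked against the API guide for your version:

    # Sketch: turn off BGP Graceful Restart on an ECMP ESG.
    import requests
    import xml.etree.ElementTree as ET

    NSX = "https://nsxmgr-a.example.com"                     # hypothetical manager
    AUTH = ("admin", "***")
    URL = f"{NSX}/api/4.0/edges/edge-3/routing/config/bgp"   # hypothetical edge ID

    # Fetch the current BGP config and flip the gracefulRestart flag
    bgp = ET.fromstring(requests.get(URL, auth=AUTH, verify=False).text)
    bgp.find("gracefulRestart").text = "false"   # assumes the element is present

    resp = requests.put(URL, auth=AUTH, verify=False,
                        headers={"Content-Type": "application/xml"},
                        data=ET.tostring(bgp))
    resp.raise_for_status()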

Final Cross-VDC Considerations

While Cross-VDC networking presents many new networking capabilities, there are a few things we’ve learned that are not covered as of today. These are important factors to consider when deploying Cross-VDC for your tenants.

  1. Universal Distributed Firewall (UDFW) is not available via vCloud Director 9.5. Any DFW rules will need to be created per Org VDC at each site and managed independently.
  2. Network services within the respective Org VDC Edges will need to be managed independently. Therefore, a NAT rule that exists on Site-A does not propagate to Site-B – an important factor to consider during failover scenarios.
  3. Proper thought needs to be put into ingress traffic across multiple sites. Consider using a Global Load Balancer (GLB) technology to manage availability between sites.
  4. As expected, Cross-VDC networking only works with NSX-V. NSX-T has a different interpretation of multi-site capability, and this is something we are investigating for future vCD releases.

Conclusion

While this blog series covered many aspects of Cross-VDC networking within vCloud Director, it only scratches the surface of the design considerations, use case discussion, and feature sets available inside of vCloud Director.

If you have interest in learning more or discussing a potential design, please reach out to your VMware Cloud Provider field team. Thanks again for reviewing our material!

Daniel, Abhinav, and Wissam

New KB articles published for the week ending 1st December, 2018

  • VMware ESXi: Potential data loss due to resynchronization mixed with object expansion – Published: 28-11-2018
  • VMware Horizon: [HybridLogon] Setting invalid paths for home directory as part of hybrid logon feature in RDS agent registry may result in home directory variable set unexpectedly – Published: 26-11-2018
  • [Hybrid Logon] Access denied when trying to add a shared …


Usage Meter requirement for SMTP

As we continue to enhance the features and capabilities of Usage Meter, one of the changes that has people asking questions is why we require the configuration of SMTP when installing Usage Meter 3.6.x. This is a change from 3.5.x, where SMTP was not a requirement.

Usage Meter Monitoring

One of the key reasons for this requirement is alerting. Since Usage Meter facilitates resource utilization and consumption reporting, it is important to know when issues may be occurring, as these issues may be directly related to Usage Meter's ability to collect or report data. For this use case, the SMTP server can be an internal server and does not require internet access. Optionally, it can also leverage an account that does not have the ability to forward emails outside of the local domain.
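
Before relying on this alerting, it is worth confirming the appliance can actually reach the relay. A minimal Python check along these lines works from any machine on the same network segment; the hostname and port are placeholders for your environment:

    # Sketch: verify the internal SMTP relay is reachable and answers EHLO.
    import smtplib

    try:
        with smtplib.SMTP("smtp.internal.example.com", 25, timeout=10) as smtp:
            code, banner = smtp.ehlo()
            print(f"EHLO response {code}: {banner.decode(errors='replace')}")
    except OSError as exc:
        print(f"SMTP relay unreachable: {exc}")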

Automated reporting

For the second use case – automated reporting via vCloud Usage Insight – internet access is required. Integration with Usage Insight is important because it increases operational efficiency by automating the monthly reporting workflow. Without an internet-facing SMTP server or outbound-only HTTPS connectivity, this operational efficiency is not achieved. This is painfully apparent with Cloud Providers who have multiple Usage Meters today and have to manage each one individually.

Data Management

For Cloud Providers who are concerned about reports containing customer-related data, fret not. Personally Identifiable Information (PII) is not present in the email. Information such as customer names, hostnames, virtual machine names, etc. is anonymized using a one-way hash, which prevents the original values from being recovered. As an additional security layer, the emails are PGP encrypted. For additional information, please review the Metering Guidelines.
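
To illustrate the one-way-hash idea (this shows the general technique only; it is not the exact algorithm or salting that Usage Meter uses):

    # Sketch: anonymizing an identifier with a one-way hash. The digest is
    # stable, so reports can still be aggregated per host, but the original
    # hostname cannot be recovered from it.
    import hashlib

    hostname = "esxi-01.customer.example.com"
    print(hashlib.sha256(hostname.encode()).hexdigest())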

Conclusion

Ultimately, the need for and value offered by the SMTP configuration far outweigh the risks. As we think about the direction and goals of Usage Meter and Usage Insight, being able to do more with less effort is one of the key drivers, and having an accessible SMTP server plays a critical role in the ability to deliver these features.

For additional information please check out the Usage Meter product page.


Recorded webinar now available: A Farewell to LUNs – Discover how VVols forever changes storage in vSphere

Pete Flecha from VMware and I just finished recording a webinar on VVols where we discuss challenges with external storage, the benefits of VVols along with the latest adoption trends and ecosystem readiness. If you are on the fence about VVols or just want to learn more about it be sure and check it out. … Continue reading »


New KB articles published for the week ending 24th November, 2018

  • VMware vSphere ESXi: hostd is not responding when running test recovery or planned migration by SRM and vSphere Replication for over 10 virtual machines – Date Published: 11/20/2018
  • Renewing a Host Certificate doesn't push full certificate chain to the host – Date Published: 11/20/2018
  • Windows Server 2019 crashes or restarts automatically when hot adding vCPU – Date Published: 11/21/2018
  • VMware …


SSH on OSX Mojave failing with broken pipe error

I recently upgraded my MacBook to OSX Mojave (10.14.1). Ever since the upgrade, whenever I open an SSH session to any server on the internal (VMware) network, I receive the following error message:

packet_write_wait: connection to x.y.z. port 22: broken pipe

Very annoying, as it made deploying labs in our dev cloud very complicated. I googled around and there were many suggestions on how to solve this, but none worked so far. A colleague today pointed me to a thread on VMTN (surprisingly) which describes how to solve the problem. It is very simple: just add "-o IPQoS=throughput" to your normal ssh command. So something like the following:

ssh -o IPQoS=throughput root@192.168.1.1
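
If you would rather not type the option every time, the same setting can be made persistent in your OpenSSH client configuration (~/.ssh/config):

    Host *
        IPQoS throughput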

Thanks Alex for the pointer, and thanks Quinn for posting the solution on VMTN!


VMware VVols Today: Part 1 – What VMware has delivered so far

This is a multi-part series covering various aspects of VMware Virtual Volumes (VVols) from support to adoption to benefits to predictions and more. In part 1 we’ll take a look at what VMware has delivered so far with VVols and the VASA specification. In the first part of this series on VVols let’s do a … Continue reading »


Black Friday Gift: Free copy of the vSphere 6.7 Clustering Deep Dive, thanks Rubrik (ebook)

Many asked us if the ebook would be made available for free again. Today I have the pleasure of announcing that Frank, Niels, and I have worked once again with Rubrik and the VMUG organization to make the vSphere 6.7 Clustering Deep Dive book available for free! Yes, that is 0 USD/EURO, or whatever your currency is. The book signing at VMworld was wildly popular, which resulted in the follow-up discussion about making the ebook available.

You want a copy? All that we expect you to do is register on Rubrik’s website using your own email address. Personally, I also hope that those of you who are considering a new backup/recovery/data management platform will consider Rubrik. Get in touch with them, do a PoC, test it out. Anyway, enough of that, register and start your download engines, pick up a fresh copy of the vSphere Clustering Deep Dive here!


vSAN 6.7 U1 Deep Dive book coming soon!

Cormac and I decided to update the vSAN Essentials book. We added a whole bunch of extra info and also decided to rebrand it. "Essentials" did not really cut it; the book is much more than that. Considering I just finished the Clustering Deep Dive with Frank and Niels, we figured this could be a nice addition to that series, complementing both the Host Deep Dive as well as the Clustering Deep Dive. We've received all the feedback from our reviewers, Frank Denneman and Pete Koehler, and spent various evenings digesting and processing it. Now it is just a matter of adding the foreword to the book, and then we can simply press: Publish. Hopefully, within 2 weeks, I will have a new article that details how you can buy the book!

The plan right now is to release the paper copy and the ebook at the same time. We will link the books so those who buy the paper copy can buy the ebook at a discounted price. We will also make sure the ebook is priced very attractively, as we feel it should be the format of choice for everyone!


Onboarding to VMware Cloud Provider Hub and using self-service automated service activation for tenants

VMware Cloud Provider Hub™ is the platform for VMware Cloud Provider Program (VCPP) partners to get access to VMware XaaS services. Currently, the services we support are VMware Cloud on AWS, allowing partners to go asset-light, and VMware Log Intelligence, a simple and powerful log collection and analytics tool that helps manage cloud operations. These services are made available using the VCPP Managed Service Provider (MSP) program's commitment-based constructs.

Here, we outline the steps cloud providers need to take to on-board and provision the services for their tenants.

The onboarding process consists primarily of two steps:

  1. Master org creation
  2. Service activation for the tenant, which is an easier, automated way to onboard tenants and manage services and users

The steps to create the master org invitation are the same as in MSP 1.x and are as follows:

  1. A commit contract for the service(s) needs to have been created for the service provider email
  2. When the contract becomes active, an email is received for creating the master org
  3. Providing a name for the master organization, accepting the ToS, confirming the payment method, and providing master organization metadata creates a master organization for the provider
  4. *New* When another commit contract is created for the same service provider with the same email ID, no master org invitation is sent. When the commit contract becomes active, it is reflected in the payment method, and that service tile is enabled under 'Services Available for Provisioning' when the provider logs in

Creating master org

Pre-requisites

  1. The VCPP partner email is a registered MyVMware account. The partner email ID used while creating the commit contract MUST be a valid, registered MyVMware account with a complete profile and password. Make sure to verify this by logging into MyVMware before providing the email for commit contract creation.
  2. The commit contract for the service has been created and has become active

Below are the steps for creating a master org

  • When the commit contract becomes active, an email is automatically sent to the email ID provided while signing the contract. Using this email, a master organization needs to be created first. It can sometimes take up to 30 minutes for the email to arrive after the commit contract becomes active.
    • This link can be used only once and will expire in 30 days.
    • This needs to be done only once per partner, for the first commit contract, when it becomes active.
    • Once the partner logs in to VMware Cloud services using this link, they will be able to create a new master org, which is the provider org. The activated commit contract is the default payment method for that org, as well as for any tenant org created under this provider org.
  • Click on the link in the onboarding email to log in to the VMware Cloud services console

Note: VMware Cloud Provider Hub is available only in English at this time

  • Enter a name for the master org – for example, 'Acme' – and accept the T&Cs

 

  • Confirm the commit contract to be associated with this organization. The commit contract associated with the master organization cannot be changed at a later time. If there is a fund account associated with this user, it will be displayed for informational purposes only.

NOTE: If two commit contracts were signed before you created the master org – a VMC MSP commit contract and a CMS MSP commit contract, for two services, for the same provider and associated email – both contracts will appear on this screen.

  • Provide the metadata for the master organization
  • Country and zip code are required fields
  • Tag is an optional ID field, which can primarily be used to filter/query when using the APIs
    • An example would be eng – a department that will be consuming the service
  • Once the master org is created, the service provider lands on the home screen of the Cloud Provider Hub console, with the services for which the commit contract was created listed under Services Available for Provisioning
  • This user is assigned the role of Provider Administrator by default

 

At the end of this step, the provider has:

  1. Logged into VMware Cloud Services
  2. Provided a name for the master org – Acme
  3. Accepted the ToS
  4. Confirmed the commit contract(s) associated with that master org
  5. Provided the master org metadata
  6. Landed on the Cloud Provider Hub console with the master org created
  7. Had the service(s) for which the commit contract was signed made available for provisioning
  8. Been assigned the Provider Administrator role (as the user who created the master organization)

 

Other provider users can be added with different provider roles and permissions – Provider Administrator, Provider Operations Administrator, Provider Accounts Administrator, Provider Billing User, and Provider Support User. Learn more about the different roles and permissions here.

Service Activation for a Tenant

Service activation in Cloud Provider Hub has been significantly simplified.

Pre-requisites

  1. The VCPP partner email is a registered MyVMware account. The service provider email ID used while creating the commit contract MUST be a valid, registered MyVMware account with a complete profile and password. Make sure to verify this by logging into MyVMware before providing the email for commit contract creation.
  2. The master organization is created and onboarded
  3. The admin contact, if provided while creating the tenant, must be a registered MyVMware account

Below are the steps for activating a service

  1. At this time, there are no tenants for this service provider, so the provider needs to add a tenant using Tenant Management, which results in a tenant org being created with all the metadata provided. The admin contact can even be provided at a later time; if an admin contact is provided, that email becomes a tenant administrator. The provider managing the service access has an implied Tenant Administrator role by default.
  2. Once the tenant is added, select the tenant and choose Manage Services.
  3. The service provider is switched to the tenant org, and selecting Open does the magic of service activation for the tenant user.
  4. For each service added, providers can choose the access level for the service at the tenant org level with Manage Tenant Access.

Taking the VMware Cloud on AWS service as an example, the access levels are:

      • no access
      • fully managed access
      • partially managed access
  1. No access for the tenant – the service is fully managed by the provider
  • When the No Access for the Tenant option is selected, all users with tenant-level access to the tenant organization will have no access to the VMware Cloud on AWS service under Services Available
  • The provider can choose to share the vCenter Server URL, username, and password of the SDDC with the tenant via email, or just the VMs as required
  2. Tenant granted vSphere access – the service is fully managed by the provider

When the Grant vSphere Access option is chosen, all users with tenant-level access to the tenant organization can view the service tile for VMware Cloud on AWS under Services Available.

  • When a tenant user logs in to the tenant organization and clicks Open in the service tile, all the SDDCs deployed in this tenant are listed below the service tile. The tenant can only connect to vSphere from here.
  • In this model too, the provider needs to deploy the SDDC and do the configuration for the tenant
  • The provider needs to share the vCenter Server URL, username, and password of each SDDC with the tenant via email
  • The provider can get the vCenter Server details from the Settings tab of the SDDC
  3. Grant service access with service roles

Using the credentials sent by the provider, the tenant can log in to vCenter Server and deploy VMs in it.

  5. (Optional) Steps 3 and 4 need to be repeated for each additional service that needs to be added for the tenant.

Repeat Steps 1 through 5 for each tenant you need to onboard and activate services for. The whole process of service activation takes less than 10 minutes.

Different tenant users can be added with the roles Tenant Administrator, Tenant Billing User, and Tenant User.

Summarizing the onboarding workflow

Use the contextual help and online documentation to explore further the features of VMware Cloud Provider Hub for end-to-end customer lifecycle management.
