kubernetes | Nacho Gonzalez

VMware Explore Las Vegas – Top 5 Sessions to look for.

June 27, 2023 by Nacho

VMware Explore Las Vegas is around the corner, it will take place on August 21^st to 24^th in the Venetian convention center.
I think it’s an excellent opportunity for all the IT professionals that have the opportunity to assist and see what VMware has prepared for us.

Here are top 5 sessions I’m looking forward to:

Accelerate Digital, Customer-Centric Banking – A Financial Service Workshop [INDT2712LV]

As a financial services IT leader, you strive to deliver exceptional customer experiences while investing in modernization strategies that fuel growth, innovation and efficiencies. In this interactive financial services workshop, learn how IT leaders like you deliver customer-centric digital banking experiences. Industry leaders will discuss the trends, best practices, and lessons learned as they accelerate digital transformation. In panel discussions and interviews, hear from financial services leaders. From infrastructure modernization to building modern apps, learn practical ways to accelerate the delivery of new products, services and capabilities that improve customer experience and boost competitive advantage. You will also have the chance to participate in a Q&A with the customer panelists and VMware industry experts.

Speakers:
Steven Fusco, Global Business Solutions Strategist, VMware
Joe Chenevey, Principal Solutions Architect, VMware
Paul Nothard, Global Financial Services Industry Group, VMware

Track: Cloud & Edge Infrastructure

Session Type: Tutorial

Level: Business 100

Primary Product: VMware Cloud

Products: VMware NSX, VMware Tanzu, VMware Workspace ONE, VMware Cloud, VMware Aria Automation

Primary Session Audience: VP, Cloud

Session Audience: CIO, VP, Apps & Platforms, VP, End User Computing, VP, Infrastructure & Operations, VP, Cloud

Why am I interested in this session?
For the past years I’ve been heavily involved in the financial sector and I think it’s a business that is in constant change.
When my parents needed to make a deposit, they needed to go to the bank with the cash and make the deposit, the same thing if they applied for a loan or a credit card.
I manage all my banking through the comfort of my phone and I don’t even know which is my bank’s office.
I think it’s important to understand what are the trends in the fintech industry, how VMware is helping, and how we as IT professionals can prepare ourselves for the upcoming days.

60 Minutes of NUMA – A CPU is not a CPU anymore [CODEB2761LV]

Recently both AMD and Intel made their flagship Server CPUs available, the AMD EPYC and the 4th generation of the Intel Xeon Scalable Processor. Both follow a similar paradigm of the multi-chip, multi-tile module design, impacting kernel scheduling and workload sizing to work around the application expectation of a CPU subsystem. Besides the traditional deep dive of NUMA scheduling, we will discuss the new vSphere 8 functionality solving the age-old question, how many cores per socket? We will show real-life data on NUMA-locality’s importance for a Machine Learning\HPC application that runs on GPUs. We will look at the latest CPU architecture developments, such as the Sapphire Rapids’ new onboard accelerators, which can help you guide your data center hardware refresh. Join me as we dive into this exciting and critically important space.

Speaker: Frank Denneman, Chief Technologist, VMware

Track: Cloud & Edge Infrastructure

Session Type: Breakout Session

Level: Technical 300

Primary Product: VMware vSphere

Products: VMware Tanzu, VMware vSphere, VMware vSphere with VMware Tanzu

Primary Session Audience: Infrastructure Architect

Session Audience: Cloud Admin, Enterprise Architect, Infrastructure Architect, SysAdmin, Platform Ops

Why am I interested in this session?
I’ve been reading Frank’s vSphere Resource Deep Dives for the past years and I think it’s a very good way to understand how vSphere (or vSAN) works under the hood (not only what happens when you click X or Y). In VMworld 2020 I attended this session and I feel it upscaled my understanding of ESXi and vCenter to new levels, being able to provide performance fine-tuning recommendations to my customers in the best way possible.

Am I Being Phished? – Protect Your Mobile Users with Workspace ONE Mobile Threat Defense [EUSB2727LV]

The use of mobile devices in enterprises increases and is a key satisfaction metric from employees who work from anywhere. But the number of threats impacting mobile users increases in number and complexity. How can you protect your mobile users from new threats, such as phishing, that arise in all sorts of applications while still protecting their privacy? Learn how to configure and deploy VMware Workspace ONE Mobile Threat Defense to protect your organization as an integration into your Workspace ONE environment.

Speaker: Andreano Lanusse, Staff EUC Architect, VMware

Track: Hybrid Workforce

Session Type: Breakout Session

Level: Technical 200

Primary Product: VMware Workspace ONE Mobile Threat Defense

Products: VMware Workspace ONE Intelligence, VMware Workspace ONE UEM, VMware Workspace ONE Mobile Threat Defense

Primary Session Audience: VP, End User Computing

Session Audience: VP, End User Computing, Senior Security Manager, EUC Specialist

Why am I interested in this session?
Recently I’ve been working a lot with workspace one, and I think it’s a great product (not a magic product as many people think) and corporate mobile phones are often a neglected part of enterprise security.
I think this session will help me understand Workspace One MTD, a fairly recent product, and how can it help secure the mobile fleet of my clients.

A Decade of Platform Engineering with VMware Tanzu [MAPB2122LV]

Kubernetes has become the de facto standard for container orchestration. Platform engineering teams have options when using that technology, such as a prescriptive approach with platform as a service or a more flexible approach where they can customize their Kubernetes platform. VMware Tanzu supports multiple containerization journeys with many ways of running applications. Platform teams can choose between VMware Tanzu Application Service or VMware Tanzu for Kubernetes Operations. This panel, which includes Tanzu Vanguards, will share important considerations when running the platform, including security, survivability, scale, cloud native secrets, networking do’s and don’ts, ChatOps, and more. This panel will also share lessons learned and best practices for dealing with chatty neighbors and microservices, using isolation segments for specific use cases, and application and platform monitoring.

Speakers:
Nick Kuhn, Senior Technical Marketing Architect, VMware

Pawel Piotrowski, Architect, S&T Poland

Jonathan Regehr, Platform Architect, Garmin International

Venkat Jagana, Vice President&Distinguished Engineer, Kyndryl

Kerry Schaffer, Senior IT Director, OneMagnify

Track: Modern Applications & Cloud Management

Session Type: Breakout Session

Level: Technical 300

Primary Product: VMware Tanzu Application Service

Products: VMware Tanzu Application Platform, VMware Tanzu for Kubernetes Operations, VMware Tanzu Application Service

Primary Session Audience: Platform Ops

Session Audience: VP, Apps & Platforms, Application Developer, Cloud Architect, DevOps Manager, Platform Ops

Why am I interested in this session?
I’ve been working with Tanzu a lot recently, and although I think it’s a great product. I feel Kubernetes in general is complex to operate (Day 2 Operations), and it’s a shared feeling with some peers. I have the impression that this session, in collaboration with Tanzu Vanguard Experts will provide me with insights or ideas to apply in our environment.

Machine Learning Accelerator Deep Dive [CEIM1849LV]

In this Meet the Expert session, you can ask all your questions about deploying machine learning workload and the challenges around accelerators. When do I use CPU resources, what about NUMA relation, when do I use passthrough, and what benefit has NVIDIA AI Enterprise? What’s the difference between MIG and time-sliced, and how can I avoid cluster defragmentation in certain scenarios? Join me as we dive into this exciting and critically important space.

Speaker: Frank Denneman, Chief Technologist, VMware

Track: Cloud & Edge Infrastructure

Session Type: Meet the Expert Roundtable

Level: Technical 300

Primary Product: VMware vSphere

Products: VMware Tanzu, VMware vSphere, VMware Edge, VMware Cloud, VMware + NVIDIA AI-Ready Enterprise Platform

Primary Session Audience: Enterprise Architect

Session Audience: Enterprise Architect, Infrastructure Architect, SysAdmin, Platform Ops

Why am I interested in this session?
AI Is the new shiny thing, In a couple of months it generated more value-added than Web3 (Sorry Crypto Bro). My perspective is that in the upcoming years, we will start seeing more and more ML workloads in the enterprise data center so I would like to understand how to fine-tune the performance and understand the challenges and limitations of running ML workloads on vSphere clusters.

And why not, if Machines Enslave humanity, be a survivor just because I know how to feed them more servers?

Bonus Track: Lessons Learned from the Most Complex VMware Aria Automation 7 to 8 Migration to Date [CEIB1318LV]

In this session, learn and understand everything that happened under the hood in VMware’s most complex migration to date of VMware Aria Automation (formerly VMware vRealize Automation) version 7 to 8, involving an industry-leading transportation customer. Find out how we were able to solve their unique use case and, through careful thinking and thousands of lines of code, get them to the finish line.

Speakers:

Pontus Rydin, Principal Systems Engineer, VMware

Lucho Delorenzi, Staff Customer Success Architect, VMware

Track: Modern Applications & Cloud Management

Session Type: Breakout Session

Level: Technical 300

Primary Product: VMware Aria Automation

Products: VMware NSX, VMware Aria Automation, VMware Aria Automation Orchestrator

Primary Session Audience: Cloud Architect

Session Audience: Cloud Admin, Cloud Architect, Infrastructure Architect, Platform Ops, VP, Cloud

Why am I interested in this session?
Aria automation (sorry, but in my twisted mind I still call it, vRealize Automation) is one of the most powerful products for the SDDC and the multi-cloud environment, that being said, it’s also one of the most complex.
My friend and colleague Lucho is delivering this session and based on all the sessions he provided to VMUG Argentina I think we can all learn a thing or two from him.

vSphere with Tanzu – .local DNS issues.

February 20, 2023 by Nacho

Yes, I know, It’s been a while since I’ve last posted here, yeah, getting used to a new country is not an easy thing. Well, the idea is to keep up with the blog and try to pay more attention to it.

Now, let’s get to business, shall we?

Seth Eliot ☕ on Twitter: "What does one not simply deploy? https://t.co/qq8imG0MbX" / Twitter — Murphy’s Law states: “Anything that can go wrong will go wrong.”

I am a firm believer in properly planning, thinking, and designing, thinking every possible outcome, but when you are deploying new technology there is always a showstopper. Sometimes it can be easily solved and sometimes it takes a bit more time.

Lately, I’ve been working on vSphere with Tanzu and I came across an interesting issue with a Client.
My client which for confidentiality reasons I’ll name Contoso had an Active Directory domain that also served as a primary DNS zone contoso.local. as part of a deployment that has been going around for quite some time and a scenario that as consultants we can face often (A legacy environment that has hundreds or even thousands of servers coming from many years ago).

We designed a simple yet functional solution of vSphere with Tanzu leveraging vDS and using NSX ALB.
The initial deployment worked like a charm, we were able to set up our NSX ALB cluster, the service engine, set up the supervisor cluster, the TKG Cluster, and the Harbor repo all in less than a day.

The problem came when performing the first pull of an image. The worker nodes were not resolving internal DNS.

We logged in to the TKG Cluster nodes (for reference you can see here )
and we tested that the proper servers were configured.
additionally, the communication was working correctly.

Seems easy, doesn’t it? Hang in there.
Well, after some googling we came across this KB article that states:

Symptoms:

Dig and nslookup fails from the supervisor control plane nodes to hostnames ending in .local

Cause

This is not unique to vSphere with Tanzu. This is expected behavior from the systemd-resolved service.

https://www.freedesktop.org/software/systemd/man/systemd-resolved.service.html

Multi-label names with the domain suffix “.local” are resolved using MulticastDNS on all local interfaces where MulticastDNS is enabled. As with LLMNR, IPv4 address lookups are sent via IPv4 and IPv6 address lookups are sent via IPv6.
Queries for multi-label names are routed via unicast DNS on local interfaces that have a DNS server configured, plus the globally configured DNS servers if there are any. Which interfaces are used is determined by the routing logic based on search and route-only domains, described below. Note that by default, lookups for domains with the “.local” suffix are not routed to DNS servers unless the domain is specified explicitly as routing or search domain for the DNS server and interface. This means that on networks where the “.local” domain is defined in a site-specific DNS server, explicit search or routing domains need to be configured to make lookups work within this DNS domain. Note that these days, it’s generally recommended to avoid defining “.local” in a DNS server, as RFC6762 reserves this domain for exclusive MulticastDNS use.

The KB also lists a workaround intended only to use on POC environments and NOT on production.

Basically, the problem is that PhotonOS (the OS for TKG nodes) uses the systemd-resolver service.
Systemd-resolver it’s a widely used MulticastDNS resolver, and RFC6762 defines that any domain with the .local suffix should be used only for internal use. (In other words, it’s the equivalent of naming your DNS domain localhost).

On top of that, we found another problem, as per the KB article:
The .local hostname is reserved for use in mDNS per RFC6762 therefore trying to resolve it against a DNS server violates RFC6762. As such VMware does not recommend any deployment which uses .local for any components. (this includes vCenter, ESXi, nsx manager, nsx edge nodes, and any endpoint TKGS uses like harbor).

The .local DNS suffix might be reserved in other products which we want to deploy in this environment, and funny enough. It does not only apply to VMware, Microsoft (who was the main promoter for the .local domains in the early days, is now requesting people to change it).

Lastly, since 2014 you are not able to get a public certificate with the .local suffix for the SAN. Cool.

So we can conclude, this is not a Tanzu issue, it’s a tech debt issue and should be addressed

What shall we do next?

We discussed a couple of options in the table.

The first option was to proceed with the workaround specified in the KB article, even though it’s production.
VMware said they would support us, but the problem is that each time you need to perform any lifecycle activity (such as an Upgrade, cluster provisioning or redeploy) you should re-apply manually the workaround.

Pros: It would work, other than that I don’t really see any.
Cons: Requires a LOT of manual effort. The KB said it was not supported for production but GSS said it was.

The second option is the rebuild the contoso.local domain and rename it to something supported and future-proof (contoso.tech for example). Particularly, I preferred this approach since it’s the only way to fix this issue and avoid any further issues.

Pros: It’s a permanent solution
Cons: Takes a LOT of effort, design, planning, and execution would take hundreds of man hours and it has quite a risk associated.

Third option: VMware to the rescue. Thanks to the GSS team for their support and their ideas, we were able to sort out this issue, at least for the short term.
The solution, only applied to Tanzu, was to re-design the Tanzu deployment but using Tanzu Kubernetes Grid Multicloud (TKGm) instead of using vSphere With Tanzu.
Once deployed at the moment of provisioning the TKGm you can run the steps described in KB 83623 and it quotes:

“To workaround this issue you will need to deploy your management and workload clusters with a modified plan that will update the name resolution parameters on the TKG nodes as they are deployed.

Nameservers on vSphere provides the instructions needed for using custom DNS servers on your TKG nodes. These instructions only need one modification to allow for domain names ending in “.local” to be resolved. A “searchDomains” line needs to be added to the end of the vsphere-overlay-dns-control-plane.yaml and vsphere-overlay-dns-workers.yaml files. Once modified, the end of these files should look like the following:

nameservers: [“8.8.8.8”]
searchDomains: [“corp.local”] “

Final thoughts:

You know I’m a big fan of VMware, and most of this blog speaks about my experience with VMware products. I have contact with some people inside VMware and I’ve escalated this particular topic, but after months of reading documentation, taking training, and assisting webinars I never saw a single mention of this limitation of using the .local DNS suffix. It would be nice to have that included in the documentation.

Additionally following on the domain migration topic, I like to explain the situation with this analogy.
My house in Argentina was built 60 years ago, it’s a nice house but a couple of years back we had to upgrade the pipes because it had copper pipes and they were getting rusty, the electrical cabling was designed 60 years ago, where a normal household didn’t have the same amount of appliances or devices that it has today, not even bringing Electrical Vehicles in the picture.

When this contoso.local domain it was built for a purpose, over the course of time the purpose shifted and it expanded, and now would be a good time to revisit the house’s electrical system to make it ready to withstand whatever the future brings our way.

…And Kubernetes for All

December 29, 2020October 3, 2020 by Nacho

Esta semana fue la edición 2020 del VMworld y hubo una palabra que se repitió más que nada: Kubernetes. El año pasado VMware compró Pivotal y a los pocos días anunciaron, Project Pacific, su re-ingeniería de la plataforma.
Este año tuve la suerte de hacer cursos de vRealize Automation 8 y algo que me llamó la atención es que lo que antes eran servicios, ahora se maneja con pods dentro de los appliances.

Cómo hasta ese entonces no había visto mucho de Kubernetes, más que alguna charla y o algo en la facu, me decidí a investigar y “enseñarme” cómo funcionan.

Me gustaría compartirles este glosario con una descripción para que todos puedan entender que es este servicio que llegó para quedarse.
Al final del post, comparto algunos recursos que me sirvieron para aprender y son 100% gratuitos.

¿Qué es Kubernetes?

Kubernetes es una plataforma Open-Source para administrar y orquestar containers que facilita la configuración declarativa y la automatización.

¿Quién desarrolla Kubernetes?

Kubernetes comenzó como un proyecto de Google y lo hicieron Open-Source en 2014, actualmente lo mantiene Cloud Native Foundation y los mayores vendors de tecnología tienen sus implementaciones. Por ejemplo RedHat con Openshift o VMware con Tanzu, entre otros.

Kubernetes en griego significa timonel, por eso el logo es un timón.

¿Por qué Implementar Kubernetes?

Para responder esta pregunta hay que ir un par de años atrás en el datacenter y entender cuáles son las problemáticas que viene a resolver.

Si nos fijamos en la primera etapa: todos los servidores eran físicos (bare metal), es decir, por cada aplicación que manteníamos necesitábamos tener un servidor con sus recursos (CPU/Memoria/NIC’s/Storage), un sistema operativo y sus aplicaciones:
Este modelo aumentaba los costos de hardware, mantenimiento y era poco resiliente a fallas.
Sumado a esto, la instalación y configuración de servidores bare-metal era (y sigue siendo) un proceso tedioso y lento.

Para resolver ese problema llegó la virtualización:
La virtualización permite colocar múltiples máquinas virtuales (VMs) en un solo servidor físico (Hipervisor) esto permite consolidar recursos (CPU, Memoria, Storage) , acelerar tiempos de aprovisionamiento y da un mayor nivel de seguridad ya que una VM no puede acceder a los recursos de otra VM. Cada VM tiene su propio sistema operativo y aplicaciones.

Containers:
Los containers son parecidos a las máquinas virtuales, pero tienen aislamiento flexible, por lo que comparten el sistema operativo entre distintas Apps. Al igual que las VMs los containers tienen su propio CPU, Memoria, tiempo de procesamiento y demás, pero no dependen de la infraestructura, por lo que son portables.

Adicionalmente, dentro de cada container están resueltas todas las dependencias de una aplicación: es decir si yo necesito instalar LAMP debería tener alguna versión de Linux, Apache, MySQL y PHP.
El container se va a asegurar de que las versiones de Apache, MySQL y PHP sea la misma en cualquier lugar que corra el container. Lo que garantiza la portabilidad.
Por último, los containers fueron concebidos con la agilidad en mente, por lo que la automatización es 100% compatible.
Otra cosa importante, como las Máquinas virtuales tienen el hipervisor los containers tienen el Container Engine. En este caso vamos a hablar de Docker.

En este video dan una explicación muy buena del porqué del nombre containers.

Ahora sí, ¿por qué Kubernetes?
Es sabido que todos los servidores pueden fallar, y nuestro trabajo como profesionales de IT es evitar que fallen. Kubernetes está pensado desde esa premisa, sabiendo que nuestros servicios van a fallar, nos da la posibilidad de controlar el cómo se va a comportar nuestra infraestructura cuando fallen.

– Balanceo de Cargas nativo: Podemos exponer un container por IP de servicio o DNS. Si el trafico es alto Kubernetes permite balancear la carga.
– Orquestación de storage: Permite automatizar la provisión de storage a tus containers.
– Rollouts y Rollbacks automáticos: Permite definir manifiestos YAML con el estado deseado de nuestros containers.
– Self Healing: Kubernetes reinicia, reemplaza y mata containers en base a su estado.

Kubernetes es la clave para dejar de tratar a nuestros servidores como Mascotas y empezar a tratarlos como ganado.

¿Cómo funciona Kubernetes?

Cuando implementas Kubernetes tenés un cluster.
El cluster es un conjunto de máquinas workers, o nodos, que va a correr nuestros containers, todos los clusters tienen al menos un nodo. En Kubernetes los nodos ejecutan Pods (que sería un container de containers)
La implementación también tiene un plano de control que supervisa y administra los nodos y los pods.
En entornos productivos, el control cluster tiene varias máquinas master y también múltiples nodos para aumentar la resiliencia del cluster.

Acá dejo un diagrama

Veamos que hace cada componente:

Componentes del Control plane:

Los componentes del control plane toman decisiones sobre los demás nodos de Kubernetes.
Por Ejemplo: definir donde va a correr cada Pod, validar si hay que agregar más replicas, etc.

Estos componentes se instalan en la misma máquina por lo general, y se pueden usar clusters distribuidos como dijimos antes. En estos servidores no se ejecutan containers de usuario.

kube-apiserver:Se encarga de exponer la API de Kubernetes, el frontend del control plane de Kubernetes. Todas las interacciones que tengamos con nuestro cluster de Kubernetes van a ser mediante esta api.

etcd:Es la base de datos del cluster de Kubernetes.

kube-scheduler: Indica en que nodo se van a crear los pods.

kube-controller-manager:
Unifica y administra los procesos del cluster de control.
Algunos procesos que administra son:
Node controller: Monitorea el estado de los nodes y notifica si alguno se cae.
Replication Controller: Es el encargado de monitorear y aplicar la cantidad de replicas correctas para cada pod.

cloud-controller-manager: Te permite conectar tu cluster a un proveedor de servicios en la nube (Google Cloud, Amazon, Azure, etc) y separa los componetnes que interactúan con la nube de los que solo interactúan de forma local. Este componente solo existe en la nube, si tenés un entorno on-premise no lo vas a ver.

Componentes de los nodos:

Nota: los nodos tienen que correr sobre alguna distribución de Linux.

Container Engine: El software que se encarga de correr containers, por ejemplo Docker, containerd, podman, etc.

kubelet: Es un agente que corre en cada nodo del cluster y se asegura que los containers estén corriendo en un pod. Está hablando constantemente con el kube-apiserver para validar que los pods y servicios cumplan con el estado deseado. Adicionalmente ejecuta las acciones que le pasa el apiserver.

kube-proxy: es un proxy de red que corre en cada nodo.
Se encarga de mantener las reglas de red en cada nodo. Esas reglas permiten que los pods puedan tener conectividad fuera del nodo.

Pods: es un grupo de uno o más containers, como me explico mi queridísimo Guille Deprati, es un container de containers.

Algo que a mi me sirvió mucho para entender la arquitectura de Kubernetes es compararlo con lo que entiendo y conozco. Cómo soy administrador VMware me sirve hacer la equivalencia entre Kubernetes y vSphere.

Otro recurso importante es el Container Registry:
Basicamente es un repositorio de imagenes de containers (container images) a partir de las cuales vamos a crear nuestras pods.
Existen los Registros Públicos como ser Docker Hub y podemos tener registros privados dentro de nuestra organización (Red Hat Quay, Harbor, etc). Los cloud providers (AWS, Azure, Google Cloud, Alibaba) probeen su servicio de container registry.

Recursos de estudio:

Los recursos que usé durante estos meses fueron los siguientes:

Kube Academy
Kubernetes.io
Edx – Introduction to Kubernetes
Coursera – Google Cloud Engine offering
Estos son pagos, pero los conseguí gratis por una beca del GCBA.
VMware Cloud Native Apps – You Tube

Espero que te haya servido el post, por favor comentame si te gustó o que se puede mejorar.

Saludos