Skip to content

Networking

If your professional relationship with computers to date has been running processes, maybe using containers, but you've never had to setup servers and their networking before, then allow me to be the first to congratulate you on a career path well trodden, and to apologise that your joyride is over.

Networking is more complicated and anti-human than it needs to be. Consider a simple example. You will have seen an IP address before. Four small numbers with three dots. For example, 10.11.12.13. This address is actually called IPv4. The total number of public Internet IPv4 addresses is relatively small compared to our growing use for them, so networking people invented IPv6. Your innocent soul would be wrong to think that IPv6 addresses look like: 10.11.12.13.14.15.

Instead, the sociopaths involved in computer networking made them look like:

2001:0db8:85a3:0000:0000:8a2e:0370:7334

I will try hard to never discuss IPv6 again.

As a consumer of computers, networking and addressing of computers talking to each other is hidden. google.com in a browser just works. From there you go to other URLs and their pages appear in your browser. It all just works. But you're no longer a consumer of computers. You are a purveyor of fine software systems.

Why Learn Networking?

One of the reasons that I want to share information on networking, is that a common cause of most, "Why doesn't this work for me" requests on the #bosh community Slack channel is networking.

Networking is difficult in part due to two opposing requirements:

Distributed Systems - where different processes want to talk to each other on different servers - requires that the processes can discover each other and can connect to each other.

Security - ensuring a system is only doing what it is supposed to do and is not corrupted or misused by bad actors - requires that we restrict as much access to servers and processes as possible. But not too much so as to stop a distributed system from working.

When processes within a distributed system cannot discover or communicate with its peers, it probably will not work. But the way in each distributed system "probably doesn't work" will be different.

Discovering That a Distributed System is Failing is Nontrivial

A process might refuse to start successfully because it cannot connect to a dependent subsystem. Monit will then restart that process over and over infinitely. Another process might start running even if it cannot access its dependencies, but when users interact with that process it might then return errors. Or it might provide a subset of normal behaviour, rather than explicitly error. Some erroneous behaviour might be intermittent.

Debugging a Distributed System is Nontrivial

It is hard enough for software developers to write software that works at all and provides the features that end users want. The ways in which networking errors can propogate into each application process are more numerous than the "happy path" for the application when there are no networking errors. It will be likely that many distributed systems you work with do not provide a lot of assistance in debugging what networking errors have appeared.

More commonly, inside the error logs of one or more processes will be a variation of a "connection timeout" error or more subtle/invisible indications that a networking issue has appeared.

So hopefully, by learning networking and "owning responsibility" for the networking of your distributed systems, you will limit the scope for accidental networking issues and improve your ability to debug and resolve networking issues.

BOSH Does Not Configure Networks

Whilst, BOSH can provision cloud servers and persistent disks, it cannot provision/change networking. Either your networking will have been setup at the time your cloud infrastructure was setup, such as a vSphere environment; or you will use additional tools to provision and manage networking in cloud infrastructures such as AWS, GCP, or Microsoft Azure.

When we discuss using BOSH in different infrastructures, we will look at some specific aspects of setting up networking before using BOSH.

Example Networking in Cloud-Config

To help frame this mini-guide to networking, I'll first introduce where networking configuration appears within bosh cloud-config and deployment manifests.

Networking Configuration in Cloud-Config

Remember that bosh cloud-config is where the bulk of cloud infrastructure specific configuration resides. Earlier we reviewed vm_types and mapped each one to a CPI specific instance type/machine type/VM configuration depending on the CPI.

Networking configuration is also specific to the target infrastructure, but fortunately there is a common set of attributes across all CPIs.

Look for the subtle differences in each example.

Here is an example of describing networking for a vSphere environment:

networks:
- name: default
  type: manual
  subnets:
  - range: 10.0.0.0/24
    gateway: 10.0.0.1
    static: [10.0.0.5-10.0.0.20]
    azs: [z1,z2,z3]
    dns: [8.8.8.8]
    cloud_properties:
      name: net-10-0-0-0

azs:
- name: z1
  cloud_properties:
    datacenters:
    - clusters: [mycluster: {}]
- name: z2
  cloud_properties:
    datacenters:
    - clusters: [mycluster: {}]
- name: z3
  cloud_properties:
    datacenters:
    - clusters: [mycluster: {}]

Here is an example of describing networking for a GCP environment:

networks:
- name: default
  type: manual
  subnets:
  - range: 10.0.0.0/24
    gateway: 10.0.0.1
    static: [10.0.0.220-10.0.0.254]
    dns: [8.8.8.8]
    cloud_properties:
      network_name: mynetwork
      subnetwork_name: mysubnetwork
      ephemeral_external_ip: true
      tags: [tag1, tag2]

azs:
- name: z1
  cloud_properties:
    zone: europe-west1-b
- name: z2
  cloud_properties:
    zone: europe-west1-c
- name: z3
  cloud_properties:
    zone: europe-west1-d

Here is an example of describing networking for an AWS VPC environment:

networks:
- name: default
  type: manual
  subnets:
  - range: 10.0.0.0/24
    gateway: 10.0.0.1
    static: [10.0.0.220-10.0.0.254]
    reserved: [10.0.0.0/30]
    dns: [8.8.8.8]
    cloud_properties:
      subnet: subnet-123456
      security_groups: [bosh]
- name: elastic
  type: vip

azs:
- name: z1
  cloud_properties:
    availability_zone: eu-west-2a
- name: z2
  cloud_properties:
    availability_zone: eu-west-2b
- name: z3
  cloud_properties:
    availability_zone: eu-west-2c

There is a lot going on at first glance. Look at the three examples a few times and you'll start to see some patterns.

The common structure looks like:

networks:
- name: some-name-used-by-deployment-manifests
  type: manual
  subnets:
  - range: 10.0.0.0/24
    gateway: 10.0.0.1
    static: [10.0.0.220-10.0.0.254]
    dns: [8.8.8.8]
    cloud_properties: {...}

azs:
- name: z1
  cloud_properties: {...}
- name: z2
  cloud_properties: {...}
- name: z3
  cloud_properties: {...}

I promise that everything you've seen in these example cloud-config will make sense. This is all learnable and understandable.

Networking Configuration in a Deployment Manifest

Consider this abbreviated zookeeper.yml deployment manifest:

instance_groups:
- name: zookeeper
  azs: [z1, z2, z3]
  instances: 5
  vm_type: default
  networks:
  - name: default

When the abbreviated zookeeper-release/manifests/zookeeper.yml was first introduced in Deployment manifests, part 1 above, the azs and networks attributes were omitted.

Each instance_groups item must include an azs and networks attribute. At a glance you can see that the azs values correspond to the azs from the sample cloud-config above, and the networks name default corresponds to one of the cloud-config networks items.

This deployment manifest is inferring that each of the 5 instances will be allocated a different IP address from the default network. It isn't important what exact IP address will be assigned, as the job templates running on each instance will be provided the IP addresses for all the instances in the instance group.

It is useful to understand that from the sample cloud-config we can see that these IP addresses might be in the range of 10.0.0.2 to 10.10.0.219. Let's investigate IP ranges and how BOSH allocates IP address(es).

Mapping to External Network

As stated earlier, BOSH does not provision or manipulate the cloud infrastructure networking. Instead, the networking section in cloud-config describes the available networking space that BOSH can use for its cloud servers.

The examples above are each stating that there already exists a network 10.0.0.0/24 that the BOSH director and CPI has access to on their respective infrastructure. Your cloud-config will need to specifically describe your own network availability.

BOSH IP Allocation vs DHCP

If you've played with networking before - such as trying to get your computer and devices onto your home router - you'll have blissfully ignored how an IP address is allocated to your computer or device. This facility is thanks to Dynamic Host Configuration Protocol (DHCP).

You might have seen that mysterious IP 169.254.X.Y that indicates that DHCP has failed and your device has allocated itself an IP address.

The allocation of IP addresses to BOSH instances is not performed with DHCP. Instead, the BOSH director/CPI statically assign IP addresses to each instance. From our perspective, the net result is the same: IP address allocation is not manually managed by you (by default).

Instead, you will give guidance to the BOSH director and CPI as to where each instance will be placed within your networking range, and an IP will be chosen.

Consider an example network in a cloud-config:

networks:
- name: default
  type: manual
  subnets:
  - range: 10.0.0.0/24
    gateway: 10.0.0.1
    static: [10.0.0.220-10.0.0.254]
    reserved: [10.0.0.0/30]
    dns: [8.8.8.8]
    cloud_properties:
      subnet: subnet-123456
      security_groups: [bosh]

The name default will be used in each deployment manifest to describe "where" its instances will be placed. In this AWS example, the AWS CPI will use subnet: subnet-123456 to place the instance within that specific subnet ID.

The IP address will be selected from the available IPs within the range range, but not including the static ranges, nor reserved ranges.

BOSH allows three formats for describing a range of IPs:

  • single IP address - 10.0.0.220
  • range of IP addresses - 10.0.0.220-10.0.0.254
  • CIDR notation - 10.0.0.0/30

CIDR Notation

BOSH will select an available IP address from the range 10.0.0.0/24. This notation for a range of IP addresses is called, Classless Inter-Domain Routing (CIDR). It is short hand for "all IPs between 10.0.0.0 and 10.0.0.255".

CIDR calculators can be your friend to learn and experiment with CIDR notations.

Some examples:

  • 10.0.0.0/30 - 4 IPs, from 10.0.0.0 to 10.0.0.3
  • 10.0.0.0/29 - 8 IPs, from 10.0.0.0 to 10.0.0.7
  • 10.0.0.0/28 - 16 IPs, from 10.0.0.0 to 10.0.0.15
  • 10.0.0.0/24 - 255 IPs, from 10.0.0.0 to 10.0.0.255
  • 10.0.0.0/16 - 65536 IPs, from 10.0.0.0 to 10.0.255.255

A useful nuance of CIDR notation is that we round down the base IP address, so the following examples are equivalent:

  • 10.0.0.0/30 - 4 IPs, from 10.0.0.0 to 10.0.0.3
  • 10.0.0.1/30 - 4 IPs, from 10.0.0.0 to 10.0.0.3
  • 10.0.0.2/30 - 4 IPs, from 10.0.0.0 to 10.0.0.3
  • 10.0.0.3/30 - 4 IPs, from 10.0.0.0 to 10.0.0.3

In some BOSH files, you might see this nuance used within operator files, such as this sample AWS cloud-config template:

networks:
- name: default
  type: manual
  subnets:
  - range: ((internal_cidr))
    gateway: ((internal_gw))
    reserved: [((internal_gw))/30]

The ((internal_gw)) variable is used to describe the gateway to the subnet range. If ((internal_gw)) is 10.0.0.1, then the reserved: [((internal_gw))/30] effectively evaluates to reserved: [10.0.0.0-10.0.0.3]. We will properly introduce BOSH Operator files and Variables soon.

Gateway

The gateway is the IP address that your BOSH instances will use to communicate with the Internet or beyond its network it will send it to the gateway IP address. Data arriving from outside the network will arrive via this gateway IP address.

Typically the gateway IP is the base IP address of the range plus 1. A network range of 10.11.0.0/16 will typically have a gateway of 10.11.0.1.

Reserved Ranges

When BOSH is dynamically allocating IPs for instances it needs to know any IP addresses that it cannot use. In the cloud-config, this is an attribute called reserved. It is an optional array of IP ranges.

Some cloud infrastructures reserve IP addresses for every network. AWS reserves the first four IP addresses for itself. Therefore, an AWS cloud-config will always have at least one reserved range.

Sometimes it might be that there is one common infrastructure network that must be split into multiple virtual BOSH networks, perhaps within a single cloud-config or perhaps across multiple BOSH directors (you might have a BOSH director for staging deployments and another for production deployments but yet only have one common network for all deployments to share). You will use reserved ranges to manually split the underlying network into virtual networks.

Explicit Static IP Address

For the most part, you should not feel you need to declare IP addresses in advance. Job templates have a mechanism for discovering the IP addresses for each other, called Links, which will be introduced soon.

There are occasions where you may want to share the IP addresses of instances in an instance group with external clients. One method will be to statically declare specific IP addresses in the deployment manifest that BOSH will promise to assign to the instances.

There are two systems of pre-defined IP addresses that you can consider.

Manual Static Addresses

Rather than delegate the selection of IP addresses to BOSH, your deployment manifest can explicitly request specific IP addresses for each instance in an instance group. We add the attribute static_ips to our instance group's networks section.

The abridged manifest for an instance group with static IPs might be:

instance_groups:
- name: zookeeper-instances
  instances: 5
  networks:
  - name: default
    static_ips:
    - 10.0.0.220
    - 10.0.0.221
    - 10.0.0.222
    - 10.0.0.223
    - 10.0.0.224

Since instances is 5, therefore the static_ips list must include five IP addresses.

The static_ips IP addresses must also be declared within the cloud-config. In the examples included earlier and below, they each had a static: [10.0.0.220-10.0.0.254] attribute. These IP addresses - between 10.0.0.220 and 10.0.0.254 - will not be dynamically assigned to other instances; they can only be used by deployment manifests declaring static_ips.

When we introduce Links, you will see that there are relatively few reasons for requiring static IPs; and your life is a lot simpler without having to manually select IP addresses for the benefit of external systems.

Virtual IP Addresses

Most cloud infrastructures have a concept of virtual IP addresses. In OpenStack, they are called floating IPs, in AWS they are called elastic IPs, and in GCP they are called static external IPs. Virtual IPs are public IP addresses that you lease or borrow from your cloud provider, for example elastic IPs are leased from AWS. Amongst other things, virtual IPs are useful for providing some predictability in the network reachability of cloud instances - because for one thing you don't lose them as instances - stop, restart or outright fail. In other words virtual addresses are not ephemeral in nature and can be moved between virtual machines. Since they belong to an organization after allocation (until released) and are not ephemeral, things like firewall rules, network ACLs, and DNS records can be created for them and will not need to be changed even if the VM they point to fails.

Like networks in general, BOSH cannot provision or destroy virtual IP addresses. Instead, it can help you manage the assignment of them to instances.

For example, you might provision 5 elastic IPs on AWS and be told that their values are 54.1.2.3, 56.2.3.4, 58.4.5.6, 123.1.2.3, and 124.2.3.4. We can then ask BOSH to assign them to each of our 5 zookeeper instances:

instance_groups:
- name: zookeeper-instances
  instances: 5
  networks:
  - name: default
    default: [dns, gateway]
  - name: elastic
    static_ips:
    - 54.1.2.3
    - 56.2.3.4
    - 58.4.5.6
    - 123.1.2.3
    - 124.2.3.4

This deployment manifest example for AWS uses two networks, which map to the two networks from the AWS cloud-config example earlier by their names default and elastic:

networks:
- name: default
  type: manual
  subnets:
  - range: 10.0.0.0/24
    ...
- name: elastic
  type: vip

BOSH supports three different network types, discussed in the next section.

Network Types

You've now seen enough examples of deployment manifests and cloud-config to be introduced to BOSH networks from the official documentation.

A BOSH network is an IaaS-agnostic representation of the networking layer. The Director is responsible for configuring each deployment job’s networks with the help of the BOSH Agent and the IaaS. Networking configuration is usually assigned at the boot of the VM and/or when network configuration changes in the deployment manifest for already-running deployment jobs.

There are three types of networks that BOSH supports:

  • manual: The Director decides how to assign IPs to each job instance based on the specified network subnets in the deployment manifest
  • vip: The Director allows one-off IP assignments to specific jobs to enable flexible IP routing (e.g. elastic IP)
  • dynamic: The Director defers IP selection to the IaaS

We've seen type: manual and type: vip in preceding sections.

The third type: dynamic is used with AWS original networking, and legacy OpenStack Nova networking. I personally liked these simpler "flat" networking systems. You requested a BOSH instance, and the underlying networking allocated you the IP. You didn't need to create networks and other networking infrastructure, nor did your BOSH manifests/cloud-config need to specify CIDR network ranges, reserved + static ranges. Simpler times.

In production environments, you will wish for the security features and control from more complex networking. Or someone in your organisation will do this wishing for you.

If you are using AWS today, you will instead use VPC networking, and with OpenStack you will use Neutron networking. These are both represented in BOSH cloud-config as type: manual networking.

Manual Networks with AWS VPC

Consider an AWS VPC with a single subnet:

aws-subnet

The basic cloud-config configuration would be:

azs:
- name: az1
  cloud_properties: {zone: us-west-2a}
- name: az2
  cloud_properties: {zone: us-west-2a}
- name: az3
  cloud_properties: {zone: us-west-2a}

networks:
- name: default
  type: manual
  subnets:
  - range: 10.0.0.0/24
    gateway: 10.0.0.1
    reserved: [10.0.0.1/30]
    static: [10.0.0.10-10.0.0.19]
    azs: [az1, az2, az3]
    cloud_properties:
      subnet: subnet-4cff3507
- name: external
  type: vip

Note: In AWS, 1 subnet = 1 AWS availability zone. Above, all three of the bosh AZs that we defined are within the same subnet, and therefore also within the same AWS availability zone. For the most resiliant deployment, all three cloud-config defined AZs should be in different AWS availability zones.

Manual Networks with GCP VPC

Consider a GCP VPC with a single subnet:

gcp-vpc-networks-bosh

The basic cloud-config configuration would be:

azs:
- name: z1
  cloud_properties: {zone: us-east1-d}
- name: z2
  cloud_properties: {zone: us-east1-d}
- name: z3
  cloud_properties: {zone: us-east1-d}

networks:
- name: default
  type: manual
  subnets:
  - range: 10.0.0.0/24
    gateway: 10.0.0.1
    reserved: [10.0.0.1/30]
    static: [10.0.0.10-10.0.0.19]
    dns: [8.8.8.8]
    azs: [z1, z2, z3]
    cloud_properties:
      ephemeral_external_ip: true
      network_name: bosh
      subnetwork_name: bosh-us-east1
      tags: [internal, no-ip, windows-rdp]
- name: external
  type: vip

The dns: [8.8.8.8] configures all VMs to use Google's public DNS servers. These public DNS servers are popular outside of GCP as well as for your BOSH deployments within GCP.

The tags are a selection of the available firewall rules and network routes. If the tag matches a firewall rule, then that firewall rule is applied to each instance in that network. For example, the windows-rdp tag above matches the firewall rule with the same name below that was preconfigured:

gcp-vpc-networks-firewall-rules

Other tags are used to identify which networking routes to apply to each instance in the network. The no-ip tag in the example above corresponds to the following predefined route in my Google Compute environment. The "next hop" of the route is to the "nat-instance-primary" server which will provide my instances with outbound, public Internet access:

gcp-vpc-networks-route-details

In GCP, the type: vip IP addresses are called, "External IP addresses". For this reason, I've named the network external. It is also common for the type: vip network to be generically named vip.

Manual Networks with OpenStack Neutron

TODO

Manual Networks with vSphere

TODO

Further Reading on BOSH Networks

The BOSH documentation has some additional information about networking and configuration options that is well worth reading now and again later for reference.