The metadata is pretty minimal, with just a name (labels are not required):
apiVersion: v1
kind: ReplicationController
metadata:
  name: cassandra
  # The labels will be applied automatically
  # from the labels in the pod template, if not set
  # labels:
  #   app: Cassandra
The spec specifies the number of replicas:
spec:
  replicas: 3
  # The selector will be applied automatically
  # from the labels in the pod template, if not set.
  # selector:
  #   app: Cassandra
The pod template's metadata is where the app: Cassandra label is specified. The replication controller tracks pods with that label and makes sure there are always exactly three of them:
  template:
    metadata:
      labels:
        app: Cassandra
The pod template's spec describes the list of containers. In this case, there is just one container, named cassandra. It uses the same Cassandra Docker image and runs the run.sh script:
    spec:
      containers:
      - command:
        - /run.sh
        image: gcr.io/google-samples/cassandra:v11
        name: cassandra
The resources section sets a limit of 0.5 CPU units (equivalent to 500 millicores, half a core) in this example:
        resources:
          limits:
            cpu: 0.5
The environment section is a little different. The CASSANDRA_SEED_PROVIDER variable specifies the custom Kubernetes seed provider class we examined earlier. Another new addition here is POD_NAMESPACE, which uses the Downward API again to fetch the value from the metadata:
        env:
        - name: MAX_HEAP_SIZE
          value: 512M
        - name: HEAP_NEWSIZE
          value: 100M
        - name: CASSANDRA_SEED_PROVIDER
          value: "io.k8s.cassandra.KubernetesSeedProvider"
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
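As an aside, metadata fields such as the namespace can also be exposed as files rather than environment variables, via a downwardAPI volume. A minimal sketch (the volume name, mount path, and file name here are illustrative, not part of the Cassandra example):

```yaml
        volumeMounts:
        - mountPath: /etc/podinfo
          name: podinfo
      volumes:
      - name: podinfo
        downwardAPI:
          items:
          - path: namespace        # readable at /etc/podinfo/namespace
            fieldRef:
              fieldPath: metadata.namespace
```

Environment variables are the better fit here, because the seed provider reads its configuration from the environment at startup.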
The ports section is identical, exposing the intra-node communication ports (7000 and 7001), the JMX port 7199 used by external tools such as Cassandra OpsCenter to communicate with the Cassandra cluster, and of course the CQL port 9042, through which clients communicate with the cluster:
        ports:
        - containerPort: 7000
          name: intra-node
        - containerPort: 7001
          name: tls-intra-node
        - containerPort: 7199
          name: jmx
        - containerPort: 9042
          name: cql
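Clients typically don't address pods directly; they reach the CQL port through a service that selects the Cassandra pods by label. A minimal sketch of such a service (the service name and this particular manifest are assumptions, not part of the replication controller itself):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: cassandra
spec:
  selector:
    app: Cassandra
  ports:
  - port: 9042
    name: cql
```

Because the selector matches the app: Cassandra label, the service automatically routes to whichever three pods the replication controller is currently maintaining.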
Once again, the volume is mounted into /cassandra_data. This matters because the Cassandra image is configured to expect its data directory at that particular path. Cassandra doesn't care about the backing storage (although you, as the cluster administrator, should). Cassandra will just read and write using filesystem calls:
        volumeMounts:
        - mountPath: /cassandra_data
          name: data
The volumes section is the biggest difference from the stateful set solution. A stateful set uses persistent storage claims to connect a particular pod with a stable identity to a particular persistent volume. The replication controller solution just uses an emptyDir on the hosting node:
      volumes:
      - name: data
        emptyDir: {}
This has many ramifications. You have to provision enough storage on each node. If a Cassandra pod dies, its storage goes away. Even if the pod is restarted on the same physical (or virtual) machine, the data on disk will be lost because emptyDir is deleted once its pod is removed. Note that container restarts are OK because emptyDir survives container crashes. So, what happens when the pod dies? The replication controller will start a new pod with empty data. Cassandra will detect that a new node was added to the cluster, assign it some portion of the data, and start rebalancing automatically by moving data from other nodes. This is where Cassandra shines. It constantly compacts, rebalances, and distributes the data evenly across the cluster. It will just figure out what to do on your behalf.
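For comparison, the stateful set solution avoids this data loss by binding each pod to durable storage through a volume claim template instead of an emptyDir. A minimal sketch of what that looks like in a stateful set spec (the storage size and access mode here are assumptions for illustration):

```yaml
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```

With a claim template, a restarted pod reattaches to its original persistent volume and keeps its data, at the cost of having to provision persistent volumes, whereas the replication controller approach leans entirely on Cassandra's own replication and rebalancing.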