Setting up MySQL Aurora replication from an external primary with GTIDs

Setting up Amazon Aurora as a replica of an external MySQL primary is a common way of synchronizing and/or migrating self-managed MySQL databases to RDS/Aurora.

Nowadays, GTID-based replication is the preferred approach for MySQL, because it offers features such as auto-positioning and thus makes it easy to change replication topologies.

According to the AWS docs and also AWS blogs, it’s possible to use GTID for replication from an external primary into Aurora.

But there is one important issue which is not covered by the docs or blogs: setting the gtid_executed / gtid_purged variables, which new MySQL replicas usually need so that they know their initial binlog position. This value is not set in Aurora/RDS, at least when restoring from an Xtrabackup on S3. Also, RDS/Aurora does not allow setting this value directly:

mysql> SET GLOBAL gtid_purged='<gtid_string_found_in_xtrabackup_binlog_info>';

Access denied; you need (at least one of) the SUPER privilege(s) for this operation

When I started the replication without the GTID set, it immediately stopped with obscure errors such as the following, since it was apparently starting to replicate events from the primary at an arbitrary position:

mysql> show slave status\G
*************************** 1. row ***************************
...
Last_Errno: 1062
Last_Error: Could not execute Write_rows event on table sampletable; Duplicate entry '102688557' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000001, end_log_pos 1099

But AWS Support came to the rescue and knew a workaround: setting the gtid_executed value directly in the mysql database:

mysql> insert into mysql.gtid_executed(source_uuid,interval_start,interval_end) values('5f70944c-9bbe-11e9-a9d2-0a75ff943724',1,19);

Note: This also works with a set of multiple GTIDs. Just insert one row per source UUID/interval (see the sketch below).
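For example, a rough sketch for a GTID set spanning two source UUIDs (the second UUID and the intervals are made-up placeholders):

mysql> insert into mysql.gtid_executed(source_uuid,interval_start,interval_end)
    -> values('5f70944c-9bbe-11e9-a9d2-0a75ff943724',1,19),
    ->        ('a1b2c3d4-9bbe-11e9-a9d2-0a75ff943724',1,42);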

Now reboot the DB instance and check that the value shows up in the gtid_purged variable:

mysql> show variables like 'gtid_purged';

And, if it looks correct, start the replication:

mysql> CALL mysql.rds_reset_external_master;
mysql> CALL mysql.rds_set_external_master_with_auto_position ('...', 3306, 'username', 'password', 0);
mysql> CALL mysql.rds_start_replication;

Note: These commands are for Aurora 2 / MySQL 5.7. Aurora 3 / MySQL 8 renamed the procedures to more inclusive language, so the commands look slightly different.

After setting the correct initial value for gtid_executed, replication should run smoothly and catch up.
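To verify, a quick sanity check (not from the original setup, just the standard MySQL 5.7 status command):

mysql> show slave status\G

Slave_IO_Running and Slave_SQL_Running should both show "Yes", and Seconds_Behind_Master should decrease until the replica has caught up.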


Lessons learned when restoring a MySQL Aurora RDS database from S3/Percona Xtrabackup

Recently I was trying to restore an Aurora database from a Percona XtraBackup, the de-facto industry standard for backing up self-managed MySQL databases. Luckily, RDS and Aurora natively support restoring a cluster from Percona XtraBackups. This comes in very handy for migrations of big databases (for more information, check out the docs and this prescriptive guidance article from AWS).

But soon I was stuck with this error message:

Failed to migrate from mysql 5.7.38 to aurora-mysql 5.7.mysql_aurora.2.11.0. Reason: Migration has failed due to issues. Common problems include issues with the underlying storage or corruption in the database. Disabled/Deleted source kms key for migration from encrypted source. Try taking another snapshot and migrating again.

I contacted AWS support and luckily got a very knowledgeable contact person (thanks, JP!). They found out that the error message is misleading, since I was not restoring from a 5.7.38 backup, but from 5.7.41.

In the end, the problem was an incompatible xtrabackup format produced by the combination of xbstream and xbcloud. The error message probably (and misleadingly) says "5.7.38" because this is the current default minor version for RDS/MySQL 5.7.

A plain xtrabackup, split in chunks, worked for me:

$ xtrabackup --backup --stream=tar --target-dir=/tmp/dumpidump | gzip - | split -d --bytes=500MB - /tmp/dumpidump/backup.tar.gz

In fact, all methods listed in the documentation work fine.

After backing up, the backup has to be moved to an S3 bucket:

aws s3 cp /tmp/dumpidump/ s3://my-database-dumps/aurora-$(date +%s)/ --recursive

Then, one can create an Aurora cluster from that backup in S3:

DB_CLUSTER_IDENTIFIER="aurora-$(date +%s)"

aws rds restore-db-cluster-from-s3 \
  --db-cluster-identifier $DB_CLUSTER_IDENTIFIER \
  --master-username admin \
  --master-user-password … \
  --engine aurora-mysql \
  --engine-version 5.7.mysql_aurora.2.11.0 \
  --source-engine mysql \
  --source-engine-version 5.7.41 \
  --s3-bucket-name my-database-dumps \
  --s3-prefix manual/aurora-1676451099 \
  --vpc-security-group-ids sg-... \
  --s3-ingestion-role-arn … \
  --db-subnet-group-name … \
  --db-cluster-parameter-group-name …

aws rds create-db-instance \
  --db-cluster-identifier $DB_CLUSTER_IDENTIFIER \
  --db-instance-identifier $DB_CLUSTER_IDENTIFIER \
  --db-instance-class db.t4g.medium \
  --engine aurora-mysql

Also make sure that the s3-ingestion-role-arn has sufficient permissions on the S3 bucket, and if used, on the KMS key.
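As a rough sketch (bucket name and key ARN are placeholders; check the RDS docs for the exact set of required actions), the policy attached to the ingestion role could look like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadBackupFromS3",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": [
        "arn:aws:s3:::my-database-dumps",
        "arn:aws:s3:::my-database-dumps/*"
      ]
    },
    {
      "Sid": "DecryptKmsEncryptedObjects",
      "Effect": "Allow",
      "Action": ["kms:Decrypt", "kms:DescribeKey"],
      "Resource": "arn:aws:kms:<region>:<account-id>:key/<key-id>"
    }
  ]
}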

Restoring to a plain RDS MySQL instance

During the above debugging process, I also tried out restoring into a plain MySQL RDS (which can be migrated to Aurora later on), but got another error message:

An error occurred (InvalidParameterValue) when calling the RestoreDBInstanceFromS3 operation: You must request the most recent preferred minor version. Please request mysql5.7.38

If you get this error, your source engine version is probably TOO NEW. Plain MySQL RDS cannot restore backups created from versions newer than the current default minor version (currently 5.7.38). I tried to restore a backup from 5.7.41, but that version was too new. You would have to downgrade the backup source, or migrate to Aurora (which can also handle newer minor versions). In general, restoring into plain MySQL RDS has many more limitations, e.g. it's not possible to restore from S3 files which are KMS-encrypted (default S3 encryption works, though).
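For completeness, the plain RDS path uses the analogous restore-db-instance-from-s3 call; a rough sketch with placeholder values (and a source engine version at or below the default minor version):

aws rds restore-db-instance-from-s3 \
  --db-instance-identifier mysql-from-s3-$(date +%s) \
  --db-instance-class db.t4g.medium \
  --allocated-storage 100 \
  --engine mysql \
  --master-username admin \
  --master-user-password … \
  --source-engine mysql \
  --source-engine-version 5.7.38 \
  --s3-bucket-name my-database-dumps \
  --s3-prefix manual/aurora-1676451099 \
  --s3-ingestion-role-arn …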

When googling for the errors listed here, I found nothing useful, so I hope my article points some folks in the right direction.


Serverless - a new model to run backend software

For me, serverless has become the default model of developing backend applications.
However, a meeting a few weeks ago reminded me that not the entire world has embraced serverless applications yet: I had the task of designing an "Energy Data Importer", an application that collects time series data from different locations, sanitizes and transforms it, and stores it in a database for later lookup. So I went ahead and sketched the application with AWS serverless building blocks and patterns.

I also created a prototype with the CDK, including a working integration/E2E test, which covered authorization with Cognito user pools, data upload via S3 presigned URLs, and an event-driven Lambda function which transforms the data with Python pandas and awswrangler and writes it into an Amazon Timestream time series database. Overall, the proof-of-concept infrastructure and application code had 200 lines of code. And of course it came with all the advantages which are mostly free with serverless applications: built-in resilience, high availability, security, elasticity/auto-scaling, cost-effectiveness (no pay for idle), near-zero total cost of ownership, and so on. And also with the advantages of IaC: repeatable deployments etc.

I presented the architecture to three people who apparently had never seen serverless architectures before, and they seemed somewhat overwhelmed and skeptical. The three main concerns/remarks I remember were:

  • “But where is the code? Where can I click through the code?”
  • “All the complexity is now in the infrastructure, not in my code.”
  • “This is an extreme example!”

Let’s go through them one by one:

“But where is the code? Where can I click through the code?”

Serverless applications usually have less code by definition, since much self-written custom logic and many libraries are replaced by managed services. Having less custom code and fewer libraries results in less "clickable" or scrollable code in the IDE. That might make developers feel uncomfortable, since it feels like losing control: they cannot look up the exact behavior anymore. And that's true: you have to trust your cloud provider that their documentation is comprehensive and up-to-date, and that the software and APIs behave as described.

One simple example is defining HTTP routing logic in the application (e.g. via annotations) vs. defining it in the infrastructure, e.g. via load balancer or API gateway rules. In Spring Boot or Express.js, for instance, developers can look into the routing logic of the open-source framework. They cannot click through code anymore once the routing logic is moved into the infrastructure, such as a load balancer or API gateway.

It can also be more difficult to integration-test serverless applications, since tests have to leave framework-land and integration testing becomes more end-to-end, because it now also involves infrastructure such as load balancers.

On the other hand, one could ask why a developer needs to click through glue code like routes in the first place. Why are they concerned with this stuff and not with working on the main business logic?

Not writing, and not being able to scroll through, that many indirections, abstractions, factories, etc. - but still having a working application - might reveal to developers that they have been overengineering in the past and/or overestimating the importance of their "craft". Serverless architectures with less code often make the unimportance of owned code and libraries more visible, and that might be a bitter pill to swallow for some developers.

“All the complexity is now in the infrastructure, not in my code!”

Since serverless applications are mostly a combination of managed services, much of the "complexity" is now moved into (or reduced by) the infrastructure and the way those managed services are combined.

Business logic and infrastructure logic have been mixed together for a long time.
Applications cannot live without their infrastructure, be it databases, queues, load balancers, storage, message buses, or email services. For example, a pub/sub rule that decides whether a subscriber gets certain business events IS business logic AND also "infrastructure code"!

Used XML files or annotations to configure your app? That's also business logic as "infrastructure as/from code". But it might feel different depending on whether the rules live in the application code repository or in the infrastructure code repository (many businesses have separate code repositories for applications and infrastructure).

Serverless and IaC tools like CDK make the fact more visible that application and infrastructure code belong together, and that business logic is also part of the infrastructure complexity.

Using this as an argument against serverless feels a bit like “shooting the messenger” or blaming it as the bearer of “bad” news that there is no working software without infrastructure.

Moving complexity into the infrastructure, and not in the own code, is actually usually a very good thing, since code is a liability and not an asset: Not owning the entire lifecycle of software and libraries (install, maintain, debug, keep up-to-date, deploy, monitor/instrumentation) frees companies from undifferentiated heavy lifting and helps them keep focus on customers and business.

A co-worker once noted: it's like the "conservation of energy" principle, but for complexity: the complexity is still there, just on the other side of the (cloud provider) API.

“This is an extreme example!”

This comment was related to the extensive use of managed services and libraries, which resulted in about 200 lines of code for the entire proof of concept, including application AND infrastructure code, plus 60 lines for the integration test. While the proof of concept did not contain the entire application with all requirements, it showcased a good part of the application.

With serverless, infrastructure and “application code” become one and this melange contains the business logic. The use of managed services reduces the amount of own/owned code and libraries.

Less code and no borders between application and infrastructure code can scare people, since to them it is an entirely new way of developing and running modern applications in the cloud. If done properly, it also results in a shift of roles, responsibilities, and skills - and thus leads to re-organizations.

One reason why technologies such as VMware, Kubernetes, etc. are so popular is that organizations and people don't like change / don't like to be changed. These bridging technologies might bring a "cloudy" feeling, but if most of the existing knowledge and behaviors stay the same, no real change is introduced.

Which closes the circle back to my research from 2017 to 2019 on "Serverless vs. Organizations" and organizational (un)learning.


🤔 The serverless test

There is a lot of buzz and marketing around the term "serverless". This article aims to disentangle what is serverless and what is not - with practical examples.

Definition

So first, we need a definition of serverless. A good start might be Wikipedia:

Serverless computing is a cloud computing execution model in which the cloud provider allocates machine resources on demand, taking care of the servers on behalf of their customers. […] However, developers of serverless applications are not concerned with capacity planning, configuration, management, maintenance, fault tolerance, or scaling of containers, VMs, or physical servers. […] When an app is not in use, there are no computing resources allocated to the app. Pricing is based on the actual amount of resources consumed by an application.[1] It can be a form of utility computing.

This definition is very tightly coupled to compute, while a backend application usually consists of many more components like storage, queues, databases, event busses, load balancers, etc.

Paul Johnston, one of the fathers of ServerlessDays, has a broader definition, which doesn't apply only to compute, but to software solutions in general.

A Serverless solution is one that costs you nothing to run if nobody is using it (excluding data storage).

The essence of this definition is that every building block of a serverless solution MUST scale to zero (or not be billed) when not in use. Data storage is an exception, since you usually don't want data to be wiped automatically when it isn't being used, unless it's a cache.

So let's have a look at some use cases and public cloud services and see whether they meet the serverless definition above:

| Product / use case | Is it serverless? | Why? |
| --- | --- | --- |
| Amazon Elastic Kubernetes Service (EKS) | No | $40 base fee / month |
| Azure AKS | 🤔 | No base fee, but also apparently useless without worker nodes |
| Google Kubernetes Engine (GKE) | No | $74 base fee / month |
| Knative | 🤔 | Depends: 1. is there a base fee for the Kubernetes cluster (as with EKS and GKE)? 2. can the scheduler auto-scale nodes to 0? |
| Serverless Aurora v1 | Yes | Can scale to zero |
| Serverless Aurora v2 | No | Despite its name, it's not serverless, since it cannot scale to zero and costs $43/month when idle |
| Kinesis Data Streams | No | Minimum 1 shard, $29/month |
| Kinesis Data Streams On-Demand | Yes | On-demand, scales to zero |
| AWS SQS, SNS, EventBridge | Yes | On-demand, pay-per-use |
| DynamoDB Standard | No | Pricing based on provisioned capacity units |
| DynamoDB On-Demand | Yes | Per-request pricing |
| AWS Elastic Beanstalk | No | The ELB/ALB/NLB load balancer has a base fee |
| AWS App Runner | Yes | App Runner scales to zero |
| AWS ECS + Fargate | No | While Fargate can scale to zero, an ELB/ALB/NLB load balancer is most likely used |
| Google App Engine Standard | Yes | Scales to 0 |
| Google App Engine Flexible | No | Despite its name, it's not that flexible and needs VMs |
| Akamai Edge Functions | 🤔 | Need to contact sales |

I will add more examples as they come to my attention.

Ok, at this point you might be like “So what?”. In another article, we will have a look at why scaling to zero is actually a very important and good architectural characteristic.


Advantages of AWS Multi-Account Architecture

When we begin doing things in AWS, we usually start with a single AWS account and create our AWS resources in it. And things can become a mess very fast. This article gives you an overview of why you should switch to a multi-account architecture very soon for your workloads on AWS.

Hard limits per AWS Account

AWS has many "hard limits" per AWS account, which means that - in contrast to soft limits - they cannot be increased. Having multiple AWS accounts reduces the probability of hitting one of them. Few things are more annoying than a failing deployment because you hit, for example, the maximum number of EC2 instances per account while rotating auto-scaling groups.

“Blast radius” reduction

One of the most important reasons for separating workloads into several distinct AWS accounts is to limit the so-called blast radius: containing issues, problems or leaks by design, so that only one portion of the infrastructure is affected when things go wrong and problems do not leak or cascade into other accounts.

AWS accounts are logically separated: no AWS account or resource in it can access resources of other AWS accounts by default. Cross-account access is possible, but it always has to be granted explicitly, e.g. by granting permissions through IAM or another mechanism specific to an AWS service.

AWS Per-Account Service and API limits

AWS throttles API access on a per-account basis. Imagine, for example, that a script from Team A hammers the EC2 API: this could cause Team B's production deployment to fail if both teams share the same AWS account. Finding the cause could be hard or even impossible for Team B. They might even see themselves forced to add retries/backoff to their deployment scripts, which further increases load and causes even more throttling! And last but not least, it adds accidental complexity to their software.

Additionally, there are also service and resource limits per AWS account. Some of them can be raised, some can't. The probability of running into one of these limits declines if you distribute your AWS resources across several AWS accounts.

Security

Maybe you remember the startup Code Spaces, which had all their resources, including backups, in one AWS account: they got hacked and were entirely vaporized within 12 hours.
I would argue that this scenario would have been less likely if their backups had resided in another AWS account.

Environment separation

Typed DROP DATABASE into the wrong shell? Oops, production is gone! That's actually a common story; you might also remember this GitHub outage (not directly related to AWS, but with similar contributing factors).

Consider separating e.g. test, staging and production environments into their own AWS accounts to reduce the blast radius.

IAM is complicated

IAM is not very easy to grasp, and even today there seems to be no easy way to follow the Principle of Least Privilege in IAM. I'd say that Managed Policies are a good start, but too often I see myself falling back to assigning AdministratorAccess. So we often tend to give away too many permissions to e.g. IAM roles or IAM users.

By separating workloads into their own AWS Accounts, we once again reduce the blast radius of too broad permissions to one AWS account - by design.

Map AWS Accounts to your organizational structure

Companies usually try to break down the organization into smaller autonomous subsystems. A subsystem could be an organizational unit/team or a product/project team. Thus, providing each subsystem with its own AWS account seems natural. It allows teams to make autonomous decisions within their AWS account and reduces communication overhead across subsystem borders as well as dependencies on other subsystems.

The folks from Scout24, though, issued a warning about mapping AWS accounts 1:1 to teams:

The actual sizing and assignment of accounts is not strictly regulated. We started creating accounts for teams but quickly found out that assigning accounts per product or group of related services makes more sense. Teams are changing members and missions but products are stable until being discontinued. Also products can be clearly related to business units. Therefore we introduced the rule of thumb: 1 business unit == 1 + n accounts. It allows to clearly identify each account with one business unit and gives users freedom to organize resources at will.

I can fully sign off on that statement, as I have seen many times that teams split and merge or get constantly reorganized. This is especially true in companies that think they are agile and try to fix deeper systemic problems by constantly reorganizing people and teams, ignorant of Conway's Law or their technical constraints / heritage.

Exploring your company’s Bounded Contexts might be another method to find the right sizing and slicing.

Never slice AWS accounts by teams or org units - but rather by Bounded Context, product, purpose or capability.

Making implicit resource sharing harder by design

I guess almost everyone can tell a story of one big database in the middle, and tons of applications sharing it (Database Integration).

Sam Newman puts it succinctly in "Building Microservices":

Remember when we talked about the core principles behind good microservices? Strong cohesion and loose coupling — with database integration, we lose both things. Database integration makes it easy for services to share data, but does nothing about sharing behavior. Our internal representation is exposed over the wire to our consumers, and it can be very difficult to avoid making breaking changes, which inevitably leads to a fear of any change at all. Avoid at (nearly) all costs.

Probably the best way to get out of this situation is to never get into it. So how do we get into this situation in the first place? I guess usually because humans take the path of least resistance. So the usual way goes like this: change the security group settings and connect directly to the database (in the same VPC and AWS account). And BOOM: it has become a shared resource, and thus a broken window.

I'd argue that with separate AWS accounts it's harder to build an entangled mess. In the described case, one would first need to connect the two VPCs from the different AWS accounts. People might think twice whether there is another way of accessing the data source in the other AWS account, e.g. by exposing it via an API. And even when they go for VPC peering, they at least have to make that EXPLICIT on BOTH sides. It's no drive-by change anymore.

Ownership and billing

Another advantage is the clarity of ownership when using multiple accounts. This can be enormously important in organizations which are in the transition from a classical dedicated ops team to a "You built it, you run it" model: if, let's say, a dev team spawns a resource in their AWS account, it's their resource. It's their database, it's their whatever. No throw-over-the-wall. They can move fast, they don't have to mess around with or wait for other teams, but they are also more directly connected to the consequences of their actions. On the other hand, they can also make changes with less fear of breaking things in other contexts because of unknown side effects (remember the entangled database from above?).

It also makes billing really simple, since costs are transparently mapped to the particular AWS accounts (Consolidated Billing), so you get a detailed bill per e.g. business function, environment, or whatever you defined as the dimensions for your AWS accounts. Again, a direct feedback loop. In contrast, think of a big messy AWS account with one huge bill. That might simply reinforce the still prevailing belief in many enterprises that IT is just a cost centre.

Side note: Yes you could also use Cost Allocation Tags for making ownership and costs transparent, but tagging has some limitations:

  1. Tagging is not consistent across AWS services and resources: Some support tags, some don’t.
  2. You need to force people to use tagging and/or build systems that check for correct tags etc. This process has to be maintained (e.g. initialized, communicated, trained, enforced, re-communicated, reinforced, and so on).

Right from the beginning

When I created my first corporate AWS account back in 2010, neither I nor my colleagues were aware of all the multi-account advantages mentioned here. This was one contributing factor resulting in one big shared AWS account across teams. And believe me: "We'll split up the account later, when we have more time / are earning money / are more people" is usually not going to happen! So please don't make the same mistake! Create more AWS accounts!

My current favorite is to slice AWS accounts in two dimensions:

  • Dimension one: Business function/capability/product/project/Bounded Context (not teams/departments, see above!)
  • Dimension two: Environment (e.g. test, staging, prod)

This sounds like a lot of initial complexity, but I think it’s really worth it in the long term for the mentioned reasons.

Creating AWS Accounts is free and:

It’s getting easier with AWS Organizations

AWS Organizations not only simplifies the creation of new AWS accounts (it used to be a pain in the ass!), it also helps to govern who can do what: you can structure the AWS accounts you own into an organizational tree and apply policies to specific sub-trees. For example, you could deny the use of a particular service org-wide, for an organizational unit, or for a single account, as sketched below.
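As a minimal sketch (the denied service is just an example, not a recommendation), such a service control policy could look like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenySageMakerOrgWide",
      "Effect": "Deny",
      "Action": "sagemaker:*",
      "Resource": "*"
    }
  ]
}

Attached to the organization root, this denies the service everywhere; attached to an organizational unit or a single account, it only applies to that sub-tree.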

Outlook

In one of my next articles, I am going to shed some light on the drawbacks of having many AWS accounts, but also on how to mitigate these drawbacks with automated account provisioning and governance, so stay tuned!

Thanks

I want to thank Deniz Adrian for reviewing this article and adding additional points about implicit resource sharing and fearless actions.


Paul O'Neill: The Irreducible Components of Leadership

This is an annotated transcription of Paul O'Neill's talk on leadership - in my opinion the most powerful and inspiring talk on leadership I have ever seen. I decided to transcribe it (well, YouTube did most of the work with its automatic subtitles feature), because there are so many great quotes in it and I wanted to have it as a source for myself, e.g. for future articles, so I always have a written reference.

So here it is - if you find any errors, please do not hesitate to open a pull request:

I want to talk to you about leadership concepts this morning because I believe this: And I have now spent a lot of time working in a variety of ways in health and medical care and I choose to talk about the leadership component because I believe this: With leadership anything is possible and without it nothing is possible. So I’m going to define for you if I can what it is I mean by a leadership what is it that we should expect a leader to do. First of all I think it’s necessary for a real leader to articulate what I call unarguable goals and aspirations for the institution that they lead. That doesn’t mean that I think they should invent them themselves in a dark claws of middle of the night, but I believe it’s a really important critical role for a true leader to articulate non arguable goals.

So I want to tell you some non arguable goals. I want to start with my favorite thing: In a really great organization the people in it are never injured at work. Now when you head off in that direction one of the things you’ll find - I found when I first went to Alcoa - and I said on the first day I was there: People who work for Alcoa should never be hurt at work. There were a whole lot of people in Alcoa who didn’t say to my face but we’re saying in the hallways or behind me: He doesn’t know anything about making aluminum.”,, “He doesn’t know what he’s like to be in a smelter in Alcoa Tennessee in the summertime, where is a hundred and thirty degrees and the humidity is almost a hundred percent and people get heat prostration and there’s nothing you can do about it.”, “He doesn’t know, understand or appreciate any of that and so we’re pretty sure after he learned something about the business and we have our first downturn in metal prices he’ll shut up about safety because we’re already in the top one-third of all organizations in the United States in terms of our safety performance!”

So I’m here to tell you a leader who articulates non arguable goals is likely to get some arrows in the back. But it doesn’t mean you should stop. It really should renew your commitment to be out there in front articulating goals. Let me take it into health medical care and say to you I believe the same kind of goal is the right goal for a hospital-acquired infections. And let me be careful to say, it’s very difficult to actually get to zero injuries to a workforce or to zero nosocomial infections but I think it’s pretty hard for anyone to sustain an argument that says our goal should be some positive number because I - and I did this you know when I when I first came to Knoxville - and I said to people: “You know I’ve only been here three weeks but I hope the Alcoa tomtom network works as well as most informal communication systems do and you already know that I’ve said that Alcoa should be a place around the world, not just in the United States, around the world in 43 countries and 350 plants, that we should be a place where people are never hurt at work. And it can only happen if you will take personal accountability and responsibility for this, for yourself and for your associates, for us to get there and **if some of you - as I’ve been told by the supervisors - believe that we should not set a zero goal because it’s unlikely we can achieve it, I’d like for you to raise your hand if you want to volunteer to be hurt so we can reach the goal.**” There were no but there were no volunteers! And I guarantee you if you ask patience would it be okay if we gave you an infection because we’re not meeting our goal this month there would be no volunteers.

So a leader needs to articulate not arguable goals. And again it doesn’t mean that we know exactly how we’re going to get there but at least we’ve got every human factor in our organisation lined up and trying to achieve the targeted goal. This can’t be done you this cannot be a delegated function. You can’t have a person who’s the vice president for goals. The leader needs to articulate the goals. Other people do not have the power or the position to do that. Now after the goals have been articulated clearly, you notice I didn’t start by saying you know we’re going to make a hell of a lot of money, and let me just say parenthetically that’s because I believe in a truly great organization finance is not an objective it’s a consequence and it’s great if it’s the consequence of being more excellent at what you do then anyone else that does what you do. In my experience the finance follows excellence. So having a goal for economic for financial success is to be not not a good place to start. It doesn’t mean you don’t have to earn the cost of capital or cover your cost or any of that but it should not be an objective of the organization it should be a consequence of excellence.

So how do you move from this goals into action and organization? First of all I think it’s incredibly important to reach every person in the organization and again I’m talking about the theoretical limit in my experience they’re always, no matter how hard you work at it, there are three or four percent of the human factors in the organization that never get it and can’t get it, and you need to do something about that, but I found most people respond to a positive idea of leadership and organizational aspiration. Not many places really respond very well to negative motivation.

And so in an organization I believe that has the potential for greatness - doesn’t guarantee it - but had the potential for greatness the people in the organization can say every day without any reservation or hesitation ‘yes” to three questions. Here are the three questions:

I’m treated every day with dignity and respect by everyone I encounter without respect to my gender or my nationality or my race or my educational attainment or my rank or any other discriminating qualifier. Think about that for a minute. So it means when you go into the lobby of your enterprise every morning, the person behind the desk treats every person with the same happy face and welcoming greeting not related to whether you’re a surgeon that brings in 13% of the business or the person who cleans the room. In a truly great organization there is a seamless sense of “Everyone here is a court of dignity and respect every day”. Now I have a corollary for you which I practice at Alcoa: “If you’re not important you shouldn’t be here”. That raises a really difficult challenging proposition, if you think about it, because an awful lot of organizations, when times are tough, people are laid off. Not in a great organization, because at any particular time the people that are there are necessary or they wouldn’t be there. That creates a real challenge for leadership to figure out how to navigate ups and downs and economic cycles. And there is a way to do it by being clear in your own mind and in your own institution about the difference between a baseline of activity and fluctuating activities, so that you can negotiate with people who are going to be on the bubble if you will, that they understand that they’re on the bubble and they are there as casuals to take care of fluctuations. But for people are part of the baseline there needs to be an honored commitment, that you are really important or you wouldn’t be here, and we need you all the time. So the first proposition is I can say every day I’m treated with dignity and respect. Full-stop.

Second proposition is this: I’m given the things that I need - training, education, tools, encouragement, so that I can make a contribution - that’s important now - that gives meaning to my life. Think about, you know, if your work doesn’t give meaning to your life, it’s what you spend eight or ten or twelve hours a day doing then where are you going to get meaning in your life? On the golf course, or going out to dinner? You know, so I believe it’s an obligation of leadership to create a condition, so people can say I have all the things I need so I can make contribution that gives meaning to my life. Not a lot of places where people can say: this place gives meaning to my life

And third propositions pretty simple, it says: Everyday I can say, someone I care about and respect, noticed I did it. In a word its recognition regular - meaningful, sincere recognition. Now if you find a place, or you can create a place I would say this is a job of leaders - again this cannot be delegated to human resources - this is for a leader of an institution to establish the conditions on an ongoing basis so every person in the institution every day can say yes to these three propositions, then you have a potential for real greatness.

Now after after the leader has articulated the goals and created these cultural characteristics that are pro excellence, a leader needs to take away excuses. And in my experience the excuses are all the same across public, private and nonprofit. When you make these suggestions to people they say: “Well you don’t understand. We really can’t do this quality or continuous learning continuous improvement set of things because we’re already working two hours past what we get paid for. We’re too busy to take on something new”. And people say “We’re too busy!”, and they will say: “If we’re going to do this we need more people. We need to hire some people who are experts in continuous learning and continuous improvement and quality and we need to set up a new department”, and people will always say “We don’t have enough money, were already struggling, so we need more money!” I believe it’s the leader’s responsibility take away all of the excuses.

So I want to give you an example of taking away excuses. I was telling this to ? last night. She said, “you need to tell this story.” So when I first came to Tennessee to Knoxville to Alcoa Tennessee and I spent the morning walking through the plant because I like to feel the things that I am supposed to be responsible for so I wanted to see what it was like to be there for half a day and see what it smelled like and how the people were dressed and whether they had, you know, half of a finger - whatever as a consequence of being in this place. And so at noon they said, we’re we’re going to have lunch. And there were 75 people at lunch half of them were from the supervisory ranks and the other half were from the union organized workforce. So they said “Would you like to say something?”, I said, “Yes, I would.” So I got up and I said, “You know I want to talk to you about safety and I presume you’ve all heard this but here’s what I want to say to you: I want to say to the supervisors: I believe it’s the leader’s role to take away excuses so here’s what I’m saying to you: We will never ever in Alcoa again budget for safety. Never. We’re not going to have a budget line for safety. As soon as we identify anything, as soon as anyone in the institution identifies anything that could cause an individual to be hurt, we’re going to fix it right and we’re going to fix it as fast as it’s physically possible. And so I want to charge you and the supervisory ranks with acting on that idea. You need to actually do it. If something breaks down or you think something could hurt somebody, fix it right now. We’ll figure out how to pay for it later on. Just do it!“ And then I turn to the hourly workforce and I said to them, “You heard my instruction to them. Here’s what I want to do with you. I want to give you my home phone number so that if they don’t do what I said, you can call me!” Not many CEOs were giving their home phone number away, but I wanted I wanted the people to know that this was a real thing. In a few weeks I got a call late one afternoon from an operating guy from the floor in Alcoa Tennessee and he said, “You know, well I’m calling up because you told you told all of us, we should call you if the supervisors are not fixing things. Well we’ve had a roller conveyor system down here that’s been broken for three days or so, and as a consequence those of us who are the workforce have to pick up the 900-pound ingots, a bunch of us, and put them on a dolly and take them from one processing step to the next. And we’re going to get hurt doing this! Our backs are at risk at a minimum and if we dropped one of these things on our foot we’d be permanently disabled. So I want to know what are you going to do about it?” So I said, “You know, let me make a few phone calls.” So I called the supervisory people and explained to them that they were not doing what I told them was their obligation to the workforce. And you know, I had a couple of phone calls in the first six months I was at Alcoa, but fortunately the tomtom network at Alcoa really worked well and after I had to make a couple of interventions I didn’t have to make any more interventions.

You know, so part of what part of what I want to say to you is: Leadership is not about writings on the wall. It’s about acting in a noticeable way on the principles that you establish, so that people begin to believe that they are real that they’re not just writing on the wall. I would suggest that every organization that I know about that has an annual report says someplace early in the annual report, “Our human resources are our most important asset.” So in most places there’s no evidence that’s true, it’s just a sentiment. So we all say it yeah our human resource arm.. is your practice consistent with that? So it’s part of the reason that I elected the first day I went to Alcoa to articulate this goal of no one should ever be hurt at work because it’s measurable, right? You can tell whether or not somebody couldn’t come to work if they were hurt at work, because they aren’t there! right you can’t fudge the numbers. You can fudge numbers about recordable instance incidents and first-aid cases, but it’s pretty hard to lie about “Didn’t show up today”. That’s why I wanted a hard measure that we could look at every day and we could appreciate whether we were making progress or not.

So I want to tell you a little bit about how these numbers are done. In the 1987 the average number of cases of Americans in the workforce being hurt at work was five out of every 100 Americans. In 1987 had an incident at work that caused them to miss at least one day of work ,five out of 100. Alocas number was one 1.86. And if you want to know of what the number is today go on the internet, type Alcoa when you get the drop-down menu, go on environment health and safety and it’ll tell you 24 hours a day in 43 countries, in 350 locations what the injury, what the lost workday injury rate is on a running basis anytime you want to look at. Yesterday it was 0.116. Now why do I tell you that? Because the average lost workday rate in American hospitals is 3.0. And if you’re not good at math that’s 26 times worse than Alcoa. That’s unforgivable, because it’s within the capacity of leaders to articulate a zero goal and then to accomplish.

And I’m going to tell you a little bit about how do you accomplish it, because it’s not good enough, cheerleading won’t truly won’t do it, but this is really relevant to the quest for excellence in health and medical care. But I want to stay for a moment with injuries to the workforce, because the lessons about how to get close to zero in injury rates among the workforce are exactly, precisely the same things that are required to achieve startling excellence in the delivery of health medical care. So first of all you have to establish a process that says: Every incident that happens to one of our employees is going to be recorded within 24 hours, and it’s going to be put into cyberspace along with the surrounding circumstances. And where it’s possible to do it in 24 hours the root cause analysis and an indication of a corrective action that’s being taken, so that this set of circumstances will never again produce this result.

Now I will tell you a special piece about this. I believe that it’s really important in our world to keep things personal. And so when I started this Alcoa. and I said, “Not only do I want to identify this case, I want to do it by name”. My lawyers didn’t like that because they said, “You’re going to create a feast for the tork bar to come in here and sue the hell out of us because we’re now going to put in cyberspace for anybody to look at individuals by name and what happened to them.” You know, what lawyer could find a better way to produce cases. Okay and I said, “I don’t think you’re right and I’m going to take the personal responsibility if we do get sued, because it’s so important that we not let this be statistical.” It needs to be about “Every person is important and they’re important by name they’re not important as a statistic.” So what do you do when you create that, in a world that we live in now with this unbelievable connectivity, if you have an understanding and every one in the organization has signed on the wall, “I’m responsible for myself for not being hurt and for my mates not to be hurt”, when the message goes into cyberspace you can look at it with an expectation that - within the next 24 hours - 349 other locations around the world including Guinea and Russia and China India and Brazil and Argentina, 43 countries at all, that the people in those institutions will look at those cases and they will make whatever modifications are indicated, so we don’t have to learn this 350 times! That’s how you get close to zero, by continuous learning and continuous improvement from everything gone wrong.

It really works an unbelievable way, and I would suggest to you in health and medical care, it would be great if we could get leaders to sign up for the idea that there is a measurable way of knowing whether or not the people in the organization are truly the most important resource, by being able to tell, what kind of an injury rate exists among the people who deliver the care. So that I have to tell you I’m really skeptical of an organization that doesn’t know what its injury rate is. That they’re really good at hand hygiene, you know, because if you’re not really good at your worker’s own safety, at least for me there’s a doubt that you’re really good at the other things that we know are directly related to perfect patient care. And again I would suggest to you, the tools of learning and approach and engagement of the population are exactly the same in every kind of institution.

So that when I went to the Treasury, I tell you little story about a Larry Summers, who was my predecessor is a secretary of the Treasury under the Clinton administration. And so when we had our first briefing session where he was going to tell me about what he’s been doing toward the end of session I said to Larry, “Larry, what’s the lost workday rate at Treasury?”, and he said, “I don’t know what you’re talking about.”, which frankly was not a surprise to me.
And it took about three weeks to actually round up the data, and it turned out that the injury rate at the Treasury - you may think you know how can anybody get hurt at the Treasury - well there are 125 thousand people there, and a significant fraction of them work in the mint. And if you went into the mint in 2001 and looked at the workers you'd find a lot of workers with a half little finger, because the stamping machines taking the end of their finger off is kind of a badge of experience. The injury rate of the Treasury was unbelievable. In 23 months we reduced the injury rate of the Treasury by 50%. In 23 months.

But I will tell you another story that’s related both to Alcoa and the Treasury to demonstrate another important principle to you. I believe that excellence at its best is habitual. And by habitual I mean it’s ingrained and inculcated in all the individuals in the institution so that it’s almost automatic. So it means it applies to everyone - again I can’t say enough about how important it is that if you’re really going to be on a quality quest - it needs to be about everyone in the institution. The people in the quality department cannot produce quality in an organization. It doesn’t mean they don’t have an important responsibility, but they cannot do it. In the same way that infection control committees cannot fix infections, right? They have an important role, but they cannot make it happen for the whole institution. In this quest to make sure that every one in the institution grasp these ideas, I called in the controller at Alcoa - this is about 1991 - and I said, “Ernie, I’d like to know, right now we’re closing our books in this worldwide enterprise in 11 days and reporting our results to Wall Street and I’d like to know, if we had a perfect process with no repair work, no transpositions of numbers, no foul-ups with computer programs that don’t integrate very well with each other for all these 350 locations, if we had no repair work and all of the time that we spent was high-value touch time, that means we’re actually producing value in every minute of every day, how long would it take?” About three weeks he came back to me and he said, “I’ve figured out the answer to your question and here it is: Right now we’re closing in books at 11 days. If we did it perfectly, we could do it in three days”. And I said, “You know, Ernie, that’s our new goal!”, and he said, “No that’s not what I meant. Oh my god, we can’t really do that! That’s just the answer to your question!”. I said, “Hey Ernie, we’re trying to be perfect at everything else we do, including workplace safety and manufacturing, and so the finance function needs to demonstrate to the rest of the organization what excellence really looks like.”
And it took us a year to get there. Now here the leadership functions is really important. I had to say to them, “**I don’t care how much it cost to make this perfect. I don’t care because I’m so confident that the value is there, and so here’s your permission. You can examine all of the things that we’re sucking up from around the world and decide whether the stuff that has evolved is really critical to a financial characterization of our organization and meeting our responsibilities to the Securities and Exchange Commission, so you have freedom to redefine what it is we do. You have the resources to rewrite the computer programs so that they’re friendly to human beings instead of only people who are nerds, who delight in complexity, so you can make this, so that it works for the people who have to do the process of financial roll-up. And if you need some outside help go and get it!**”

So a leader needs to provide a running room for people to work toward the theoretical limit. And in a year we got the point where we could close our books in three days. Full stop. And today, if you look at the quarterly earning report process, you look at CNBC or any of the other financial channels, Alcoa is still and probably always will be the first major corporation to roll up as earnings and report them good and bad, because the process works now. Think about the implication of that, I want you to transfer it to health and medical care. In Alcoa at that time we had 1,300 people in the finance function, and by going from 11 days to 3 days we freed up 8 days a quarter for 13 of the most highly trained analytic people in the organization. Not so we could fire them, but so that they could use their brain power to help us better understand, how to improve everything else we were doing. This is not about firing people. It’s about creating the opportunity for applying resources in a way that produces ever greater value.

So when I went to the Treasury I said to them, “How long’s that take us to close our books after the end of the fiscal year of September 30th”, and they said, “Well, we usually get it done by March.” and I said, “I don’t know why you even bother! Who the hell wants to know what the numbers were five months after the fact?” So I said them, “You know, I know an organization that’s more complicated than the Treasury where they closed the books in three days, and that should be our goal. At the Treasury we should be at least as good as Alcoa. And so they said - you know, again the excuse for routine, “We don’t have the money. We’re already too busy”. And then they hit me with a new one: ”Government laws and regulations won’t permit it.” Smart guy, so I said, “I tell you what if you can show me a rule or regulation or a law, that prohibits us from doing this I will go and get it changed.” Again it was taking away the excuses. “Give it to me. If you tell me there are barriers that need to be rolled away, I will roll over the barriers. There weren’t any. It was just an excuse. Nobody had really examined, “How the hell can we do this? We will do it!” And so in 13 months at the Treasury we we figured out how to close the books in three days. If you want to see this story, it’s on the Treasury website. They’re so proud of it. My name isn’t there but it happened on my watch, because I got my controller from Alcoa to come pro bono. We didn’t pay them a dime to come pro bono to coach the people at the Treasury how to do this job.

And again the reason I tell you this is, because I know an awful lot of health care institutions that don’t close their books in three days, but they could if the leadership decided this is a value and a way to demonstrate the organization that every part of our institution is on the same wavelength, and we’re all about excellence and we don’t and we won’t live in silos and we won’t embrace excuses, and we will be excellent at everything that we do. And I would tell you more stories about health medical care, but I’ve used up my time and I hope I have challenged you a little bit, maybe inspired you a little bit about the potential for what you as leaders in health and medical care can do because I will tell you just one more thing: I believe there is no other sector of our society and our economy that has the same potential for simultaneously improving outcomes from medical intervention and reducing the cost by a trillion dollars a year.


Dead man's switch with AWS CloudWatch: Freshness-Alerting for Backups and Co

A recent challenge for one of the teams I am currently involved with was to find a way in AWS CloudWatch:

  1. To alert if a metric breaches a specified threshold.
  2. To alert if a particular metric has not been sent to CloudWatch within a specified interval.

While the first one is pretty much standard CloudWatch functionality, the latter is a bit more tricky. In the Nagios/Icinga world it's called "freshness". You could also call it a special case of a "dead man's switch" for periodic tasks / cron jobs.

So, for example, in our case we wanted to monitor and alert on whether a backup job runs once per day.
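The backup job itself then only has to emit a heartbeat metric when it finishes successfully; a minimal sketch with the AWS CLI (namespace and metric name are made up for illustration):

aws cloudwatch put-metric-data \
  --namespace "Backup" \
  --metric-name "BackupSucceeded" \
  --value 1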

So here is what we did (CloudFormation snippet below):

  • Set the check period to the interval during which the metric is supposed to be sent, e.g. 86400 if the metric is supposed to be sent every day. This instructs CloudWatch to check once per day.
  • Set evaluation periods to 1: we want to get alerted immediately when no data has been written or the threshold has been breached.
  • And now the important one: we have to treat missing data as breaching, so that the alarm gets triggered if there has been no data point within the evaluation period.

Example in CloudFormation syntax:

HealthCheckAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    Period: 86400
    EvaluationPeriods: 1
    TreatMissingData: breaching
    ...
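For reference, a fuller (hypothetical) version of the alarm, assuming the backup job writes the custom metric BackupSucceeded into the Backup namespace and an SNS topic AlertTopic exists in the same template:

HealthCheckAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmDescription: Backup heartbeat missing or backup failed
    Namespace: Backup
    MetricName: BackupSucceeded
    Statistic: Sum
    Period: 86400
    EvaluationPeriods: 1
    Threshold: 1
    ComparisonOperator: LessThanThreshold
    TreatMissingData: breaching
    AlarmActions:
      - !Ref AlertTopic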


Reading groups / book clubs in companies

Book discussions as a tool for organizational learning

At devopsdays 2015 in Berlin, we had an Open Space group that dealt with the topic: "I read a lot of books, blogs, etc., and would like to carry this knowledge into the company I am currently working for." There, the idea came up to form reading groups. I liked this idea so much that I tried it out right away.

Since then, I have (co-)founded several book reading groups in companies and have come to appreciate these "book clubs" as a very effective tool for structured organizational learning.

What is a book club / reading group?

A book club is a group of people who work through a book and then discuss it, e.g. chapter by chapter, in regular meetings. The actual reading is done by each member on their own before the meetings. Suitable books are professional or technical books that have some relevance to the company.

Advantages of book clubs - what do we get out of this?

Book clubs bring a number of benefits:

  • Group discussion: The subject matter of the book is discussed and worked through in the group. This produces discussions that are very valuable (if not the most valuable ones) for the company. (Mostly) new concepts are first read by every participant individually and then discussed in the group.
    So these are not the ideas of a single person; every group member has worked through the topic themselves by reading. People tend to absorb information they have taken in themselves better, and to integrate it into their mental models (which may diverge strongly from what they read), than when, for example, a single person tries to bring new ideas into a group.
  • Aligning different mental models: How we see reality is always only a partial view. It may be, for example, that completely different ways of working or definitions of terms exist within the group. When the book touches on these, discussions often arise like "Oh, that's how you do it?", or "Ah, now I understand what you mean by X!", but also "Then let's just agree on X, I'll bring it up in my team!". These are exactly the right discussions, because misunderstandings and mutual incomprehension get cleared up in a more factual way, which probably also strengthens group cohesion within the company.
  • Reflecting on your own work: Technical books provide a good basis for a reality check of the currently prevailing ways of working or patterns. Of course, books never tell the whole truth, or the world they present is too perfect, but the literature usually still gives a good indication of the level a person or a group is at. At the moment, for example, we are reading the book "Site Reliability Engineering", which describes how Google works internally - so some concepts from the book only become applicable once you have reached a size like Google's. By and large, though, the concepts are transferable, or at least they spark valuable discussions.
  • Directly applying what has been learned: I remember a book club in which we worked through "Implementing Domain-Driven Design", with people from different teams taking part. The result was very constructive discussions about the overall architecture of the software the company develops, this time guided by the theory from the book and not by differing mental models or levels of knowledge (which, to me, had always felt like the case in previous meetings).
    Another example was a book club in which we discussed "Toyota Kata" and then started to create a value stream map for the entire company. That was an exciting insight into other parts of the company, and I noticed how much fun it is to first discuss the theory with the group and then philosophize about where our company actually creates value - I have rarely experienced discussions of such high quality.
  • Peer pressure: Like so many things that are important but not urgent, reading books to the end in a disciplined way often gets lost in everyday business. Here the pressure of the group can help: if the next meeting is tomorrow, you might well get up an hour earlier to finish the chapter.
  • Inexpensive: There are many ways of developing employees and teams: workshops, trainings, conferences, etc. - and they are often very expensive. Their success may also be questionable: after a training or a conference, day-to-day business usually catches up with us quickly, and the fresh ideas and momentum fizzle out. The direct costs of book clubs are normally limited to reading and meeting time plus the purchase of the book.

How do I get started?

As a group

Finding a topic, a book, and a reading group

First, “someone” has to propose a book and then find a reading group for it. Usually this person then also becomes the group’s organizer. Finding members can happen, for example, by presenting the book club concept in meetings or simply at the coffee machine. What matters is that it is always a voluntary offer.

It also helps to “show weakness” yourself, e.g. “I would like to get into topic XYZ because I have knowledge gaps in that area. I have found book XYZ for this and would like to be able to discuss it with several people to make sure I have really understood the concepts.” By making yourself vulnerable like this (admitting that you have knowledge gaps), you increase the chances of getting more people to join: either they realize that it is not a problem to come into the group with gaps of their own and that the point is not to expose individual employees - or they are already well versed in the topic and can then contribute their existing knowledge in the chapter discussions.

Reading group setup

Once you have found your reading group, you can get going! First, a regular meeting slot should be agreed on, e.g. one hour every week. It is also helpful to set up a mailing list or a chat channel right away (depending on what the company already uses) and invite the members; these can then be used for announcements.

Next, the group agrees that every member will have read one or more chapters by the first meeting. A recommendation for the first meeting: rather start with just the first chapter or the introduction - not too much at once in the beginning, because there will certainly be plenty to discuss at the first meeting.

In the meeting, the group then works through the chapter, discussing highlighted passages, for example. It is also often worthwhile to follow up on the references and footnotes. Another option is to appoint a moderator for the meeting who presents the chapter and guides the group through it. With the discussions that arise (see above), the time usually passes faster than expected.

Finishing the book

Once the book has been read to the end, the group can either dissolve or pick the next book right away and keep going. If the group continues, it is helpful to admit additional potential members to increase diversity, because book clubs are not immune to group dynamics such as groupthink either. You also have to take care that no in-groups and out-groups form (e.g. book club members who then feel more “elite” than non-members).

As a company

  • Explicitly grant working time: As a sign that the company supports the continued development of its employees, a certain share of working time should be “released” for explicit learning activities such as book clubs. One possible model: the company covers the meeting time, the employees cover the reading time.
  • Involve team leads: Book discussions frequently surface “someone really should…” topics. It always helps if team leads take part directly, so that changes to work processes or experiments can be implemented more quickly. Often, things also come up that would otherwise get lost in day-to-day business.

Summary

Book clubs are a very cost-effective way for companies to support the continued development of their employees. They are also a tool for organizational learning.

Like what you read?

You can hire me or make a donation via PayPal!

"Service Discovery" with AWS Elastic Beanstalk and CloudFormation

How to dynamically pass environment variables to Elastic Beanstalk.

Elastic Beanstalk is a great AWS service for managed application hosting. For me personally, it’s the Heroku of AWS: developers can concentrate on developing their application while AWS takes care of all the heavy lifting of scaling, deployment, runtime updates, monitoring, logging, etc.

But running applications usually means not only using plain application servers the code runs on, but also databases, caches and so on. AWS offers managed services like ElastiCache or RDS for these, which should usually be preferred in order to keep the maintenance overhead low.

So, how do you connect Elastic Beanstalk and other AWS services? For example, your application needs to know the database endpoint of an RDS database in order to use it.

“Well, create the RDS via the AWS console, copy the endpoint and pass it as an environment variable to Elastic Beanstalk”, some might say.

Others might say: please don’t hardcode data like endpoint host names; use a service discovery framework or DNS to look up the name instead.

Yes, manually clicking services together in the AWS console and hardcoding configuration is usually a bad thing(tm), because it violates “Infrastructure as Code”: manual processes are error-prone, and you lose the documentation-through-codification, traceability, and reproducibility of the setup.

But using DNS or any other service discovery mechanism for a relatively simple setup? That looks like an oversized solution to me, especially if the main driver for choosing Elastic Beanstalk was reducing maintenance burden and complexity.

The solution: CloudFormation

Luckily, there is a simple solution to that problem: CloudFormation. With CloudFormation, we can describe our Elastic Beanstalk application and the other AWS resources it consumes in one template. We can also inject, for example, the endpoints of those created AWS resources into the Elastic Beanstalk environment.

Let’s look at a sample CloudFormation template - step by step (I assume you are familiar with CloudFormation and Elastic Beanstalk itself).

First, let’s describe an Elastic Beanstalk application with one environment:

...
Resources:
  Application:
    Type: AWS::ElasticBeanstalk::Application
    Properties:
      Description: !Ref ApplicationDescription
  ApplicationEnv:
    Type: AWS::ElasticBeanstalk::Environment
    Properties:
      ApplicationName: !Ref Application
      SolutionStackName: 64bit Amazon Linux 2016.09 v2.5.2 running Docker 1.12.6

Ok, nothing special so far, so let’s add an RDS database:

DB:
  Type: AWS::RDS::DBInstance
  Properties:
    ...

CloudFormation allows us to retrieve the endpoint of the database with the GetAtt function. To get the endpoint of the DB resource defined above, the following code can be used:

!GetAtt DB.Endpoint.Address

And CloudFormation can also pass environment variables to Elastic Beanstalk environments, so let’s combine those two capabilities:

ApplicationEnv:
  Type: AWS::ElasticBeanstalk::Environment
  Properties:
    ApplicationName: !Ref Application
    ...
    OptionSettings:
      - Namespace: aws:elasticbeanstalk:application:environment
        OptionName: DATABASE_HOST
        Value: !GetAtt DB.Endpoint.Address

Et voilà, the database endpoint hostname is now passed as an environment variable (DATABASE_HOST) to the Elastic Beanstalk environment.
You can add as many environment variables as you like. They are even updated when you change their value (CloudFormation triggers an Elastic Beanstalk environment update in that case).
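
The same pattern works for any other resource in the template. As a hedged sketch, assume there is also an ElastiCache Redis cluster defined as a resource named Cache (that resource is hypothetical and not part of the template above); its endpoint could be injected alongside the database host:

OptionSettings:
  - Namespace: aws:elasticbeanstalk:application:environment
    OptionName: DATABASE_HOST
    Value: !GetAtt DB.Endpoint.Address
  # Assumes a resource "Cache" of type AWS::ElastiCache::CacheCluster with Engine: redis
  - Namespace: aws:elasticbeanstalk:application:environment
    OptionName: CACHE_HOST
    Value: !GetAtt Cache.RedisEndpoint.Address

The application then simply reads DATABASE_HOST and CACHE_HOST from its environment at startup, without any extra service discovery machinery.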

Like what you read?

You can hire me or make a donation via PayPal!