devopsdays Ghent recap

!!! ATTENTION: Highly unstructured braindump content !!!

Day 1

The Self-Steering Organization: From Cybernetics to DevOps and Beyond

Nice intro into cybernetics and systems theory. Nothing really new for me, as I’m already a little bit into systems theory. Keywords: autopoiesis, systems theory, cybernetics, empathy, feedback.

Ceci n’est pas #devops

  • “DevOps is culture, anyone who says differently is selling something. Tools are necessary but not sufficient. Talking about DevOps is not DevOps.”
  • Fun experiment: replace every occurrence of “DevOps” with “empathy” and see what happens ;-) (reminded me of the “butt plugin”)

Cognitive Biases in Tech: Awareness of our own bugs in decision making

  • Talk is mainly backed by the book “Thinking, fast and slow”
  • The brain is divided into System 1 and System 2
  • System 2 gets tired easily: make important decisions in the morning (e. g. monolith vs. micro-service), postpone trivial ones to the evening (e. g. what to cook for dinner)
  • Great hints for better post-mortems

5 years of metrics and monitoring

  • great recap on infoq
  • You really have to focus on how to visualize stuff. Looks like there needs to be expertise for this in a company which wants to call itself “metrics driven” or “data driven”
  • We have to be aware of alert fatigue:
    • noise vs. signal
    • not reacting to alerts anymore, because “they will self-heal anyway in a few minutes” (we call this a “troll alert” internally, which is a very bad name for an alert coming from a non-human system which is obviously not able to troll)

Ignites

Repository as a deployment artifact - Inny So

  • talking about apt.ly - application+environment as atomic release tags

Day 2

Running a fully self-organizing/self-managing team (or company)

  • good recap at infoq
  • interesting open allocation work model, but with managers, feedback loops, retrospectives and planning meetings. They call it “self-selection”
  • it’s sometimes better to let people go instead of trying to keep them
  • people need explicit constraints to work in, otherwise they don’t know their own and others’ boundaries

[Automation with humans in mind: making complex systems predictable, reliable and humane](https://gist.github.com/s0enke/0ac2f6a0cce307d9cddc)

Open spaces

Internal PaaS

I hosted a session on “Why/How to build an internal PaaS”. The reason for doing this is to build a foundation for (micro-)services: feature teams should be able to easily deploy new services (time to market < 1 hour). They should not have to care about building their own infrastructure for: deployment of applications in different languages (PHP, Ruby, Java, Python, Go …), metrics, monitoring, databases, caches etc.

So I had a quick look, e.g. at flynn.io or deis.io, which claim to do what I want, and I hoped someone actually using stuff like that might be in the room.

The session itself was a bit clumsy: I guess I couldn’t explain my problem well enough, or it actually isn’t a problem. Or the problem is too new, as there was no one in the room who actually had more than 2 micro-services deployed.

But anyway, talking about what I want to achieve actually helped me to shape my thoughts.

Microservices - What is important for Ops

  • Session hosted by MBS
  • If a company wants to migrate to / implement microservices, an Ops team should insist on 12-factor-app style in order to have standardization
  • Have a look at the Simian Army, which has been implemented to mitigate common bad practices in microservice architectures, e. g. to make everyone aware of the fallacies of distributed computing.
  • Not really for Ops, but for devs:
    • EBI / hexagonal programming style from the beginning on, so it doesn’t matter (in theory) whether the system is monolithic or service-oriented. In theory it’s easy to switch
    • Jeff Bezos Rules
    • Generally having a look at Domain-Driven Design and orienting towards it (e. g. using repositories and entities instead of ActiveRecord)

All the videos

on ustream

Other Recaps

External MySQL slaves with RDS reloaded

In an earlier post I demonstrated a way to connect an external slave to a running RDS instance. Later, AWS added a native possibility to import and export via replication.

In my case, several problems popped up:

  • My initial blog post did not show how to start from an existing data set, e. g. how to do a mysqldump and import it
  • RDS does not allow --master-data mysqldumps, as “FLUSH TABLES WITH READ LOCK” is forbidden in RDS
  • So we have no chance to get the exact starting point for reading the binlog on the slave.

The RDS documentation describes an export of data, but not a longer-lasting soft migration. For me it’s critical to have a working replication to on-premise MySQL over several months, not only a few hours to export my data. Actually, we are migrating into AWS and have to connect our old replication chain to the AWS RDS master.

Another point: the RDS documentation is unclear and even buggy. For example it states:

Run the MySQL SHOW SLAVE STATUS statement against the MySQL instance running external to Amazon RDS, and note the master_host, master_port, master_log_file, and read_master_log_pos values.

But then

Specify the master_host, master_port, master_log_file, and read_master_log_pos values you got from the Mysql SHOW SLAVE STATUS statement you ran on the RDS read replica.

OK, to which master shall I connect? The MySQL instance outside of RDS should not have any master host data set yet, because it’s a fresh instance. And the master host on the read replica is a private RDS network address, so we could never connect to that from our VPC.

Next point: RDS lets us set a binlog retention time, which is NULL by default. That means binlogs are purged as fast as possible. We had the following case with an external connected slave: The slave disconnected because of some network problem and could not reconnect for some hours. In the meantime the RDS master already purged the binary logs and thus the slave could not replicate anymore:

Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'

So I was forced to find a solution to set up a fresh external slave from an existing RDS master - and without any downtime of the master, because it’s mission critical!

First I contemplated a plan with master downtime, in order to exercise the simple case.

Here is the plan:

  1. Set binlog retention on the RDS master to a high value so you are armed against potential network failures:

    > call mysql.rds_set_configuration('binlog retention hours', 24*14);  
    Query OK, 0 rows affected (0.10 sec)  
    
    > call mysql.rds_show_configuration;  
    +------------------------+-------+------------------------------------------------------------------------------------------------------+  
    | name                   | value | description                                                                                          |  
    +------------------------+-------+------------------------------------------------------------------------------------------------------+  
    | binlog retention hours | 336   | binlog retention hours specifies the duration in hours before binary logs are automatically deleted. |  
    +------------------------+-------+------------------------------------------------------------------------------------------------------+  
    1 row in set (0.14 sec)  
    
  2. Deny all application access to the RDS database so no new writes can happen and the binlog position stays the same. Do that by removing inbound port 3306 access rules (except your admin connection) from the security groups attached to your RDS instance. Write them down because you have to re-add them later. At this time your master is “offline”.

  3. Get the current binlog file and position from the master. Do it at least 2 times and wait some seconds in between in order to validate that it does not change anymore. Also check with SHOW PROCESSLIST that you and rdsadmin are the only users connected to the RDS master (see the sketch after this list).
  4. Get a mysqldump (without locking, which is forbidden by RDS, as stated above):

    $ mysqldump -h <read replica endpoint> -u <user> -p<password> --single-transaction --routines --triggers --databases <list of databases> | gzip > mydump.sql.gz
    
  5. rsync/scp the dump to the slave

  6. Call STOP SLAVE on your broken or new external slave
  7. Import the dump
  8. Set binlog position on the external slave (I assume the remaining slave settings, e. g. credentials, are already set up).

    CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin-changelog.021761', MASTER_LOG_POS=120
    
  9. Re-add RDS master ingress security rules (or at least add the inbound security rule which allows the external slave to connect to the RDS master).

  10. Start external slave. The slave should now catch up with the RDS master.
  11. Re-add remaining RDS master security group ingress rules if any.
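
For step 3, a minimal sketch of how that check could look (endpoint and credentials are placeholders):

# Ask the RDS master for its current binlog coordinates twice to verify they no longer change
for i in 1 2; do
  mysql -h <rds master endpoint> -u <user> -p<password> -e 'SHOW MASTER STATUS\G'
  sleep 5
done

# Check that only you and rdsadmin are still connected
mysql -h <rds master endpoint> -u <user> -p<password> -e 'SHOW PROCESSLIST;'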

Ok, now we know how to do it with downtime. This might be OK for testing and staging environments, but not for production databases.

How can we do it without downtime of the RDS master?

The AWS manual says we should create an RDS read replica and mysqldump the read replica instead of the master, but it is unclear and buggy about how to obtain the master binlog position.

But using a read replica is actually the first correct step.

So here is my alternative plan:

Spin up a read replica and stop the replication manually:

> CALL mysql.rds_stop_replication;  
+---------------------------+  
| Message                   |  
+---------------------------+  
| Slave is down or disabled |  
+---------------------------+  
1 row in set (1.10 sec)

Now we can see at which master binlog position the slave currently is via the Exec_Master_Log_Pos variable. This is the pointer into the logfile of the RDS master, so we now know the exact position from where to start after setting up our new external slave. The second value we need is the binlog file name, which is Relay_Master_Log_File - for example:

Relay_Master_Log_File: mysql-bin-changelog.022019  
  Exec_Master_Log_Pos: 422
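
Both values come from running SHOW SLAVE STATUS against the stopped read replica. A minimal sketch for grabbing just these two fields (endpoint and credentials are placeholders):

# Show the replication status on the read replica and filter out the two values we need
mysql -h <read replica endpoint> -u <user> -p<password> -e 'SHOW SLAVE STATUS\G' \
  | grep -E 'Relay_Master_Log_File|Exec_Master_Log_Pos'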

As the MySQL documentation states:

The position in the current master binary log file to which the SQL thread has read and executed, marking the start of the next transaction or event to be processed. You can use this value with
the CHANGE MASTER TO statement’s MASTER_LOG_POS option when starting a new slave from an existing slave, so that the new slave reads from this point. The coordinates given by
(Relay_Master_Log_File, Exec_Master_Log_Pos) in the master’s binary log correspond to the coordinates given by (Relay_Log_File, Relay_Log_Pos) in the relay log.

Now we have the two values we need, and we have a consistent state to create a dump from, because the read replica has stopped replicating.

$ mysqldump -h <read replica endpoint> -u <user> -p<password> --single-transaction --routines --triggers --databases <list of databases> | gzip > mydump.sql.gz

Now follow steps 5-8 and 10 from above.
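
For step 8, the read replica’s Relay_Master_Log_File becomes MASTER_LOG_FILE and Exec_Master_Log_Pos becomes MASTER_LOG_POS on the external slave. A sketch using the example values from above (host and credentials are placeholders):

# Point the external slave at the RDS master, starting from the coordinates taken from the read replica
mysql -h <external slave> -u <user> -p<password> -e "
  CHANGE MASTER TO
    MASTER_LOG_FILE='mysql-bin-changelog.022019',
    MASTER_LOG_POS=422;"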

By now you should have a running external slave connected to the RDS master. You may also delete the RDS read replica again.

Happy replicating!

Replicating AWS RDS MySQL databases to external slaves

Update: Using an external slave with an RDS master is now possible, as well as using RDS as a slave with an external master.

Connecting external MySQL slaves to AWS RDS MySQL instances is one of the most wanted features, for example to have migration strategies into and out of RDS or to support strange replication chains for legacy apps. Listening to binlog updates is also a great way to update search indexes or to invalidate caches.

As of now, it is possible to access binary logs from outside RDS with the release of MySQL 5.6 in RDS. What Amazon does not mention is the possibility to connect external slaves to RDS.

Here is the proof of concept (details on how to set up a master/slave configuration are not the focus here :-) ).

First, we create a new database in RDS, somewhat like this:

soenke♥kellerautomat:~$ rds-create-db-instance soenketest --backup-retention-period 1 --db-name testing --db-security-groups soenketesting --db-instance-class db.m1.small --engine mysql --engine-version 5.6.12 --master-user-password testing123 --master-username root --allocated-storage 5 --region us-east-1 
DBINSTANCE  soenketest  db.m1.small  mysql  5  root  creating  1  ****  n  5.6.12  general-public-license
      SECGROUP  soenketesting  active
      PARAMGRP  default.mysql5.6  in-sync
      OPTIONGROUP  default:mysql-5-6  in-sync  

So first let’s check whether binlogs are enabled on the newly created RDS database:

master-mysql> show variables like 'log_bin';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin       | ON    |
+---------------+-------+
1 row in set (0.12 sec)

master-mysql> show master status;
+----------------------------+----------+--------------+------------------+-------------------+
| File                       | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+----------------------------+----------+--------------+------------------+-------------------+
| mysql-bin-changelog.000060 |      120 |              |                  |                   |
+----------------------------+----------+--------------+------------------+-------------------+
1 row in set (0.12 sec)

Great! Let’s have another check with the mysqlbinlog tool, as stated in the RDS docs.

But first we have to create a user on the RDS instance which will be used by the connecting slave.

master-mysql> CREATE USER 'repl'@'%' IDENTIFIED BY 'slavepass';
Query OK, 0 rows affected (0.13 sec)

master-mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
Query OK, 0 rows affected (0.12 sec)

Now let’s have a look at the binlog:

soenke♥kellerautomat:~$ mysqlbinlog -h soenketest.something.us-east-1.rds.amazonaws.com -u repl -pslavepass --read-from-remote-server -t mysql-bin-changelog.000060
...
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
CREATE USER 'repl'@'%' IDENTIFIED BY PASSWORD '*809534247D21AC735802078139D8A854F45C31F3'
/*!*/;
# at 582
#130706 20:12:02 server id 933302652  end_log_pos 705 CRC32 0xc2729566  Query   thread_id=66    exec_time=0     error_code=0
SET TIMESTAMP=1373134322/*!*/;
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%'
/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

As we can see, even the grants have been written to the RDS binlog. Great! Now let’s try to connect a real slave! Just set up a vanilla MySQL server somewhere (local, Vagrant, whatever) and assign a server-id to the slave. RDS uses some (apparently) random server-ids like 1517654908 or 933302652, so I currently don’t know how to be sure there are no conflicts with external slaves. This might be one of the reasons AWS doesn’t publish the fact that connecting slaves actually got possible.

After setting the server-id and optionally a database to replicate:

server-id       =  12345678
replicate-do-db=soenketesting

let’s restart the slave DB and try to connect it to the master:

slave-mysql> change master to master_host='soenketest.something.us-east-1.rds.amazonaws.com', master_password='slavepass', master_user='repl', master_log_file='mysql-bin-changelog.000067', master_log_pos=0;
Query OK, 0 rows affected, 2 warnings (0.07 sec)

slave-mysql> start slave;
Query OK, 0 rows affected (0.01 sec)

And BAM, it’s replicating:

slave-mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: soenketest.something.us-east-1.rds.amazonaws.com
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin-changelog.000068
          Read_Master_Log_Pos: 422
               Relay_Log_File: mysqld-relay-bin.000004
                Relay_Log_Pos: 595
        Relay_Master_Log_File: mysql-bin-changelog.000068
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: soenketesting
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 422
              Relay_Log_Space: 826
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 933302652
                  Master_UUID: ec0eef96-a6e9-11e2-bdf0-0015174ecc8e
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

So let’s issue some statements on the master:

master-mysql> create database soenketesting;
Query OK, 1 row affected (0.12 sec)
master-mysql> use soenketesting
Database changed
master-mysql> create table example (id int, data varchar(100));
Query OK, 0 rows affected (0.19 sec)

And it’s getting replicated:

slave-mysql> use soenketesting;
Database changed
slave-mysql> show create table example\G
*************************** 1. row ***************************
       Table: example
Create Table: CREATE TABLE `example` (
  `id` int(11) DEFAULT NULL,
  `data` varchar(100) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Diploma thesis

Disclaimer: The thesis itself is written in German.

Also in honor of Aaron Swartz, I am publishing my diploma thesis “Wissensmanagement als integraler Bestandteil des Software Engineerings - Konzeption einer Vorgehensweise unter Einbindung agiler Modelle am Beispiel des Web 2.0-Unternehmens Jimdo”.

The thesis examines to what extent agile and lean software development methods such as Scrum, software Kanban, Extreme Programming and DevOps already “practice” knowledge management implicitly.

For this purpose, the individual method building blocks (e. g. “daily standup” or “pair programming”) are examined against the background of knowledge management as a business process. From this, a knowledge management model is developed which is aligned with the company Jimdo.

Many thanks to my wife Mila, and to Boris, Judith and Bracki, who diligently and tirelessly proofread and provided important feedback on the content!

Download the diploma thesis “Wissensmanagement als integraler Bestandteil des Software Engineerings - Konzeption einer Vorgehensweise unter Einbindung agiler Modelle am Beispiel des Web 2.0-Unternehmens Jimdo”

puppet-rspec debugging

While introducing rspec-puppet into a big, grown Puppet codebase at Jimdo, we needed to debug stuff and get more verbose output while writing the first tests. As the interwebs aren’t very chatty about the topic, here it is for all the distressed googlers:

Configure debug output and a console logger in your test (or helper or somewhere):

it "should do stuff" do
  # Log at debug level and send Puppet's log output to the console
  Puppet::Util::Log.level = :debug
  Puppet::Util::Log.newdestination(:console)
  should ...
end
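
The debug output then shows up when you run your specs as usual, e.g. (a hypothetical invocation, assuming Bundler and a spec file under spec/classes/):

# Run a single spec file; Puppet's debug log is now printed to the console
bundle exec rspec spec/classes/my_class_spec.rb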

hth :)

devopsdays in Hamburg

Last weekend I attended devopsdays in Hamburg. First, thanks to Patrick Debois and Marcel Wegermann for doing a really great job organizing the conference. And thanks to the sponsors for making the location, food and beer possible ;)

DevOps is a relatively young movement of people who think developers, operations and also QA have to work together instead of creating departments and islands. It’s about communication and automation of software delivery processes.

For example, Jez Humble gave a talk on “Continuous Delivery“ (the book is a must-read), which is the brother of Continuous Integration: if you integrate continuously, why not deploy continuously? Rule 1: DONE MEANS RELEASED.

How does that fit with Scrum? Scrum says: DONE MEANS “PASSES ACCEPTANCE TESTS”, and it will be released “somewhen” (well, yes, this is my very personal problem with Scrum ;) )

So there were talks about Kanban, and even a real-world introduction of it, which IMHO perfectly fits the continuous delivery process. Check out this nice Scrum vs. Kanban minibook. In Kanban you can visualize your value chain - and in software development, value is commonly created at the moment your changes are deployed/released and used by your customers/users - hence RULE 2: “ready for blahblub” is waste.

So we want our changes deployed continuously and automatically, but we do not want to press the red button until we are sure that production will not be destroyed. So we need unit tests, functional tests and so on. This is what Jez’s book is all about. I also heard about nice tools like Vagrant, which allows you to simulate the production environment on every developer’s laptop with VirtualBox, integrate it into your configuration management, or fire up some instances for integration/functional testing etc. (very short and incomplete explanation, yeah I know).

Ah, configuration management. Don’t have any? How many servers do you have to manage? Does not matter. RULE 3: USE CONFIG MANAGEMENT OR DIE (more or less rule zero). Use Puppet or Chef; those were the most-mentioned tools at the conf.

As mentioned, DevOps is also about communication and evangelism. There was some thinking about a DevOps manifesto like the agile one.

Just another rule: I had a small discussion with Jez about feature branches. Feature branches and continuous integration do not fit well together. Rather, use branch by abstraction.

These are just some of my very own impressions and learnings from this weekend; there was much more stuff. Hopefully the videos will come online in the near future :)

Thanks to all the people involved. I really appreciate the DevOps movement. It feels like it came from heaven at the right time.

rdiff-backup: the easiest way to backup (and restore)

Today I finally managed to set up an incremental backup for my workstation. What do we need? Well, nothing more than rdiff-backup, an open-source command-line tool with all the powers you need.

It just backs up. And your last backup is always available 1:1 at the destination (no strange storage formats etc., just dirs and files). Diffs and metadata are stored separately. So if you want a backup that does its job, is plain, is easy to restore, has no unnecessary features and “just works”, then rdiff-backup is the right tool for you.

Here is a small bash script I wrote to accomplish the mission:

#!/bin/bash
# Prefix dir (backup source root) and backup destination host
P=/home/soenke
DESTINATION=mydestionationserver.local
# Whitelist of dirs/files to back up, one per line
INCLUDE="
$P/Documents
$P/.gnupg
"

# Back up only the whitelisted paths below $P to the remote backup dir (via SSH)
echo "$INCLUDE" | rdiff-backup --include-filelist-stdin --exclude $P/ $P/ $DESTINATION::/home/soenke/backup

In my case, “mydestionationserver.local” is a local media-center server running Ubuntu and an SSH server. INCLUDE has one backup source per line. P is the prefix dir, in my case my home dir. As you can see, I’m using a whitelist of dirs/files to be backed up. If you want your full home dir to be saved, just use:

rdiff-backup /path/to/src <server>::<destination-dir>

Just check the examples to learn more.
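
Restoring is just as simple. A small sketch, assuming the backup layout from the script above (the local target path and the 10-day example are placeholders):

# List the available increments at the destination
rdiff-backup --list-increments mydestionationserver.local::/home/soenke/backup

# Restore the Documents dir as it was 10 days ago into a local directory
rdiff-backup -r 10D mydestionationserver.local::/home/soenke/backup/Documents /tmp/Documents-restored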

PHP: SortingIterator

I just had the need for an Iterator that can sort itself by a user-defined callback.

My special use case is that PHP’s DirectoryIterator does not sort the file list, so the order is pretty random. But my program logic relies on the files being sorted by filename.

So here’s the little class:

class SortingIterator implements IteratorAggregate
{

        private $iterator = null;

        public function __construct(Traversable $iterator, $callback)
        {
                if (!is_callable($callback)) {
                        throw new InvalidArgumentException('Given callback is not callable!');
                }

                $array = iterator_to_array($iterator);
                usort($array, $callback);
                $this->iterator = new ArrayIterator($array);
        }


        public function getIterator()
        {
                return $this->iterator;
        }
}

Basically it uses iterator_to_array() to convert the iterator into an array, then lets the array be sorted by usort() and the user-defined callback.

Then the sorted array is wrapped in an ArrayIterator so it can again be used with all Iterator functions, decorators and whatever.

Usage example:

function mysort($a, $b)
{
        // usort() expects an integer return value, so use strcmp() instead of a boolean comparison
        return strcmp($a->getPathname(), $b->getPathname());
}

$it = new SortingIterator(new RecursiveIteratorIterator(new RecursiveDirectoryIterator('/home/soenke/muell')), 'mysort');

foreach ($it as $f) {
        echo $f->getPathname() . "\n";
}

Without the SortingIterator:

/home/soenke/tests/test/a.txt
/home/soenke/tests/test/b.txt
/home/soenke/tests/test/ä.txt
/home/soenke/tests/test/aaa.txt
/home/soenke/tests/test/abc.txt
/home/soenke/tests/test/az.txt

And with the SortingIterator:

soenke@turingmachine:~/tests$ php dirit.php
/home/soenke/tests/test/a.txt
/home/soenke/tests/test/aaa.txt
/home/soenke/tests/test/abc.txt
/home/soenke/tests/test/az.txt
/home/soenke/tests/test/b.txt
/home/soenke/tests/test/ä.txt