Implementing A Backup Strategy After A Personal Incident


Everybody Needs Backups - A Lesson Learned The Hard Way

Programming Oct 31, 2021

The incident

This is a lesson I learned the hard way a few days ago. At the moment, I am working for a software company as a consultant. During my remote work for different companies, I log all my invested time in my personal time tracking software, which works really well. The best thing was that my time tracking software ran on my local machine in a Docker environment! This is cool!

Within the last weeks, I had been working with Docker Swarm, setting up my personal Swarm environment. After everything was finished, I decided to move my time tracking software into the cluster. Because it was already a Docker container, I could easily back up the MySQL data and put everything into my Swarm. It ran smoothly. Everything was working!

After some days, I decided to create more Docker containers within my Swarm. So I wrote the docker-compose.yml files and deployed them. Many services were added (GitLab, a mail server, Portainer, etc.).

And then one day, after I finished my work and wanted to log my working hours, the time tracking software was not available.

So I started to research: What was going on? Why couldn't I access the site? I checked the Swarm. Every service was deployed correctly and running. After that, I checked the logs… Within the software, I saw a message saying that the DB wasn't there. But it should be there?!

I exec'd into my MySQL container. And then I saw it. NO DB ANYMORE! What was going on? With docker stack ps timetracking I checked every container. After some research, I saw that one container had been killed and another one deployed. Normally that's not a real problem. BUT THEN … I saw that it had been deployed to another node in my cluster. This was bad because I knew that there was no volume on that node.

I was relieved because I thought I could easily change it back. I opened the docker-compose.yml and added a constraint so that the service would always be deployed on the same node:

deploy:
  placement:
    constraints:
      - node.hostname == *****
part from docker-compose.yml

I knew this would lead to deployment on one specific node within my cluster. Now the database wouldn't get lost, and the service should use the old volume!

AND THEN AFTER A RESTART OF THE CONTAINER, THE WHOLE VOLUME WAS DESTROYED. IT COULD NOT BE OPENED ANYMORE.

Luckily, I had created a database backup some days earlier while moving from my local Docker env into the Swarm. I was happy that I did not lose a whole month of working hours…

The Backup Script

After this incident, the only thing I wanted to do was to develop a simple backup functionality for my Docker env. At first, I exec'd into every container and saved the DBs just to be sure. The next step was really easy. I searched for Docker backup functionality and found some really interesting articles and tutorials about saving all Docker volumes to AWS with encryption and so on.

This was too much, because I needed a really simple solution. I only wanted to copy the databases from every Docker container into a safe place. So I decided to create a small script that does this and have it executed every day.

After some hours of bash-script research, I developed a simple file called full-db-backup.sh, which I'll explain below.

#!/bin/bash
containers=$(docker ps | grep 'mysql\|maria' | awk '{print $NF}')

for container in $containers
do
  containerStringParts=$(echo "$container" | tr "." "\n")

  for single in $containerStringParts
  do
    simpleName=$single
    break
  done

  timestamp=$(date +%Y-%m-%d_%H-%M-%S)
  docker exec "$container" sh -c 'exec mysqldump --all-databases -uroot -p"$MYSQL_ROOT_PASSWORD"' > "/root/backups/$simpleName-$timestamp.sql"
done
full-db-backup.sh

Line 1:
Adding #!/bin/bash as the first line of your script tells the OS to invoke the specified shell to execute the commands that follow in the script.

Line 2:
A variable called containers is created and filled with the output of the command within $( ). This command lists all running Docker containers and forwards the result (through |) to grep, which filters it based on two patterns (mysql OR maria). The result is then forwarded to awk, which prints only the last column of each listed row. As a result, containers holds a whitespace-separated list of container names.

Line 4–5:
Starts a for loop to iterate through every container name.

Line 6:
Creates a list from the container name split by "." (dots). Docker Swarm task names are normally built like this: service-name.replica.some-hash

Line 8–9:
Starts a for loop to iterate through every part of the string array

Line 10–11:
Saves the FIRST value (the service name) in the variable simpleName and then breaks out of the loop

Line 12:
For loop end parameter

Line 14:
Saves the current timestamp in a variable

Line 15:
Connects to the Docker container named $container (one entry of the $containers list) and executes mysqldump to save every database. As the password, it uses an environment variable ($MYSQL_ROOT_PASSWORD) which is set for nearly every MySQL/MariaDB container in my environment. The dump is then saved within /root/backups under a name built from the container's simpleName and the current timestamp

Line 16: For loop end parameter
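To illustrate line 2 without live Docker output, here is a small sketch (the sample row is made up) showing how awk keeps only the last column of a docker ps-style line:

```shell
#!/bin/bash
# Sketch of the line-2 pipeline using a canned "docker ps"-style row
# instead of live docker output (the row content is made up).
sample="3f2a1b  mysql:8.0  Up 2 hours  timetracking_db.1.x9y8z7"

# awk '{print $NF}' keeps only the last whitespace-separated column,
# i.e. the container name.
echo "$sample" | awk '{print $NF}'   # prints: timetracking_db.1.x9y8z7
```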
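As an aside, the inner loop (lines 6–12) could also be replaced by a single bash parameter expansion; a sketch with a hypothetical task name:

```shell
#!/bin/bash
# Alternative to the inner loop: keep everything before the first dot
# via parameter expansion. The task name below is hypothetical.
container="timetracking_db.1.x9y8z7"
simpleName=${container%%.*}   # strip the longest ".*" suffix
echo "$simpleName"            # prints: timetracking_db
```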

Executing The Backup

I tested the script, and it works as expected. Every time it runs, it creates a new file containing a backup of all databases.
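Since every run adds a new file, the backup directory grows over time. A hypothetical pruning step (not part of the original script) could delete dumps older than a week; shown here against a temporary directory:

```shell
#!/bin/bash
# Hypothetical retention sketch: prune dumps older than 7 days.
# Demonstrated on a temporary directory; in practice backup_dir
# would point at /root/backups.
backup_dir=$(mktemp -d)

touch -d '10 days ago' "$backup_dir/timetracking-old.sql"  # stale dump
touch "$backup_dir/timetracking-new.sql"                   # fresh dump

# Delete *.sql files last modified more than 7 days ago (GNU find).
find "$backup_dir" -name '*.sql' -mtime +7 -delete

ls "$backup_dir"   # only timetracking-new.sql remains
```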


But then I needed to execute the script at an interval so that updates to the databases would be saved too.

I did not want to make it too complicated, so I decided to use a simple cronjob on every Docker Swarm node. It is not the best solution, but it works. It just works, and that is what I want at the moment.

I used crontab to create a new cronjob. With the following command, I opened the crontab on my machine.

$> crontab -e

I added a cron job that is executed every day at 1 AM and simply runs my script, which creates the backups within my root folder.

0 1 * * * /bin/sh /root/cronjobs/full-db-backup.sh

At last, I copied the script to every node in my cluster and added the same cronjob there.


Closing Notes

I know this is not the best solution, but this fast approach is enough at the moment. Now I can start to learn something about Ansible and how to automate this script within a playbook. I will also create something that uploads my dumps to Amazon, encrypting them before uploading. But now I have time for this because I'm SAFE!
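As a teaser for that follow-up, encrypting a dump before uploading could look roughly like this. This is only a sketch: it assumes openssl is available, the bucket name is a placeholder, and the upload command is left as a comment.

```shell
#!/bin/bash
# Hypothetical sketch: encrypt a dump with AES-256 before shipping it
# off-site. In real use the passphrase would come from a secret store,
# not be hardcoded.
dump=$(mktemp)   # stands in for /root/backups/<name>-<timestamp>.sql
echo "CREATE TABLE demo (id INT);" > "$dump"

# Encrypt the dump (AES-256-CBC with PBKDF2 key derivation).
openssl enc -aes-256-cbc -pbkdf2 -salt \
  -in "$dump" -out "$dump.enc" -pass pass:change-me

# aws s3 cp "$dump.enc" s3://my-backup-bucket/   # upload step (sketch)

# Round-trip check: decrypting yields the original dump again.
openssl enc -d -aes-256-cbc -pbkdf2 \
  -in "$dump.enc" -out "$dump.dec" -pass pass:change-me
cmp -s "$dump" "$dump.dec" && echo "round-trip ok"
```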

Still, I hope you find this article helpful! If you also have an interesting backup strategy to share, feel free to comment here. Happy Backupping!
