Guidance on how to make your environment easier to onboard for Web Ops Engineers, SRE's and DevOps Practitioners

So you want to Onboard a DevOps Practitioner

Author: Martin Jackson - @actionjack

At the moment everyone seems to be so concerned with recruiting DevOps practitioners, but I feel the process of onboarding them and giving them an environment to succeed in is still a hit-and-miss affair, especially in busy organisations.

Also nobody (at least nobody I know…) wants to work in a difficult environment:

Bad environments (and broken cultures) neither attract nor retain top talent. In fact, they do the opposite.

“Suffering increases in proportion to knowledge of a better way.”

Jim Hickstein

Making it easy to get work done from day one

Simplify, simplify and after that simplify some more

“Everything should be made as simple as possible, but no simpler.”

Albert Einstein

Reduce the time spent learning environments by building them to be easy to understand, with a focus on making it possible for every developer (new or old) to become effective in the shortest possible amount of time.

Here is some guidance on how to make your environment easier to onboard into and keep the people working in it happy.

Basics

The raw basics

“The only way you can stay on top is to remember to touch bottom and get back to basics.”

Shane Black

Have internet access sorted out for new starts or let them know if there isn't any.
Locker access (if you supply lockers for hot-desk environments).
Let security know that they are coming.
Let people know whether they are required to use their own equipment or are being supplied with specified equipment, and which operating system it runs.
If you haven't already done so, adopt some group chat software like Slack, Microsoft Teams or Rocket Chat. This kind of software is beneficial to all and reduces pressure on key individuals, because questions go out to a group of people rather than targeting specific individuals who may be busy and under constant interruption.
- If you do the above, try to implement some communications etiquette. For example, when you answer someone, reply in a thread so the question, context, conversation and possibly the solution are kept in the same place rather than being strewn throughout the chat history.
- Provide a High-level Environment overview so new starts know what they are working on and what technologies they need to get up to speed on.

Culture

Aim to create a culture of empathy and psychological safety

“It's possible for good people, in perversely designed systems, to casually perpetrate acts of great harm on strangers, sometimes without ever realising it.”

Ben Goldacre, Bad Pharma, p. xi

Embrace the standard of The Humble Learner. The Humble Learner accepts the limits of human capacity while seeking to grow their technical and empathetic skills.
Do not create or foster a Blame, Shame and Train culture, where mistakes are handled by openly blaming and shaming the employee (and sometimes terminating their employment) and then training other employees using the incident as an example.
- Instead, recognise each failure for what it is: a lesson. Identify what went wrong and how to ensure it does not go wrong again (and no, this is not an excuse to produce lots more documentation 😜).
Try to foster a culture of improvement, benchmark your organisation against some form of maturity model to identify the gaps and attempt to close them.
Introduce the new engineer(s) to the relevant people within the organisation
Remember that not everyone may be as smart as you are; they may be missing:
- Context / situational awareness (how did we get from here to there?)
- Tribal knowledge (this is where our ancestors' bodies are buried)
- Cultural awareness (how we do things around here)
- Technical expertise in that specific problem domain
- The local taxonomy: concepts and language vary from workplace to workplace, e.g. pre-approved changes and standard changes may not necessarily mean the same thing from job to job.
What are the Preferred practices or "Design Principles"?
Listen to their point of view. Bringing in a new person is a prime opportunity to find out where the code or process needs improvement.
Test your mentoring and onboarding process to flush out any shortfalls by getting the last person who joined to mentor the new joiner.
Make your documentation inclusive, e.g. this document is parsed using alex in order to catch insensitive and inconsiderate writing.
Be wary of overloading new starts with too much information. There is often quite a lot to learn (often more than you think), so instead provide a set of useful links so people can research at their own pace.
Write code that takes into account how future maintainers will feel reading it, let your code be empathetic.

Have up to date Documentation

Make it easy to understand and do the things

“Stale documentation is not only misleading, it is positively harmful.”

Riona MacNamara (@rionam)

It's important to either have or do the following:

Regularly tidy your documentation: old documents should be removed and outdated ones updated. If you touch it, then update it.
- Consolidate your documentation. Nothing is so disheartening as searching your wiki for "Password Management Policy" and getting 40+ search results 👎
Have a high-level logical architecture, ideally written in a Git-friendly format, e.g.:
- SVG diagrams in GitHub so you can see the infrastructure changes over time
- Graphviz description language
- Graphvizo
An overview of the company’s infrastructure.
Systems integration points and their third party dependencies
An intranet/wiki or enterprise social network to learn about the different teams and key members, with pictures. On day one, it is easy to get overwhelmed by lots of new names and faces.
Have documentation for your alerts. If something is important enough to disturb the on-call person about, it's important enough to have a runbook entry about it. If you alert because foo queue is too long, there should be a runbook entry describing how to fix it.
- At one client I worked with we configured the monitoring system so the alerts themselves actually had a link to the relevant runbook entry 👍 👏
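
One way to keep alerts and runbook entries in step is to fail a CI check whenever an alert has no runbook link. The sketch below assumes alert definitions can be loaded as plain dictionaries with a hypothetical `runbook_url` field; the field name, alert names and URL are illustrative and not taken from any particular monitoring system.

```python
from typing import Dict, List


def alerts_missing_runbook(alerts: List[Dict[str, str]]) -> List[str]:
    """Return the names of alerts that have no runbook link."""
    missing = []
    for alert in alerts:
        # `runbook_url` is a hypothetical field name chosen for this sketch.
        url = alert.get("runbook_url", "").strip()
        if not url:
            missing.append(alert["name"])
    return missing


# Illustrative data only: the names and URL are made up.
ALERTS = [
    {"name": "FooQueueTooLong", "runbook_url": "https://wiki.example.com/runbooks/foo-queue"},
    {"name": "DiskAlmostFull"},
]

if __name__ == "__main__":
    for name in alerts_missing_runbook(ALERTS):
        print(f"Alert {name} has no runbook entry")
```

Run as part of CI, a check like this stops a new alert from being merged until someone has written down how to respond to it.
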
Create a Glossary of Terms [e.g. a Minipedia] for describing any organisation specific acronyms or terms
- Create an onboarding wiki page (e.g. Confluence/Google Docs)
- 👍 for open, online and easy-to-reach checklists
- One cool thing that I have seen recently is acronym decoder chatbots for Slack that watch for team acronyms and explain them in real time in the chat room
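
The core of an acronym decoder is small. This is only a sketch of the lookup step, assuming a hypothetical in-memory glossary; wiring it to a chat platform is left out because that depends on the platform's own bot API.

```python
import re
from typing import Dict, List

# Hypothetical glossary; in practice this would be loaded from your Glossary of Terms.
GLOSSARY: Dict[str, str] = {
    "ADR": "Architecture Decision Record",
    "SRE": "Site Reliability Engineering",
}


def explain_acronyms(message: str, glossary: Dict[str, str] = GLOSSARY) -> List[str]:
    """Return one explanation per known acronym found in the message."""
    explanations = []
    seen = set()
    # Treat runs of two or more capital letters as candidate acronyms.
    for word in re.findall(r"\b[A-Z]{2,}\b", message):
        if word in glossary and word not in seen:
            seen.add(word)
            explanations.append(f"{word}: {glossary[word]}")
    return explanations
```

Each acronym is explained once per message, so a busy channel is not flooded with repeats.
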
Write your documentation as if it's going to be open to public scrutiny someday.
Have an easy to use and setup collection of shared resources e.g. bookmark file of URL links, .ssh/config files
If possible, keep your documentation as close to the code as possible (for example as Markdown) rather than referencing external resources like wikis, or use a static site generator. This way you are more likely to have up-to-date documentation, since you get immediate feedback when you review code changes rather than having to separately review a PR and a wiki page. Some options are:
- mkdocs,
- hugo,
- sphinx or
- Jekyll
If there are problems that you have to work around in your code, then link in the comments to some sort of permanent record (e.g. the URL of a Jira story or an ADR) explaining why. The following code comment caused me to do a lot of running around (the `git blame` gave me a commit that led to a PR that had zero details in it, authored by someone who could not remember why they put that in the code):
```
instance_type: m4.4xlarge # Larger than this currently causes issues on our AMIs…
```

What would have been more helpful is:

```
instance_type: m4.4xlarge # Larger than this type causes issues see REF-2019
```

Operations

Make it easy to get stuff done

“Complexity exacts a staggering tax on your humans. Good Ops engineers attempt to pay down that tax.”

Charity Majors

Have all relevant user accounts and access set up and ready
Create Operations Checklists for your key processes
Have your work structured so people can see what needs to be done, e.g. a Kanban board backlog or to-do lists
Provide information regarding the applications that are maintained by the team and how to do the operations for those applications
Have dummy sample applications that can be deployed safely to your infrastructure, so new starts can learn how the deployment process works without fear of impacting key applications
Make it difficult to make mistakes, e.g.
- Protected branches, e.g. to prevent force pushes to master
- If you have code standards, don't just document them; back them up with automated checks triggered by CI or pre-commit hooks
- Avoid committing secrets and credentials into git repositories
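
As a rough illustration of the kind of check a pre-commit hook or CI job can run, the sketch below scans text for two well-known secret shapes. The pattern list is an assumption made for this example and is far from complete; a dedicated secret scanner will cover many more cases.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real secret scanners ship far more rules.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line number, pattern name) for each suspicious line."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings
```

A hook would call `find_secrets` on each staged file and exit with a non-zero status if anything is found, which blocks the commit.
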
If you have policies on how to handle certain tasks (e.g. doing spikes), document them and link to them in your stories, e.g. "here's the link to how we handle spikes".
Ensure your naming conventions are consistent and make sense:
- If something is called build_X and it actually deploys X, then change the name to deploys_X if possible, to reduce confusion and prevent information hiding.
- If your environment structure is env-productgroup-application, then make sure the naming is consistent across all environments, e.g.
  - Development-Acme-Bomb
  - Test-Acme-Bomb
  - PreProduction-Acme-Bomb
  - Production-Acme-Bomb
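
A convention like this is easier to keep consistent when something checks it. The sketch below validates names against the env-productgroup-application structure, using the four environment names from the example above; the allowed character sets are an assumption and should be adjusted to your own rules.

```python
import re

# Environment names taken from the example above; extend to match your own.
ENVIRONMENTS = ("Development", "Test", "PreProduction", "Production")

NAME_PATTERN = re.compile(
    r"^(?P<env>[A-Za-z]+)-(?P<productgroup>[A-Za-z0-9]+)-(?P<application>[A-Za-z0-9]+)$"
)


def is_valid_name(name: str) -> bool:
    """Check that a name follows env-productgroup-application."""
    match = NAME_PATTERN.match(name)
    return bool(match) and match.group("env") in ENVIRONMENTS
```

With this in place, `is_valid_name("Production-Acme-Bomb")` passes, while an abbreviated `Prod-Acme-Bomb` is rejected.
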
Nobody should be able to do something catastrophic to an environment unless they are determined to do so, i.e.
- Make doing the right thing easy by creating safety harnesses, using build or scripting tools like the following, so the most common tasks can be done safely without the worry of screwing up:
  - Bash Scripts
  - Gradle
- If you use configuration management tools, then run them repeatedly and/or test them. Try to avoid one-shot configuration management, i.e. where the operation is only run once to configure a resource, even one you do not expect to change, because it will change, it will break, and you will be rushing around trying to figure out what happened.
- Use the Guard Rail Pattern by putting safe conditionals in your configuration management so you can do test runs without the worry of screwing up, e.g. Ansible tasks:
```
# Only runs when test mode has been explicitly switched off
- name: "Do something really Dangerous"
  command: /sbin/something --could --be --dangerous --if --run --it --in --prod
  when: testmode == "Off"
```
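
The same guard rail idea carries over to ad hoc scripts. This is a minimal sketch, assuming a hypothetical `TESTMODE` environment variable that defaults to on, so the dangerous path only runs when someone has deliberately switched test mode off.

```python
import os
import subprocess
from typing import List, Mapping


def run_guarded(command: List[str], env: Mapping[str, str] = os.environ) -> bool:
    """Run the command only when test mode is explicitly "Off".

    Returns True if the command was executed, False if it was only reported.
    """
    # TESTMODE is a hypothetical variable name; anything but "Off" is treated as a test run.
    if env.get("TESTMODE", "On") != "Off":
        print("TEST MODE, would run:", " ".join(command))
        return False
    subprocess.run(command, check=True)
    return True
```

Defaulting to the safe behaviour means a forgotten or misspelled setting results in a harmless test run rather than a production change.
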

Processes

How should we be doing the stuff

“If you can't describe what you are doing as a process, you don't know what you're doing.”

W. Edwards Deming

Everyone seems to have their own particular spin on Agile Scrum or Kanban, so explain up front what the process is and refine when and if necessary.
Have Shovel Ready work for new starters, create a backlog of work that can be easily done by a new starter:
- Ideally work that:
  - is well defined,
  - is easily explained,
  - requires some research,
  - adds value and;
  - is not grunt work e.g. documentation.
Assign your new starter an onboarding buddy/mentor
- Ensure that this "Buddy" has enough free cycles to be there for the new start if needed
Pair with the new start as soon and as often as possible. Depending on the complexity of the environment this could go on for weeks (if not months). Don't be afraid to pick up this pairing at a later date if the engineer has never touched that code block before.
When [and if] you do a Retro, base it against a known good baseline, i.e.
- If you are doing production deploys in the early hours of the night and they go successfully, remember this does not necessarily reflect a good deployment.
Put as much detail into tasks / stories as possible including:
- Assumptions,
- Reference information and existing implementations,
- Acceptance criteria narrowed down enough to prevent unnecessary research or rework,
- Diagrams.
Ideally make your Tasks/Stories as small and atomic as possible. This is for a number of reasons, some of those being:
- It makes them easier to handle and get your head around
- You are less likely to have to context switch within a story if it has a narrow problem domain
- You are more likely to actually finish that particular story and not have to pick up a new one and have to go back to the original story, since the smaller it is the less likely it is to run into some sort of unpredicted blockage.
Avoid [if possible] onboarding during crunch times (important or critical planned releases)
Ideally have your accounts linked with some central or shared directory e.g. Github/Google/LDAP so your new starters don’t have to create and remember 101 user/password combinations or have to request access to multiple applications separately.
Use configuration management that has a dry run feature e.g. --testing_mode on
- Blocking infrastructure tests or linters to catch mistakes early, e.g.
Add or invite the individual to any relevant Slack, IRC or Microsoft Teams channels or mailing lists.
Provide information regarding relevant processes e.g.
- Incident, problem and change management
- Deploying changes / releases to the different environments
- Ordering infrastructure / tools
- Authorization for tools & applications
- Use of test environments, and creating and using test data
Have clean code. It really helps if your code is good, sensibly organized and well structured. If the code base is large, it should be broken down into smaller understandable segments.
Create a Papercuts.md in your repos. This is a log of things that have hurt us in the current environment; they may not be actual technical debt, however they could be things for us to discuss and possibly fix in the future.
If you have adopted a particular coding style guideline on your project, then document or reference it so new joiners can easily find and adopt it.
Story kickoffs can be extremely useful to new starters by helping them get into the mindset of the team, identify areas that aren't immediately visible in the code base, and generally reduce constant rework due to poor or missing acceptance criteria.
Embed your processes in your code. If your process requires you to hand off to another team to get the thing you want done (e.g. after issuing a Pull Request you need to notify another team to run a Jenkins pipeline), then put the team and the contact information (e.g. Slack channel) in the documentation.
Use code formatters to standardize the structure of your code, e.g. terraform fmt. This can make reading diffs a lot easier, since you don't have to deal with things like differing indentation.

Version control management

“A generation which ignores history has no past and no future.”

Robert Heinlein

How do we safely change the things

Document your coding standards and strategies in the open e.g.
- Version control & branch strategy
- Code review process
- Release handling management
Have up-to-date README documentation in all repos, for example:
- 👍 have sequence diagrams in all repos e.g. plantuml or mermaid
- 👍 have Architecture Decision Records in your repos
- 👍 Have a Clear and concise git history that clearly and easily documents the changes done and the reasons why in your repositories
Make Pull Requests a first-class citizen. Nothing is more demoralising than having a Pull Request sitting around without feedback or a chance of being merged, especially if it needs to be continually rebased.
Good Pull Requests can also be an excellent teaching tool for new starts and old hands alike. A good PR tells you what was implemented, why and how, so if you (or anyone else) need to do something similar in the future it will make things a lot easier than relying on your memory or tribal knowledge. You can also prompt for good Pull Requests by using Pull Request Templates that suggest your best practice format.
If you use Slack or something similar, consider adding a notification bot for pull request and push activities, e.g. for Bitbucket or GitHub, to notify your colleagues that a Pull Request is ready for review.
Keep your pull request list short and tidy: merge good requests quickly and close poor ones or those that are never going to be merged.
Integrate your git history with your external issue tracker so that it can automatically reference the changes related to a story. Put in place some automated branch naming pattern protection to ensure that all branches match the issue tracker's issue reference format. This way you enforce the best practice of a branch matching a historical record in (for example) Jira as to why something was created, changed or deleted.
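
Branch naming protection can be as simple as a pattern check in CI or a server-side hook. The sketch below assumes Jira-style issue keys (such as REF-2019) and a hypothetical `<type>/<KEY>-<number>-<description>` branch layout; both are illustrative choices, not a standard.

```python
import re
from typing import Optional

# Assumes Jira-style keys such as REF-2019 and a hypothetical
# "<type>/<KEY>-<number>-<description>" branch layout.
BRANCH_PATTERN = re.compile(
    r"^(?:feature|bugfix|hotfix)/(?P<issue>[A-Z][A-Z0-9]+-\d+)(?:-[a-z0-9-]+)?$"
)


def issue_key(branch: str) -> Optional[str]:
    """Return the issue key embedded in a branch name, or None if it does not conform."""
    match = BRANCH_PATTERN.match(branch)
    return match.group("issue") if match else None
```

A check built on this would reject a branch such as `my-new-feature`, because it carries no issue reference, and accept `feature/REF-2019-resize-instances`.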

Development environments

How do we safely change things

“Measure twice, cut once”

Proverb

Make it easy to set up your local development environment. You should not have to do the following just so you can start work:
- Log multiple service requests
- Read through multiple wiki pages
- Hunt down multiple individuals
- Get multiple emails with multiple links
- Ask multiple people how their personal environment is configured
Have at least a minimally functioning Continuous Integration setup
Make your tooling easy to set up and easy to use cross-platform, or run a local environment that does not mess up what's currently there, e.g. in a virtual machine
- Version managers, for example: asdf, pyenv, jenv, rbenv, venv, virtualenv, pyenv-virtualenv, pipenv
- 👍 Vagrant boxen in order to test locally
- 👍 Docker containers e.g. using the Three Musketeers pattern
- 👍 The ability to create individualized development environments in the cloud e.g. AWS, Azure, Google, Digital Ocean, etc in order to safely deploy, iterate and test in a separate (and safe) environment

Useful links

Would you like to know more?

See a problem here

See a problem? Need something clarified? Raise an issue and I'll try to fix it.

Contributing

I'm open to well structured Pull Requests

Fork it!
Create your feature branch: git checkout -b my-new-feature
Commit your changes: git commit -am 'Add some feature'
Push to the branch: git push origin my-new-feature
Submit a pull request :D

License

This work is licensed under a Creative Commons Attribution 4.0 International License.
