From a Single Repo, to Multi-Repos, to Monorepo, to Multi-Monorepo

I’ve been working on the same project for several years. Its initial version was a huge monolithic app containing thousands of files. It was poorly architected and non-reusable, but was hosted in a single repo, making it easy to work with. Later, I “fixed” the mess in the project by splitting the codebase into autonomous packages, hosting each of them on its own repo, and managing them with Composer. The codebase became properly architected and reusable, but being split across multiple repos made it a lot more difficult to work with.

As the code was reformatted time and again, its hosting in the repo also had to adapt, going from the initial single repo, to multiple repos, to a monorepo, to what may be called a “multi-monorepo.”

Let me take you on the journey of how this took place, explaining why and when I felt I had to switch to a new approach. The journey consists of four stages (so far!) so let’s break it down like that.

Stage 1: Single repo

The project is leoloso/PoP and it’s been through several hosting schemes, following how its code was re-architected at different times.

It was born as a WordPress site, comprising a theme and several plugins. All of the code was hosted together in the same repo.

Some time later, I needed another site with similar features so I went the quick and easy way: I duplicated the theme and added its own custom plugins, all in the same repo. I got the new site running in no time.

I did the same for another site, and then another one, and another one. Eventually the repo was hosting some 10 sites, comprising thousands of files.

A single repository hosting all our code.

Issues with the single repo

While this setup made it easy to spin up new sites, it didn’t scale well at all. The big thing is that a single change involved searching for the same string across all 10 sites. That was completely unmanageable. Let’s just say that copy/paste/search/replace became a routine thing for me.

So it was time to start coding PHP the right way.

Stage 2: Multirepo

Fast forward a couple of years. I completely split the application into PHP packages, managed via Composer and dependency injection.

Composer uses Packagist as its main PHP package repository. In order to publish a package, Packagist requires a composer.json file placed at the root of the package’s repo. That means we are unable to have multiple PHP packages, each of them with its own composer.json, hosted on the same repo.

As a consequence, I had to switch from hosting all of the code in the single leoloso/PoP repo, to using multiple repos, with one repo per PHP package. To help manage them, I created the organization “PoP” in GitHub and hosted all repos there, including getpop/root, getpop/component-model, getpop/engine, and many others.

In the multirepo, each package is hosted on its own repo.

Issues with the multirepo

Handling a multirepo can be easy when you have a handful of PHP packages. But in my case, the codebase comprised over 200 PHP packages. Managing them was no fun.

The reason the project was split into so many packages is that I also decoupled the code from WordPress (so that these could also be used with other CMSs), for which every package must be very granular, dealing with a single goal.

Now, 200 packages is not ordinary. But even if a project comprises only 10 packages, it can be difficult to manage across 10 repositories. That’s because every package must be versioned, and every version of a package depends on some version of another package. When creating pull requests, we need to configure the composer.json file on every package to use the corresponding development branch of its dependencies. It’s cumbersome and bureaucratic.

I ended up not using feature branches at all, at least in my case, and simply pointed every package to the dev-master version of its dependencies (i.e. I was not versioning packages). I wouldn’t be surprised to learn that this is a common practice more often than not.

There are tools to help manage multiple repos, like meta. It creates a project composed of multiple repos and doing git commit -m "some message" on the project executes a git commit -m "some message" command on every repo, allowing them to be in sync with each other.

However, meta will not help manage the versioning of each dependency on their composer.json file. Even though it helps alleviate the pain, it is not a definitive solution.

So, it was time to bring all packages to the same repo.

Stage 3: Monorepo

The monorepo is a single repo that hosts the code for multiple projects. Since it hosts different packages together, we can version control them together too. This way, all packages can be published with the same version, and linked across dependencies. This makes pull requests very simple.

The monorepo hosts multiple packages.

As I mentioned earlier, we are not able to publish PHP packages to Packagist if they are hosted on the same repo. But we can overcome this constraint by decoupling development and distribution of the code: we use the monorepo to host and edit the source code, and multiple repos (at one repo per package) to publish them to Packagist for distribution and consumption.

The monorepo hosts the source code, multiple repos distribute it.

Switching to the Monorepo

Switching to the monorepo approach involved the following steps:

First, I created the folder structure in leoloso/PoP to host the multiple projects. I decided to use a two-level hierarchy, first under layers/ to indicate the broader project, and then under packages/, plugins/, clients/ and whatnot to indicate the category.

The monorepo layers indicate the broader project.

Then, I copied all source code from all repos (getpop/engine, getpop/component-model, etc.) to the corresponding location for that package in the monorepo (i.e. layers/Engine/packages/engine, layers/Engine/packages/component-model, etc.).

I didn’t need to keep the Git history of the packages, so I just copied the files with Finder. Otherwise, we can use hraban/tomono or shopsys/monorepo-tools to port repos into the monorepo, while preserving their Git history and commit hashes.

Next, I updated the description of all downstream repos to start with [READ ONLY].

The downstream repo’s “READ ONLY” is located in the repo description.

I executed this task in bulk via GitHub’s GraphQL API. I first obtained all of the descriptions from all of the repos, with this query:

{
  repositoryOwner(login: "getpop") {
    repositories(first: 100) {
      nodes {
        id
        name
        description
      }
    }
  }
}

…which returned a list like this:

{
  "data": {
    "repositoryOwner": {
      "repositories": {
        "nodes": [
          {
            "id": "MDEwOlJlcG9zaXRvcnkxODQ2OTYyODc=",
            "name": "hooks",
            "description": "Contracts to implement hooks (filters and actions) for PoP"
          },
          {
            "id": "MDEwOlJlcG9zaXRvcnkxODU1NTQ4MDE=",
            "name": "root",
            "description": "Declaration of dependencies shared by all PoP components"
          },
          {
            "id": "MDEwOlJlcG9zaXRvcnkxODYyMjczNTk=",
            "name": "engine",
            "description": "Engine for PoP"
          }
        ]
      }
    }
  }
}

From there, I copied all descriptions, added [READ ONLY] to them, and for every repo generated a new query executing the updateRepository GraphQL mutation:

mutation {
  updateRepository(
    input: {
      repositoryId: "MDEwOlJlcG9zaXRvcnkxODYyMjczNTk="
      description: "[READ ONLY] Engine for PoP"
    }
  ) {
    repository {
      description
    }
  }
}

Finally, I introduced tooling to help “split the monorepo.” Using a monorepo relies on synchronizing the code between the upstream monorepo and the downstream repos, triggered whenever a pull request is merged. This action is called “splitting the monorepo.” Splitting the monorepo can be achieved with a git subtree split command but, because I’m lazy, I’d rather use a tool.
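For reference, the manual route could look roughly like the sketch below. It assumes git subtree is available (it ships in Git's contrib folder and is bundled by most distributions). The helper name split_package, the branch name split-engine and the push target are placeholders of mine, not part of the actual setup.

```shell
#!/bin/sh
# split_package PREFIX BRANCH
# Hypothetical helper: extracts the commits touching PREFIX into BRANCH,
# rewritten so that PREFIX becomes the root of the tree.
# Must be run from the root of the monorepo.
split_package() {
  prefix="$1"
  branch="$2"
  git subtree split --prefix="$prefix" --branch="$branch"
}

# Example (repo URL and target branch are assumptions, adjust as needed):
#   split_package layers/Engine/packages/engine split-engine
#   git push git@github.com:getpop/engine.git split-engine:master
```

Doing this by hand for every package on every merge is exactly the repetitive work that a splitting tool automates.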

I chose Monorepo builder, which is written in PHP. I like this tool because I can customize it with my own functionality. Other popular tools are the Git Subtree Splitter (written in Go) and Git Subsplit (bash script).

What I like about the Monorepo

I feel at home with the monorepo. The speed of development has improved because dealing with 200 packages feels pretty much like dealing with just one. The boost is most evident when refactoring the codebase, i.e. when executing updates across many packages.

The monorepo also allows me to release multiple WordPress plugins at once. All I need to do is provide a configuration to GitHub Actions via PHP code (when using the Monorepo builder) instead of hard-coding it in YAML.

To generate a WordPress plugin for distribution, I had created a generate_plugins.yml workflow that triggers when creating a release. With the monorepo, I have adapted it to generate not just one, but multiple plugins, configured via PHP through a custom command, plugin-config-entries-json, and invoked like this in GitHub Actions:

- id: output_data
  run: |
    echo "::set-output name=plugin_config_entries::$(vendor/bin/monorepo-builder plugin-config-entries-json)"

This way, I can generate my GraphQL API plugin and other plugins hosted in the monorepo. The configuration defined via PHP is this one:

class PluginDataSource
{
  public function getPluginConfigEntries(): array
  {
    return [
      // GraphQL API for WordPress
      [
        'path' => 'layers/GraphQLAPIForWP/plugins/graphql-api-for-wp',
        'zip_file' => 'graphql-api.zip',
        'main_file' => 'graphql-api.php',
        'dist_repo_organization' => 'GraphQLAPI',
        'dist_repo_name' => 'graphql-api-for-wp-dist',
      ],
      // GraphQL API - Extension Demo
      [
        'path' => 'layers/GraphQLAPIForWP/plugins/extension-demo',
        'zip_file' => 'graphql-api-extension-demo.zip',
        'main_file' => 'graphql-api-extension-demo.php',
        'dist_repo_organization' => 'GraphQLAPI',
        'dist_repo_name' => 'extension-demo-dist',
      ],
    ];
  }
}

When creating a release, the plugins are generated via GitHub Actions.

This figure shows plugins generated when a release is created.

If, in the future, I add the code for yet another plugin to the repo, it will also be generated without any trouble. Investing some time and energy producing this setup now will definitely save plenty of time and energy in the future.

Issues with the Monorepo

I believe the monorepo is particularly useful when all packages are coded in the same programming language, tightly coupled, and relying on the same tooling. If instead we have multiple projects based on different programming languages (such as JavaScript and PHP), composed of unrelated parts (such as the main website code and a subdomain that handles newsletter subscriptions), or using different tooling (such as PHPUnit and Jest), then I don’t believe the monorepo provides much of an advantage.

That said, there are downsides to the monorepo:

  • We must use the same license for all of the code hosted in the monorepo; otherwise, we’re unable to add a LICENSE.md file at the root of the monorepo and have GitHub pick it up automatically. Indeed, leoloso/PoP initially provided several libraries using MIT and the plugin using GPLv2. So, I decided to simplify it using the lowest common denominator between them, which is GPLv2.
  • There is a lot of code, a lot of documentation, and plenty of issues, all from different projects. As such, potential contributors that were attracted to a specific project can easily get confused.
  • When tagging the code, all packages are versioned independently with that tag whether their particular code was updated or not. This is an issue with the Monorepo builder and not necessarily with the monorepo approach (Symfony has solved this problem for its monorepo).
  • The issues board needs proper management. In particular, it requires labels to assign issues to the corresponding project, or risk it becoming chaotic.

The issues board can become chaotic without labels that are associated with projects.

All these issues are not roadblocks though. I can cope with them. However, there is an issue that the monorepo cannot help me with: hosting both public and private code together.

I’m planning to create a “PRO” version of my plugin which I plan to host in a private repo. However, the code in a repo is either public or private, so I’m unable to host my private code in the public leoloso/PoP repo. At the same time, I want to keep using my setup for the private repo too, particularly the generate_plugins.yml workflow (which already scopes the plugin and downgrades its code from PHP 8.0 to 7.1) and the possibility of configuring it via PHP. And I want to keep it DRY, avoiding copy/pastes.

It was time to switch to the multi-monorepo.

Stage 4: Multi-monorepo

The multi-monorepo approach consists of different monorepos sharing their files with each other, linked via Git submodules. At its most basic, a multi-monorepo comprises two monorepos: an autonomous upstream monorepo, and a downstream monorepo that embeds the upstream repo as a Git submodule and is thus able to access its files:

The upstream monorepo is contained within the downstream monorepo.

This approach satisfies my requirements by:

  • having the public repo leoloso/PoP be the upstream monorepo, and
  • creating a private repo leoloso/GraphQLAPI-PRO that serves as the downstream monorepo.

A private monorepo can access the files from a public monorepo.

leoloso/GraphQLAPI-PRO embeds leoloso/PoP under subfolder submodules/PoP (notice how GitHub links to the specific commit of the embedded repo):

This figure shows how the public monorepo is embedded within the private monorepo in the GitHub project.

Now, leoloso/GraphQLAPI-PRO can access all the files from leoloso/PoP. For instance, script ci/downgrade/downgrade_code.sh from leoloso/PoP (which downgrades the code from PHP 8.0 to 7.1) can be accessed under submodules/PoP/ci/downgrade/downgrade_code.sh.
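Embedding the upstream repo can be sketched with plain Git commands. This is a minimal sketch rather than the exact commands I ran: the helper name embed_upstream is hypothetical, and the HTTPS URL in the example is an assumption about how the public repo is reached.

```shell
#!/bin/sh
# embed_upstream URL PATH
# Hypothetical helper: registers the upstream monorepo as a Git submodule
# of the current (downstream) repo, checked out under PATH.
embed_upstream() {
  url="$1"
  path="$2"
  git submodule add "$url" "$path"
}

# Example, run from the root of the downstream monorepo:
#   embed_upstream https://github.com/leoloso/PoP submodules/PoP
#   git commit -m "Embed upstream monorepo"
#
# A fresh clone of the downstream repo then needs its submodules too:
#   git clone --recurse-submodules <downstream-repo-url>
```

The submodule pins the upstream repo to one specific commit, which is why GitHub shows a link to that commit instead of a regular folder.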

In addition, the downstream repo can load the PHP code from the upstream repo and even extend it. This way, the configuration to generate the public WordPress plugins can be overridden to produce the PRO plugin versions instead:

class PluginDataSource extends UpstreamPluginDataSource
{
  public function getPluginConfigEntries(): array
  {
    return [
      // GraphQL API PRO
      [
        'path' => 'layers/GraphQLAPIForWP/plugins/graphql-api-pro',
        'zip_file' => 'graphql-api-pro.zip',
        'main_file' => 'graphql-api-pro.php',
        'dist_repo_organization' => 'GraphQLAPI-PRO',
        'dist_repo_name' => 'graphql-api-pro-dist',
      ],
      // GraphQL API Extensions
      // Google Translate
      [
        'path' => 'layers/GraphQLAPIForWP/plugins/google-translate',
        'zip_file' => 'graphql-api-google-translate.zip',
        'main_file' => 'graphql-api-google-translate.php',
        'dist_repo_organization' => 'GraphQLAPI-PRO',
        'dist_repo_name' => 'graphql-api-google-translate-dist',
      ],
      // Events Manager
      [
        'path' => 'layers/GraphQLAPIForWP/plugins/events-manager',
        'zip_file' => 'graphql-api-events-manager.zip',
        'main_file' => 'graphql-api-events-manager.php',
        'dist_repo_organization' => 'GraphQLAPI-PRO',
        'dist_repo_name' => 'graphql-api-events-manager-dist',
      ],
    ];
  }
}

GitHub Actions will only load workflows from under .github/workflows, and the upstream workflows are under submodules/PoP/.github/workflows; hence we need to copy them. This is not ideal, though we can avoid editing the copied workflows and treat the upstream files as the single source of truth.

To copy the workflows over, a simple Composer script can do:

{
  "scripts": {
    "copy-workflows": [
      "php -r \"copy('submodules/PoP/.github/workflows/generate_plugins.yml', '.github/workflows/generate_plugins.yml');\"",
      "php -r \"copy('submodules/PoP/.github/workflows/split_monorepo.yaml', '.github/workflows/split_monorepo.yaml');\""
    ]
  }
}

Then, each time I edit the workflows in the upstream monorepo, I also copy them to the downstream monorepo by executing the following command:

composer copy-workflows

Once this setup is in place, the private repo generates its own plugins by reusing the workflow from the public repo:

This figure shows the PRO plugins generated in GitHub Actions.

I am extremely satisfied with this approach. I feel it has removed all of the burden from my shoulders concerning the way projects are managed. I read about a WordPress plugin author complaining that managing the releases of his 10+ plugins was taking a considerable amount of time. That doesn’t happen here: after I merge my pull request, both public and private plugins are generated automatically, like magic.

Issues with the multi-monorepo

First off, it leaks. Ideally, leoloso/PoP should be completely autonomous and unaware that it is used as an upstream monorepo in a grander scheme, but that’s not the case.

When doing git checkout, the downstream monorepo must pass the --recurse-submodules option so as to also check out the submodules. In the GitHub Actions workflows for the private repo, the checkout must be done like this:

- uses: actions/[email protected]
  with:
    submodules: recursive

As a result, we have to input submodules: recursive to the downstream workflow, but not to the upstream one, even though they both use the same source file.

To solve this while maintaining the public monorepo as the single source of truth, the workflows in leoloso/PoP are injected with the value for submodules via an environment variable CHECKOUT_SUBMODULES, like this:

env:
  CHECKOUT_SUBMODULES: ""

jobs:
  provide_data:
    steps:
      - uses: actions/[email protected]
        with:
          submodules: ${{ env.CHECKOUT_SUBMODULES }}

The environment value is empty for the upstream monorepo, so doing submodules: "" works well. And then, when copying over the workflows from upstream to downstream, I replace the value of the environment variable with "recursive" so that it becomes:

env:
  CHECKOUT_SUBMODULES: "recursive"

(I have a PHP command to do the replacement, but we could also pipe sed in the copy-workflows composer script.)
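The sed alternative could be sketched as follows. This is not the PHP command I actually use: the helper name copy_workflow is hypothetical, and it assumes the variable is declared exactly as CHECKOUT_SUBMODULES: "" in the upstream file.

```shell
#!/bin/sh
# copy_workflow SRC DST
# Hypothetical helper: copies an upstream workflow to the downstream repo,
# switching CHECKOUT_SUBMODULES from "" to "recursive" on the way.
copy_workflow() {
  src="$1"
  dst="$2"
  mkdir -p "$(dirname "$dst")"
  sed 's/CHECKOUT_SUBMODULES: ""/CHECKOUT_SUBMODULES: "recursive"/' "$src" > "$dst"
}

# Example, run from the root of the downstream monorepo:
#   copy_workflow submodules/PoP/.github/workflows/generate_plugins.yml \
#     .github/workflows/generate_plugins.yml
```

Since the substitution matches the literal line, any change to how the variable is written upstream silently turns the copy into a plain copy, which is one more way for the leak to bite.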

This leakage reveals another issue with this setup: I must review all contributions to the public repo before they are merged, or they could break something downstream. The contributors would also be completely unaware of those leakages (and they couldn’t be blamed for it). This situation is specific to the public/private-monorepo setup, where I am the only person who is aware of the full setup. While I share access to the public repo, I am the only one accessing the private one.

As an example of how things could go wrong, a contributor to leoloso/PoP might remove CHECKOUT_SUBMODULES: "" since it is superfluous. What the contributor doesn’t know is that, while that line is not needed upstream, removing it will break the private repo.

I guess I need to add a warning!

env:
  ### ☠️ Do not delete this line! Or bad things will happen! ☠️
  CHECKOUT_SUBMODULES: ""

Wrapping up

My repo has gone through quite a journey, being adapted to the new requirements of my code and application at different stages:

  • It started as a single repo, hosting a monolithic app.
  • It became a multirepo when splitting the app into packages.
  • It was switched to a monorepo to better manage all the packages.
  • It was upgraded to a multi-monorepo to share files with a private monorepo.

Context means everything, so there’s no “best” approach here, only solutions that are more or less suitable to different scenarios.

Has my repo reached the end of its journey? Who knows? The multi-monorepo satisfies my current requirements, but it hosts all private plugins together. If I ever need to grant contractors access to a specific private plugin, while preventing them from accessing other code, then the monorepo may no longer be the ideal solution for me, and I’ll need to iterate again.

I hope you have enjoyed the journey. And, if you have any ideas or examples from your own experiences, I’d love to hear about them in the comments.
