HEITS.digital - How to create a welcoming codebase

Optimizing large mobile projects for smooth onboarding

Cliché opening statement: structuring a large project efficiently is challenging. Even with the best intentions and years of experience it's possible to end up with an over-engineered tangle of spaghetti code that becomes difficult to maintain and impossible to scale. Within a short span of time, a successful startup can find itself struggling to work with the sprawling remnants of an old MVP that was never intended to reach the levels of complexity it did.

Being involved from the beginning can be a significant blind spot: having a monopoly on knowledge of the system's workarounds and idiosyncrasies (“job security”) can create the illusion of simplicity, and failing to question our choices often leads to an echo chamber where unpredictable behaviors escalate. What better way to keep things simple than to examine our work from the perspective of a newcomer?

Let me present a few best practices that can go a long way in terms of stability, scalability and developer satisfaction. Even if you don’t plan on expanding your team, these ideas can significantly improve the quality of your codebase, and streamline the flow of information by highlighting the important parts and hiding the distractions.

So, how can we simplify onboarding new team members and make sure they can contribute without breaking things? What would you expect to see after being granted access to the codebase of a huge project?

A proper readme file

This might feel a bit anticlimactic but let’s get it out of the way: the readme file of the repository is the entry point of the project. It’s also a living document: it shouldn’t be set in stone, so feel free to revisit and rewrite it as the product evolves, and even delete the outdated parts.

A properly modularized project could contain separate readme files for each module, describing the relevant pieces of functionality individually, in more detail.

Here are a couple of things one might expect to see in a readme (many of these can be links of course):

- Useful external resources:
  - Product documentation - How do the different parts of the app work? Navigation graphs, user journeys, UI design mocks, backend API documentation, maybe a wiki;
  - Team structure - Who are the owners of the various aspects of the development process? Who should I talk to if I’m lost at a specific part?
  - Project management tools - Jira boards (with sensible workflows), calendars, etc.;
- List of accesses needed to work on the project (VPN, third-party tools);
- Work environment setup - What should I download, how should I configure my IDE? Are there any environment variables I should set for creating local builds? Information about potential Git submodules is also useful;
- Project structure description - What is the high-level logic for the architecture? How is the codebase organized? Visual aids are highly appreciated;
- List of important third-party technologies - Libraries that are crucial to the core functionality of the app should be mentioned here, such as DI, navigation, database-management or networking solutions;
- Contribution guidelines:
  - Coding standards and naming conventions - should also mention any static code analyzers new members should be aware of;
  - PR template (not necessarily part of the readme but there should be one);
  - Git usage conventions (commit naming format, branching);
  - Any important processes contributors should be aware of (localization for example);
- Information about build distribution, CI/CD;
- Badges - why not? Some nice little widgets that get refreshed automatically and display information at a glance about build status, code coverage, lines of code, etc.

Please note that not all of these ideas will apply to open source projects, as in that case the readme might serve slightly different purposes.

Simple creation of local builds

After cloning the main branch of the repository, compiling a debug build locally with the latest stable version of the IDE should be as simple as possible (any complications should have been mentioned in the readme). If the main branch does not compile (either because of temporary issues or unintuitive setup requirements), the project immediately feels more intimidating than it should. Too many warnings during the build process are also a red flag: they give a first impression of a project abandoned by its contributors.

Subjective personal preference: I would also like to be able to build the production variant (of course with “fake” signing credentials) just to test potential minification and obfuscation issues.

Unreasonably long build times are often detrimental to productivity, and problems affecting build performance should be tackled consistently during development (good modularization can help with parallel execution, unused dependencies should be removed, third-party plugins and libraries should be up to date, etc.).

A decent .gitignore setup

If a new developer's first commit includes automatically generated files, the .gitignore file is probably incomplete. Or maybe they didn’t read the readme that we so nicely prepared in the first section.

In general, the version control system should not track the following:

- Generated build files that take up unnecessary space (build output directories, build cache, compiled code);
- Files containing sensitive information (signing keys, tokens to different services);
- Personal IDE configuration files, logs, reports, etc.;
- Hidden system files.
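
To make the categories above more tangible, here is a minimal sketch of a .gitignore for a Gradle-based Android project. The keystore and properties file names in the "sensitive" group are hypothetical placeholders, and the exact entries depend on your tooling, so treat it as a starting point rather than a complete file.

```gitignore
# Generated build output and caches
build/
.gradle/

# Sensitive files (file names are hypothetical placeholders)
*.jks
*.keystore
keystore.properties

# Machine-specific and personal IDE configuration, logs
local.properties
.idea/workspace.xml
*.iml
*.log

# Hidden system files
.DS_Store
Thumbs.db
```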

It might be a good idea to include the code style guide configuration in the repository, but keep in mind that doing so ties the project to a specific IDE. In these cases, the .gitignore file should be as specific as possible (keeping the standards agreed upon by the team under version control, but leaving out the developer’s personal preferences that don’t affect the code directly).

Sensible modularization

Let’s get into the more interesting topics. Modularization is a big one, and advocating for one of the many approaches to clean architecture can be a controversial stance. Nevertheless, I’m sure that we can agree on a few high-level guidelines:

- Complex projects should not have a single monolithic module; instead, they should be broken down into smaller, self-contained compilation units, each having different and well-defined responsibilities;
- The data layer should be clearly distinguished from domain and presentation. The way that unidirectional data flow is implemented should be visible just by looking at the project structure;
- Modules can be organized too! Moving them into a hierarchical folder structure significantly improves readability and navigation;
- A module should have a clearly defined public API, and its implementation details should be encapsulated. Changing the implementation should not affect the API (and as a result should not invalidate other modules, thus improving build performance). Going a step further, we can also have modules that only contain contracts, and modules that only contain implementation;
- Modules should have as few dependencies as possible. We should aim to decouple them from the platform as well, if possible (for example, a repository module should have no idea if it’s running on Android or not).
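
The idea of a public contract with an encapsulated implementation can be sketched in plain Java. All names here (`UserRepository`, `UserDataModule`, `InMemoryUserRepository`) are hypothetical, and a single file stands in for what would normally be separate modules, so this is an illustration of the principle rather than a recommended project layout.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Contract: the only type other modules are meant to depend on.
interface UserRepository {
    Optional<String> findName(int id);

    void save(int id, String name);
}

// Entry point of the hypothetical "user data" module.
final class UserDataModule {
    private UserDataModule() {
    }

    // Callers get the contract back and never see the concrete type.
    static UserRepository createRepository() {
        return new InMemoryUserRepository();
    }

    // Implementation detail: private, so it can be swapped without touching callers.
    private static final class InMemoryUserRepository implements UserRepository {
        private final Map<Integer, String> namesById = new HashMap<>();

        @Override
        public Optional<String> findName(int id) {
            return Optional.ofNullable(namesById.get(id));
        }

        @Override
        public void save(int id, String name) {
            namesById.put(id, name);
        }
    }
}
```

Note that nothing in the contract mentions Android or any storage technology, so replacing the in-memory map with a database-backed implementation would not require changes in the calling code.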

When it comes to feature modules, concise but meaningful naming is crucial. Ideally, if a new developer is getting familiar with the application by testing its UI, they should be able to connect the different flows to the various modules just by looking at their names.

Lastly, the entry point of the app should be obvious: the main module handling the navigation and the dependency injection graph is a special one and should be highlighted as such.

Clear and comprehensive abstractions

One of the more overwhelming experiences while trying to wrap your head around new concepts is separating the noise from the important pieces of information. We should always try to prevent these situations by creating small API surfaces. A good public contract offers an overview of the class's capabilities, and also provides a solid basis for automated testing.

The public functions and their signatures should use clear and understandable naming. I do believe in self-documenting code, but comments are always appreciated.

Dependency injection should always expose abstractions, not implementations.
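
As a small illustration, here is what exposing an abstraction can look like with plain constructor injection, without any DI framework. The `Clock`, `SystemClock` and `SessionTracker` names are made up for this sketch; the point is only that the consumer's constructor asks for the interface, never for the concrete class.

```java
// Abstraction that consumers depend on.
interface Clock {
    long nowMillis();
}

// Production implementation; only the DI setup needs to know about it.
final class SystemClock implements Clock {
    @Override
    public long nowMillis() {
        return System.currentTimeMillis();
    }
}

// The consumer receives the abstraction through its constructor.
final class SessionTracker {
    private final Clock clock;
    private final long startedAt;

    SessionTracker(Clock clock) {
        this.clock = clock;
        this.startedAt = clock.nowMillis();
    }

    long elapsedMillis() {
        return clock.nowMillis() - startedAt;
    }
}
```

In production code, the tracker would be created with `new SessionTracker(new SystemClock())`, while a test can pass in a fake clock that returns fixed values.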

And to make sure that we created the right abstractions, we can easily verify them with…

Tests that actually help

A low-hanging fruit for sure, but the topic of meaningful testing is connected to the clearly defined API contracts described in the previous section. Without good abstractions it’s too easy to write tests for implementation details that cause more harm than good in the long run: bad tests hinder development by blocking refactoring, while good ones encourage it.

A test failing should be a good thing: the developer who made the breaking change should stop to think about what they just introduced, and after careful consideration modify their work or the tests. Either way, they shouldn’t feel frustrated about the test failing because of technicalities, or more often, unintentionally testing the wrong parts of the given component.

Tests can also serve as the technical documentation of a component’s expected range of functionality, thus simplifying onboarding new developers.
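
Here is a minimal sketch of a test that only exercises the public contract. `RecentSearches` is a hypothetical component, and the check is written in plain Java to stay self-contained; a real project would use its own test framework. The test describes the expected behavior (order, de-duplication, capacity) and never looks at the internal collection, so the implementation can be refactored freely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical component: remembers the latest unique search queries.
final class RecentSearches {
    private final int capacity;
    private final Deque<String> queries = new ArrayDeque<>();

    RecentSearches(int capacity) {
        this.capacity = capacity;
    }

    void add(String query) {
        queries.remove(query);      // re-adding a query moves it to the front
        queries.addFirst(query);
        if (queries.size() > capacity) {
            queries.removeLast();   // drop the oldest entry
        }
    }

    List<String> latest() {
        return new ArrayList<>(queries);
    }
}

final class RecentSearchesTest {
    // Asserts observable behavior only, not how the queries are stored.
    static void keepsMostRecentUniqueQueriesFirst() {
        RecentSearches searches = new RecentSearches(2);
        searches.add("kotlin");
        searches.add("gradle");
        searches.add("kotlin");     // moves "kotlin" back to the front
        searches.add("compose");    // pushes "gradle" out

        List<String> expected = List.of("compose", "kotlin");
        if (!searches.latest().equals(expected)) {
            throw new AssertionError("Expected " + expected + " but got " + searches.latest());
        }
    }
}
```

If the `ArrayDeque` were later replaced by another data structure, this test would keep passing as long as the behavior stays the same, which is exactly what makes it helpful rather than an obstacle.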

Useful CI/CD configuration

The opportunities here are practically endless: all sorts of different tools can be used to automate the day-to-day tasks. On the other hand, mindlessly integrating services or choosing a configuration that’s inappropriate for the project can become a serious hindrance.

Here are just a few general ideas that one could expect from a well-configured mobile CI/CD pipeline:

- Performing static code analysis to make sure that the contributions follow the team’s standards. A good automation can leave comments on the PR if problems are encountered;
- Compiling all build variants to rule out the most obvious issues;
- Running all unit, integration and UI tests, and performing screenshot testing;
- Keeping the third-party dependencies up to date by creating automated PRs;
- Creating and distributing nightly, beta, and even production builds to different channels (including the stores);
- Posting automated messages to various communication channels about potential issues or successfully completed milestones.

Nicely organized build scripts

Developers often delve into an unfamiliar codebase by examining its build scripts, which can unveil valuable high-level insights. Among these, the third-party dependencies of the various modules can provide useful information, as can the dependency graph of the project.

Keeping the build scripts organized is a relatively easy task, due to their small number and isolation from the rest of the codebase. Some aspects of organization to keep in mind:

- Referencing version numbers across different modules instead of hardcoding them. Dependency versions should be centralized (version catalogs are a great way to achieve this in Gradle-based projects);
- Reducing duplicated code. Library modules often have similar build scripts, and repeating the same setup boilerplate decreases maintainability. In the case of Groovy, scripts can be merged together to avoid repeating the same ceremonies. With the Kotlin DSL, writing custom Gradle plugins feels like the most modern solution.
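
For reference, a centralized version catalog in a Gradle-based project could look roughly like the fragment below, placed at the conventional `gradle/libs.versions.toml` location. The two libraries and their versions are only examples picked for illustration, not a recommendation.

```toml
[versions]
# Example versions; each one is declared exactly once for the whole project
retrofit = "2.9.0"
okhttp = "4.12.0"

[libraries]
retrofit = { module = "com.squareup.retrofit2:retrofit", version.ref = "retrofit" }
okhttp = { module = "com.squareup.okhttp3:okhttp", version.ref = "okhttp" }
```

A module's build script can then refer to `libs.retrofit` instead of repeating the coordinates and the version number.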

Feature flags

While feature flags have a number of benefits for the product in general, let’s focus on how they can simplify the lives of new team members. Having a nicely kept list of these toggles (maybe in a debug menu) is useful for newcomers, as they can see what the team is currently working on right on the UI - a bit simpler than going through the Git history.

Furthermore, there’s also a psychological element to them. With all the impostor syndrome of the first few weeks and the overwhelming aspects of ramping up, actually starting to contribute can be a stressful experience. Having the changes behind a feature flag gives an extra sense of security.
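
A nicely kept list of toggles does not need much machinery. The following is a minimal sketch, assuming a purely local, in-memory setup: the flag names, descriptions and the `FeatureFlags` class are all hypothetical, and a real app would likely back this with remote configuration or persistent storage.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical flags; each one documents the feature it guards.
enum FeatureFlag {
    NEW_CHECKOUT("Redesigned checkout flow", false),
    OFFLINE_MODE("Cached content when offline", true);

    final String description;
    final boolean enabledByDefault;

    FeatureFlag(String description, boolean enabledByDefault) {
        this.description = description;
        this.enabledByDefault = enabledByDefault;
    }
}

final class FeatureFlags {
    // Local overrides, for example toggled from a debug menu.
    private final Map<FeatureFlag, Boolean> overrides = new EnumMap<>(FeatureFlag.class);

    boolean isEnabled(FeatureFlag flag) {
        return overrides.getOrDefault(flag, flag.enabledByDefault);
    }

    void setOverride(FeatureFlag flag, boolean enabled) {
        overrides.put(flag, enabled);
    }

    // One line per flag, suitable for listing in a debug menu.
    String describeAll() {
        StringBuilder out = new StringBuilder();
        for (FeatureFlag flag : FeatureFlag.values()) {
            out.append(flag.name())
               .append(isEnabled(flag) ? " [on] " : " [off] ")
               .append(flag.description)
               .append('\n');
        }
        return out.toString();
    }
}
```

Because every flag carries a description, the enum itself doubles as a short overview of the work in progress, and a newcomer's first change can ship disabled by default.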

Pet peeves

Finally, here are some more subjective points that, in my opinion, show that developers care about their codebase. While it’s difficult to quantify how some of these aspects contribute to quality, they might improve the overall developer satisfaction and definitely give the impression of a product that’s being crafted in a mindful way:

- A clean Git history. Can we please delete the branches that have already been merged? Also, a sensible use of rebase instead of merge commits (for keeping feature branches up to date) is a huge plus, as is the practice of tagging releases. Squashing commits that belong together is another way to get rid of noise;
- Features that simplify testing, such as different app icons for the various build types and a debug menu containing useful tools that are specific to the app;
- Following platform guidelines and embracing best practices. While this is a product decision in many cases, I’d much rather work on a project that seamlessly integrates into the platform than one that attempts to reinvent the wheel. By doing so, we gain access to a wealth of resources to address potential issues and benefit from a more familiar set of expectations during testing;
- Consistently useful comments. I don’t believe that every function needs a Javadoc, but if something counterintuitive is going on, a few pointers are always helpful;
- The way duplicated code is handled (preferring composition over inheritance);
- The way adopting new technologies is handled (deprecating old classes, making sensible decisions to gradually improve while keeping interoperability between the different systems).
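
To illustrate the point about duplicated code, here is a small sketch of sharing behavior through composition. The `EventTracker` and the two screen classes are hypothetical; the alternative being avoided is a common base class that every screen would have to extend just to get tracking.

```java
import java.util.ArrayList;
import java.util.List;

// Shared behavior lives in a small collaborator instead of a base class.
final class EventTracker {
    private final List<String> events = new ArrayList<>();

    void track(String event) {
        events.add(event);
    }

    List<String> events() {
        return List.copyOf(events);
    }
}

// Each screen holds a tracker rather than inheriting from a shared parent.
final class LoginScreen {
    private final EventTracker tracker;

    LoginScreen(EventTracker tracker) {
        this.tracker = tracker;
    }

    void onShown() {
        tracker.track("login_shown");
    }
}

final class SettingsScreen {
    private final EventTracker tracker;

    SettingsScreen(EventTracker tracker) {
        this.tracker = tracker;
    }

    void onShown() {
        tracker.track("settings_shown");
    }
}
```

The screens stay free to extend whatever the platform requires, and the tracking logic can be tested and replaced on its own.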

Conclusions

Creating a sleek and functional app should not be a challenge with the modern array of tools available to developers. However, coming up with a stable and reliable system under the hood that can be extended later on without causing too much headache is a problem that most teams realize too late.

Tech debt is an inherent aspect of creating software, particularly in the swiftly evolving realm of mobile development. Every large project will have legacy code, as well as a few hacky solutions that somebody implemented some time ago and no one remembers why - yet, no one dares to touch them. Nonetheless, we do have the ability to exert control over the spread of such suboptimal components within the core sections of the codebase.

A resilient product that stands the test of time breaks down complex problems into well-defined, manageable pieces. It should have adequate documentation, well thought-out safeguards (both in the form of standards and automations) and a sane project structure. Taking a step back and scrutinizing our processes is of paramount importance in avoiding the minor shortcuts that can increase complexity exponentially down the road.

Observing how a new team member navigates and adapts to such circumstances serves as an excellent litmus test for the system. A welcoming codebase effectively conceals its inherent complexity, fostering an environment that encourages developers to collaborate with it rather than struggle against it.

Written by

Péter Pandula

Android Developer
