Crazy Python Action

Your code is too DRY!

Published by Steven James

Last Updated on 2024-10-22

If you have ever let anyone else review your code, you have probably heard some variation of "This is repetitive" or "This code is not very DRY." We quickly learn that when a function or variable can be extracted, we should do it! If we have copy-pasted the same code 20 times, chances are someone will modify one copy and forget to modify the other 19, leaving inconsistent copies that are harder to track down in the future.

Even our IDEs highlight repetition now!

The purpose of decreasing repetition is usually clear and valid -- it makes future refactoring easier, and bugs easier to spot. But can we err in the opposite direction? Can code be too DRY?

Boilerplate

We spend a lot of time as developers trying to reduce boilerplate code. No one likes typing the same thing over and over. But one thing we do like is structure and patterns. Being able to look at code you've never seen and immediately see what it does is important and valuable.

Consider the following two functions, both of which represent API functions that return bits of information given a zip code.

def get_weather(zip_code: str):
    validate_zip_code(zip_code)

    with db.connect() as connection:
        weather = connection.execute(get_weather_sql).fetchall()

    return weather


def get_population(zip_code: str):
    validate_zip_code(zip_code)

    with census.connect() as connection:
        population = connection.execute(get_population_sql).fetchall()

    return population

The two functions look nearly identical, and you could imagine twenty more just like them, getting various bits of information. That could lead us to the conclusion that we should have a single function called get_property(zip_code, property_name). Perhaps in many cases that is a good idea, but a closer look shows that the second function uses a different data source than the first (census instead of db). What we are seeing here are two functions using boilerplate code that leaves plenty of room for flexibility in each API function.

By boilerplate I simply mean that the functions have a similar layout and structure. They follow clear steps: first they validate the input (perhaps raising an exception if the input is invalid), then they fetch data. Following this type of structure throughout a project makes it easier for reviewers and future developers (including yourself in the future!) to quickly understand and enhance the project. It leaves room for adding additional steps in a process, such as adding a modification step after fetching the data. It still allows for code reuse (both functions validate a zip code using the same function). A developer adding new functions to the same project should strive to use the same format for their code.
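To make the trade-off concrete, here is a minimal sketch that contrasts a merged get_property() with the boilerplate style. Everything in it is a hypothetical stand-in chosen so the example runs on its own: the in-memory FAKE_DB and FAKE_CENSUS dictionaries replace the real db and census connections, validate_zip_code is a simple five-digit check, and the Fahrenheit-to-Celsius conversion is an invented "modification step", not something taken from the original project.

```python
# Hypothetical stand-ins so the sketch runs on its own. In a real project
# these would be the actual validate_zip_code, db, and census objects.
FAKE_DB = {"weather": {"12345": [("sunny", 72)]}}
FAKE_CENSUS = {"population": {"12345": [(40000,)]}}


def validate_zip_code(zip_code: str) -> None:
    """Raise ValueError unless zip_code is exactly five digits."""
    if not (len(zip_code) == 5 and zip_code.isdigit()):
        raise ValueError(f"invalid zip code: {zip_code!r}")


# Over-DRY version: every difference between properties becomes a branch.
def get_property(zip_code: str, property_name: str):
    validate_zip_code(zip_code)

    source = FAKE_CENSUS if property_name == "population" else FAKE_DB
    rows = source[property_name].get(zip_code, [])

    if property_name == "weather":
        # Special case: convert Fahrenheit to Celsius for weather only.
        rows = [(sky, round((temp_f - 32) * 5 / 9)) for sky, temp_f in rows]

    return rows


# Boilerplate version: the same validate-then-fetch shape, but each
# function owns its data source and any extra step it needs.
def get_weather(zip_code: str):
    validate_zip_code(zip_code)

    rows = FAKE_DB["weather"].get(zip_code, [])

    # Extra modification step that only this function needs.
    return [(sky, round((temp_f - 32) * 5 / 9)) for sky, temp_f in rows]


def get_population(zip_code: str):
    validate_zip_code(zip_code)

    return FAKE_CENSUS["population"].get(zip_code, [])
```

Both versions return the same data, but in the merged function each new property tends to add another branch, while the separate functions stay short and can change independently.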

Unit Tests (sometimes)

def test_create_document_explicit(client):
    response = client.post(
        '/document', json={
            'document_type': 'purchase_order',
            'status': 'incoming.received',
            'customer': 524,
        }
    )
    assert response.status_code == HTTP_200_OK, response.json()


def test_create_document(client):
    response = client.post(
        '/document', json=sample_document_factory()
    )
    assert response.status_code == HTTP_200_OK, response.json()

The two test functions above are functionally equivalent. The second function uses a factory function to build its sample data, which is often a good idea if you need to reuse the same sample data in many tests. However, there are also many advantages to having explicit cases like the first test function above. It makes the test extremely clear about what data is being sent to the client.post() function. Modifying the explicit test to send alternate data is also easy, with no possibility of affecting other tests.

You can get similar advantages while keeping your test functions free of inline data, though. One way to do this while maintaining small functions and code reusability is to make various named factory functions for your sample data: sample_document_invalid_type(), sample_document_valid(), etc. This does make each test case slightly more difficult to read at a glance, but combined with parametrize or similar, it makes your sample inputs easier to reuse:

import pytest


def sample_document_valid():
    # Payload the endpoint is expected to accept.
    return {
        'document_type': 'purchase_order',
        'status': 'incoming.received',
        'customer': 524,
    }


def sample_document_invalid_type():
    # Same payload, but with a document type the endpoint should reject.
    return {
        'document_type': 'INVALID_TYPE',
        'status': 'incoming.received',
        'customer': 524,
    }


# Each tuple pairs a sample-data factory with the status code it should produce.
@pytest.mark.parametrize('factory,expected_status', [
    (sample_document_valid, HTTP_200_OK),
    (sample_document_invalid_type, HTTP_422_UNPROCESSABLE_ENTITY),
])
def test_create_document(client, factory, expected_status):
    response = client.post('/document', json=factory())
    assert response.status_code == expected_status, response.json()
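
One reason a factory function keeps much of the isolation of an explicit test is that each call builds a fresh dictionary, so a test that mutates its sample data cannot leak that change into another test. The sketch below illustrates this under stated assumptions: mutate_sample_in_one_test is a hypothetical helper invented for the demonstration, and the payload simply mirrors the sample document used in this post.

```python
def sample_document_valid():
    # A new dict is created on every call, so callers never share state.
    return {
        'document_type': 'purchase_order',
        'status': 'incoming.received',
        'customer': 524,
    }


def mutate_sample_in_one_test():
    # Hypothetical stand-in for a test that edits its own copy of the data.
    document = sample_document_valid()
    document['customer'] = None
    return document


mutated = mutate_sample_in_one_test()
fresh = sample_document_valid()

# The mutation stayed local: a later call still sees the original value.
assert mutated['customer'] is None
assert fresh['customer'] == 524
```

A shared module-level dictionary would not give this guarantee, since every test would be editing the same object.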