diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
index 0946737b..6713f4c4 100644
--- a/CODE_OF_CONDUCT.md
+++ b/CODE_OF_CONDUCT.md
@@ -7,8 +7,6 @@ we pledge to follow the [University of Sheffield Research Software Engineering C
Instances of abusive, harassing, or otherwise unacceptable behavior
may be reported by following our [reporting guidelines][coc-reporting].
-Please contact the [course organiser](mailto:liam.pattinson@york.ac.uk)
-with any complaints.
[coc-reporting]: https://rse.shef.ac.uk/community/code_of_conduct#enforcement-guidelines
[coc]: https://rse.shef.ac.uk/community/code_of_conduct
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 728d7f0d..84c56141 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -46,9 +46,9 @@ use [GitHub flow][github-flow] to manage changes:
NB: The published copy of the lesson is usually in the `main` branch.
-[repo]: https://github.com/researchcodingclub/python-testing-for-research
-[repo-issues]: https://github.com/researchcodingclub/python-testing-for-research/issues
-[contact]: mailto:liam.pattinson@york.ac.uk
+[repo]: https://github.com/Romain-Thomas-Shef/FAIR_Management_plan
+[repo-issues]: https://github.com/Romain-Thomas-Shef/FAIR_Management_plan/issues
+[contact]: mailto:romain.thomas@sheffield.ac.uk
[github]: https://github.com
[github-flow]: https://guides.github.com/introduction/flow/
[github-join]: https://github.com/join
diff --git a/LICENSE.md b/LICENSE.md
index 14b7fb29..a60053e7 100644
--- a/LICENSE.md
+++ b/LICENSE.md
@@ -13,8 +13,7 @@ Attribution](https://creativecommons.org/licenses/by/4.0/) licence.
[Changes have been
made](https://github.com/RSE-Sheffield/fair4rs-lesson-setup) to adapt the
template to the specific context of the University of Sheffield's FAIR
-for Research Software training programme, and altered further by
-the University of York [Research Coding Club](https://researchcodingclub.github.io/).
+for Research Software training programme.
Unless otherwise noted, the instructional material in this lesson is
made available under the [Creative Commons Attribution
@@ -36,7 +35,7 @@ Under the following terms:
- **Attribution**---You must give appropriate credit (mentioning that
your work is derived from work that is Copyright (c) The University
- of York and, where practical, provide a [link to the
+ of Sheffield and, where practical, provide a [link to the
license][cc-by-human], and indicate if changes were made. You may do
so in any reasonable manner, but not in any way that suggests the
licensor endorses you or your use.
@@ -60,7 +59,7 @@ Except where otherwise noted, the example programs and other software
provided in this work are made available under the [OSI][osi]-approved
[MIT license][mit-license].
-Copyright (c) The University of York
+Copyright (c) The University of Sheffield
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
diff --git a/README.md b/README.md
index 6f9477d7..363fa64f 100644
--- a/README.md
+++ b/README.md
@@ -4,10 +4,6 @@ A short course on the basics of software testing in Python using the `pytest` li
This lesson uses [The Carpentries Workbench][workbench] template.
-It is derived from the [FAIR2 for Research Software](https://fair2-for-research-software.github.io/)
-training course [python-testing-for-research](https://github.com/FAIR2-for-research-software/python-testing-for-research)
-by the University of Sheffield.
-
## Course Description
Whether you are a seasoned developer or just write the occasional script, it's important to know that your code does what you intend, and will continue to do so as you make changes.
@@ -22,7 +18,7 @@ This course seeks to provide you with conceptual understanding and the tools you
- Running a test suite & understanding outputs
- Best practices
- Testing for errors
-- Testing floating point data
+- Testing data structures
- Fixtures
- Parametrisation
- Testing file outputs
@@ -34,27 +30,18 @@ Contributions are welcome, please refer to the [contribution guidelines](CONTRIB
### Build the lesson locally
-To render the lesson locally, you will need to have [R][r] installed.
-Instructions for using R with the Carpentries template is available on the
-[Carpentries website](https://carpentries.github.io/workbench/#installation).
-We recommend using the
-[`{renv}`](https://rstudio.github.io/renv/articles/renv.html) package.
-
-After cloning the repository, you can set up `renv` and install all packages with:
+To render the lesson locally, you will need to have [R][r] installed. Instructions for using R with the Carpentries template are [available](https://carpentries.github.io/workbench/#installation), but some additional steps have been taken to make sure the environment is reproducible using the [`{renv}`](https://rstudio.github.io/renv/articles/renv.html) package: a `renv.lock` lockfile is included, which allows the environment to be re-created along with its dependencies.
+After cloning the repository, you can set up `renv` and install all packages with:
``` r
-renv::init()
+renv::restore()
# Optionally update packages
renv::update()
```
Once you have installed the dependencies, you can render the pages locally by starting R in the project root and running:
-
``` r
sandpaper::serve()
```
-
-When building the site subsequently, you may need to run `renv::activate()` first.
-
This will build the pages and start a local web-server in R and open it in your browser. These pages are "live" and will respond to local file changes if you save them.
[git]: https://git-scm.com
diff --git a/config.yaml b/config.yaml
index e2767493..7d3de2b7 100644
--- a/config.yaml
+++ b/config.yaml
@@ -65,7 +65,7 @@ episodes:
- 03-interacting-with-tests.Rmd
- 04-unit-tests-best-practices.Rmd
- 05-testing-exceptions.Rmd
-- 06-floating-point-data.Rmd
+- 06-testing-data-structures.Rmd
- 07-fixtures.Rmd
- 08-parametrization.Rmd
- 09-testing-output-files.Rmd
diff --git a/episodes/00-introduction.Rmd b/episodes/00-introduction.Rmd
index 7f502c1a..eae28034 100644
--- a/episodes/00-introduction.Rmd
+++ b/episodes/00-introduction.Rmd
@@ -22,7 +22,7 @@ exercises: 2
This course aims to equip researchers with the skills to write effective tests and ensure the quality and reliability of their research software. No prior testing experience is required! We'll guide you through the fundamentals of software testing using Python's Pytest framework, a powerful and beginner-friendly tool. You'll also learn how to integrate automated testing into your development workflow using continuous integration (CI). CI streamlines your process by automatically running tests with every code change, catching bugs early and saving you time. By the end of the course, you'll be able to write clear tests, leverage CI for efficient development, and ultimately strengthen the foundation of your scientific findings.
This course has a single continuous project that you will work on throughout the lessons and each lesson builds on the last through practicals that will help you apply the concepts you learn. However if you get stuck or fall behind during the course, don't worry!
-All the stages of the project for each lesson are available in the `learners/files` directory in this [course's materials](https://github.com/researchcodingclub/python-testing-for-research) that you can copy across if needed. For example if you are on lesson 3 and haven't completed the practicals for lesson 2, you can copy the corresponding folder from the `learners/files` directory.
+All the stages of the project for each lesson are available in the `files` directory in this course's materials that you can copy across if needed. For example if you are on lesson 3 and haven't completed the practicals for lesson 2, you can copy the corresponding folder from the `files` directory.
By the end of this course, you should:
@@ -72,9 +72,9 @@ This course uses blocks like the one below to indicate an exercise for you to at
::::::::::::::::::::::::::::::::::::: keypoints
-- This course will teach you how to write effective tests and ensure the quality and reliability of your research software.
-- No prior testing experience is required.
-- You can catch up on practicals by copying the corresponding folder from the `learners/files` directory of this [course's materials](https://github.com/researchcodingclub/python-testing-for-research).
+- This course will teach you how to write effective tests and ensure the quality and reliability of your research software.
+- No prior testing experience is required.
+- You can catch up on practicals by copying the corresponding folder from the `files` directory of this course's materials.
::::::::::::::::::::::::::::::::::::::::::::::::
diff --git a/episodes/01-why-test-my-code.Rmd b/episodes/01-why-test-my-code.Rmd
index c0c9b40e..8ee4886c 100644
--- a/episodes/01-why-test-my-code.Rmd
+++ b/episodes/01-why-test-my-code.Rmd
@@ -18,22 +18,16 @@ exercises: 2
## What is software testing?
-Software testing is the process of checking that code is working as expected.
-You may have data processing functions or automations that you use in your work.
-How do you know that they are doing what you expect them to do?
+Software testing is the process of checking that code is working as expected. You may have data processing functions or automations that you use in your work - how do you know that they are doing what you expect them to do?
-Software testing is most commonly done by writing test code that check that
-your code works as expected.
+Software testing is most commonly done by writing code (tests) that check that your code works as expected.
-This might seem like a lot of effort, so let's go over some of the reasons you
-might want to add tests to your project.
+This might seem like a lot of effort, so let's go over some of the reasons you might want to add tests to your project.
## Catching bugs
-Whether you are writing the occasional script or developing a large software,
-mistakes are inevitable. Sometimes you don't even know when a mistake creeps
-into the code, and it gets published.
+Whether you are writing the occasional script or developing a large software, mistakes are inevitable. Sometimes you don't even know when a mistake creeps into the code, and it gets published.
Consider the following function:
@@ -42,63 +36,50 @@ def add(a, b):
return a - b
```
-When writing this function, I made a mistake. I accidentally wrote `a - b`
-instead of `a + b`. This is a simple mistake, but it could have serious
-consequences in a project.
+When writing this function, I made a mistake. I accidentally wrote `a - b` instead of `a + b`. This is a simple mistake, but it could have serious consequences in a project.
-When writing the code, I could have tested this function by manually trying it
-with different inputs and checking the output, but:
+When writing the code, I could have tested this function by manually trying it with different inputs and checking the output, but:
- This takes time.
- I might forget to test it again when we make changes to the code later on.
-- Nobody else in my team knows if I tested it, or how I tested it, and
- therefore whether they can trust it.
+- Nobody else in my team knows if I tested it, or how I tested it, and therefore whether they can trust it.
This is where automated testing comes in.
## Automated testing
-Automated testing is where we write code that checks that our code works as
-expected. Every time we make a change, we can run our tests to automatically
-make sure that our code still works as expected.
+Automated testing is where we write code that checks that our code works as expected. Every time we make a change, we can run our tests to automatically make sure that our code still works as expected.
-If we were writing a test from scratch for the `add` function, think for a
-moment on how we would do it.
-
-We would need to write a function that runs the `add` function on a set of
-inputs, checking each case to ensure it does what we expect. Let's write a test
-for the `add` function and call it `test_add`:
+If we were writing a test from scratch for the `add` function, think for a moment on how we would do it.
+We would need to write a function that runs the `add` function on a set of inputs, checking each case to ensure it does what we expect. Let's write a test for the `add` function and call it `test_add`:
```python
def test_add():
- # Check that it adds two positive integers
- if add(1, 2) != 3:
- print("Test failed!")
- # Check that it adds zero
- if add(5, 0) != 5:
- print("Test failed!")
- # Check that it adds two negative integers
- if add(-1, -2) != -3:
- print("Test failed!")
+ # Check that it adds two positive integers
+ if add(1, 2) != 3:
+ print("Test failed!")
+ # Check that it adds zero
+ if add(5, 0) != 5:
+ print("Test failed!")
+ # Check that it adds two negative integers
+ if add(-1, -2) != -3:
+ print("Test failed!")
```
-Here we check that the function works for a set of test cases. We ensure that
-it works for positive numbers, negative numbers, and zero.
+Here we check that the function works for a set of test cases. We ensure that it works for positive numbers, negative numbers, and zero.
::::::::::::::::::::::::::::::::::::: challenge
-## What could go wrong?
+## Challenge 1: What could go wrong?
-When writing functions, sometimes we don't anticipate all the ways that they
-could go wrong.
+When writing functions, sometimes we don't anticipate all the ways that they could go wrong.
-Take a moment to think about what is wrong, or might go wrong with these
-functions:
+Take a moment to think about what is wrong, or might go wrong with these functions:
```python
def greet_user(name):
- return "Hello" + name + "!"
+ return "Hello" + name + "!"
```
```python
@@ -108,40 +89,38 @@ def gradient(x1, y1, x2, y2):
:::::::::::::::::::::::: solution
-The first function will incorrectly greet the user, as it is missing a space
-after "Hello". It would print `HelloAlice!` instead of `Hello Alice!`.
-
-If we wrote a test for this function, we would have noticed that it was not
-working as expected:
+## Answer
+
+The first function will incorrectly greet the user, as it is missing a space after "Hello". It would print `HelloAlice!` instead of `Hello Alice!`.
+If we wrote a test for this function, we would have noticed that it was not working as expected:
```python
def test_greet_user():
- if greet_user("Alice") != "Hello Alice!":
- print("Test failed!")
+ if greet_user("Alice") != "Hello Alice!":
+ print("Test failed!")
```
The second function will crash if `x2 - x1` is zero.
-If we wrote a test for this function, it may have helped us to catch this
-unexpected behaviour:
+If we wrote a test for this function, it may have helped us to catch this unexpected behaviour:
```python
def test_gradient():
- if gradient(1, 1, 2, 2) != 1:
- print("Test failed!")
- if gradient(1, 1, 2, 3) != 2:
- print("Test failed!")
- if gradient(1, 1, 1, 2) != "Undefined":
- print("Test failed!")
+ if gradient(1, 1, 2, 2) != 1:
+ print("Test failed!")
+ if gradient(1, 1, 2, 3) != 2:
+ print("Test failed!")
+ if gradient(1, 1, 1, 2) != "Undefined":
+ print("Test failed!")
```
-And we could have amended the function:
+And we could have amended the function:
```python
def gradient(x1, y1, x2, y2):
- if x2 - x1 == 0:
- return "Undefined"
- return (y2 - y1) / (x2 - x1)
+ if x2 - x1 == 0:
+ return "Undefined"
+ return (y2 - y1) / (x2 - x1)
```
:::::::::::::::::::::::::::::::::
@@ -150,72 +129,59 @@ def gradient(x1, y1, x2, y2):
## Finding the root cause of a bug
-When a test fails, it can help us to find the root cause of a bug. For example,
-consider the following function:
+When a test fails, it can help us to find the root cause of a bug. For example, consider the following function:
```python
def multiply(a, b):
- return a * a
+ return a * a
def divide(a, b):
- return a / b
+ return a / b
def triangle_area(base, height):
- return divide(multiply(base, height), 2)
+ return divide(multiply(base, height), 2)
```
-There is a bug in this code too, but since we have several functions calling
-each other, it is not immediately obvious where the bug is. Also, the bug is
-not likely to cause a crash, so we won't get a helpful error message telling us
-what went wrong. If a user happened to notice that there was an error, then we
-would have to check `triangle_area` to see if the formula we used is right,
-then `multiply`, and `divide` to see if they were working as expected too!
+There is a bug in this code too, but since we have several functions calling each other, it is not immediately obvious where the bug is. Also, the bug is not likely to cause a crash, so we won't get a helpful error message telling us what went wrong. If a user happened to notice that there was an error, then we would have to check `triangle_area` to see if the formula we used is right, then `multiply`, and `divide` to see if they were working as expected too!
-However, if we had written tests for these functions, then we would have seen
-that both the `triangle_area` and `multiply` functions were not working as
-expected, allowing us to quickly see that the bug was in the `multiply`
-function without having to check the other functions.
+However, if we had written tests for these functions, then we would have seen that both the `triangle_area` and `multiply` functions were not working as expected, allowing us to quickly see that the bug was in the `multiply` function without having to check the other functions.
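+
+For instance, unit tests along these lines (a sketch using the if-and-print
+pattern from earlier in this lesson) would fail for both `triangle_area` and
+`multiply` but pass for `divide`, pointing straight at `multiply` as the
+culprit:
+
+```python
+def test_multiply():
+    if multiply(2, 3) != 6:  # fails: the buggy version returns 2 * 2 = 4
+        print("Test failed!")
+
+def test_divide():
+    if divide(6, 3) != 2:  # passes
+        print("Test failed!")
+
+def test_triangle_area():
+    if triangle_area(4, 3) != 6:  # fails because it relies on multiply
+        print("Test failed!")
+```
+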
## Increased confidence in code
-When you have tests for your code, you can be more confident that it works as
-expected. This is especially important when you are working in a team or
-producing software for users, as it allows everyone to trust the code. If you
-have a test that checks that a function works as expected, then you can be
-confident that the function will work as expected, even if you didn't write it
-yourself.
+When you have tests for your code, you can be more confident that it works as expected. This is especially important when you are working in a team or producing software for users, as it allows everyone to trust the code. If you have a test that checks that a function works as expected, then you can be confident that the function will work as expected, even if you didn't write it yourself.
## Forcing a more structured approach to coding
-When you write tests for your code, you are forced to think more carefully
-about how your code behaves and how you will verify that it works as expected.
-This can help you to write more structured code, as you will need to think
-about how to test it as well as how it could fail.
+When you write tests for your code, you are forced to think more carefully about how your code behaves and how you will verify that it works as expected. This can help you to write more structured code, as you will need to think about how to test it as well as how it could fail.
::::::::::::::::::::::::::::::::::::: challenge
-## What could go wrong?
+## Challenge 2: What could go wrong?
Consider a function that controls a driverless car.
- What checks might we add to make sure it is not dangerous to use?
```python
+
def drive_car(speed, direction):
- ... # complex car driving code
+ ... # complex car driving code
return speed, direction, brake_status
+
```
:::::::::::::::::::::::: solution
+## Answer
- We might want to check that the speed is within a safe range.
-- We might want to check that the direction is a valid direction. ie not
- towards a tree, and if so, the car should be applying the brakes.
+
+- We might want to check that the direction is valid, i.e. not towards a tree, and if so, that the car applies the brakes.
:::::::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::
diff --git a/episodes/02-simple-tests.Rmd b/episodes/02-simple-tests.Rmd
index ead70409..1f7d0420 100644
--- a/episodes/02-simple-tests.Rmd
+++ b/episodes/02-simple-tests.Rmd
@@ -25,15 +25,12 @@ The most basic thing you will want to do in a test is check that an output for a
function is correct by checking that it is equal to a certain value.
Let's take the `add` function example from the previous chapter and the test we
-conceptualised for it and write it in code. We'll aim to write the test in such
-a way that it can be run using Pytest, the most commonly used testing framework
-in Python.
+conceptualised for it and write it in code.
-- Make a folder called `my_project` (or whatever you want to call it for these
- lessons) and inside it, create a file called 'calculator.py', and another
- file called 'test_calculator.py'.
+- Make a folder called `my_project` (or whatever you want to call it for these lessons)
+and inside it, create a file called 'calculator.py', and another file called 'test_calculator.py'.
-Your directory structure should look like this:
+So your directory structure should look like this:
```bash
project_directory/
@@ -42,88 +39,138 @@ project_directory/
└── test_calculator.py
```
-`calculator.py` will contain our Python functions that we want to test, and
-`test_calculator.py` will contain our tests for those functions.
+`calculator.py` will contain our Python functions that we want to test, and `test_calculator.py`
+will contain our tests for those functions.
- In `calculator.py`, write the add function:
```python
def add(a, b):
- return a + b
+ return a + b
```
-- And in `test_calculator.py`, write the test for the add function that we
- conceptualised in the previous lesson, but use the `assert` keyword in place
- of if statements and print functions:
+- And in `test_calculator.py`, write the test for the add function that we conceptualised
+in the previous lesson:
```python
# Import the add function so the test can use it
from calculator import add
def test_add():
- # Check that it adds two positive integers
- assert add(1, 2) == 3
-
- # Check that it adds zero
- assert add(5, 0) == 5
-
- # Check that it adds two negative integers
- assert add(-1, -2) == -3
+ # Check that it adds two positive integers
+ if add(1, 2) != 3:
+ print("Test failed!")
+ raise AssertionError("Test failed!")
+
+ # Check that it adds zero
+ if add(5, 0) != 5:
+ print("Test failed!")
+ raise AssertionError("Test failed!")
+
+ # Check that it adds two negative integers
+ if add(-1, -2) != -3:
+ print("Test failed!")
+ raise AssertionError("Test failed!")
```
-The `assert` statement will crash the test by raising an `AssertionError` if
-the condition following it is false. Pytest uses these to tell that the test
-has failed.
+(Note that raising an `AssertionError` crashes the test, which is how Pytest knows that it has failed.)
This system of placing functions in a file and then tests for those functions in
-another file is a common pattern in software development. It allows you to keep your
+another file is a common pattern in software development. It allows you to keep your
code organised and separate your tests from your actual code.
-With Pytest, the expectation is to name your test files and functions with the
-prefix `test_`. If you do so, Pytest will automatically find and execute each
-test function.
+With Pytest, the expectation is to name your test files and functions with the prefix `test_`, so that they can be found and run automatically.
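+
+If you want to check what pytest would pick up without actually running anything,
+the `--collect-only` flag lists the discovered test files and functions:
+
+```bash
+❯ pytest --collect-only
+```
+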
Now, let's run the test. We can do this by running the following command in the terminal:
(make sure you're in the `my_project` directory before running this command)
```bash
-❯ pytest
+❯ pytest ./
+```
+
+This command tells pytest to run all the tests in the current directory.
+
+When you run the test, you should see that the test runs successfully, indicated
+by some **green** text in the terminal. We will go through the output and what it means
+in the next lesson, but for now, know that **green** means that the test passed, and **red**
+means that the test failed.
+
+Try changing the `add` function to return the wrong value, and run the test again to see that the test now fails
+and the text turns **red** - neat!
+
+## The `assert` keyword
+
+Writing these `if` blocks for each test case is cumbersome. Fortunately, Python
+has a keyword to do this for us - the `assert` keyword.
+
+The `assert` keyword checks whether a statement is true: if it is, the test continues;
+if it isn't, the test crashes, printing an error in the terminal. This enables us
+to write succinct tests without lots of if-statements.
+
+The `assert` keyword is used like this:
+
+```python
+assert add(1, 2) == 3
```
-This command tells Pytest to run all the tests in the current directory.
+which is equivalent to:
-When you run the test, you should see that the test runs successfully,
-indicated by some **green**. text in the
-terminal. We will go through the output and what it means in the next lesson,
-but for now, know that **green** means that
-the test passed, and **red** means that the test
-failed.
+```python
+if add(1, 2) != 3:
+ # Crash the test
+ raise AssertionError
+```
-Try changing the `add` function to return the wrong value, and run the test
-again to see that the test now fails and the text turns **red** - neat! If this was a real testing situation,
-we would know to investigate the `add` function to see why it's not behaving as
-expected.
+::::::::::::::::::::::::::::::::::::: challenge
+## Challenge 1: Use the assert keyword to update the test for the add function
+
+Use the `assert` keyword to update the test for the `add` function to make it more concise and readable.
+
+Then re-run the test using `pytest ./` to check that it still passes.
+
+:::::::::::::::::::::::: solution
+
+```python
+from calculator import add
+
+def test_add():
+    assert add(1, 2) == 3 # Check that it adds two positive integers
+    assert add(5, 0) == 5 # Check that it adds zero
+    assert add(-1, -2) == -3 # Check that it adds two negative integers
+```
+
+:::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+Now that we are using the `assert` keyword, pytest will let us know if our test fails:
+if any of these assert statements fails, the test is flagged as failed and pytest
+reports it.
+
+Make the `add` function return the wrong value, and run the test again to see that the test
+fails and the text turns **red** as we expect.
+
+So if this were a real testing situation, we would know to investigate the `add` function to see why it's not behaving as expected.
::::::::::::::::::::::::::::::::::::: challenge
-## Write a test for a multiply function
+## Challenge 2: Write a test for a multiply function
-Try using what we have covered to write a test for a `multiply` function that
-multiplies two numbers together.
+Try using what we have covered to write a test for a `multiply` function that multiplies two numbers together.
- Place this multiply function in `calculator.py`:
```python
def multiply(a, b):
- return a * b
+ return a * b
```
-- Then write a test for this function in `test_calculator.py`. Remember to
- import the `multiply` function from `calculator.py` at the top of the file
- like this:
+- Then write a test for this function in `test_calculator.py`. Remember to import the `multiply` function from `calculator.py` at the top of the file like this:
```python
from calculator import multiply
@@ -131,29 +178,30 @@ from calculator import multiply
:::::::::::::::::::::::: solution
-There are many different test cases that you could include, but it's important
-to check that different types of cases are covered. A test for this function
-could look like this:
+## Solution:
+There are many different test cases that you could include, but it's important to check that different types of cases are covered. A test for this function could look like this:
```python
def test_multiply():
- # Check that positive numbers work
- assert multiply(5, 5) == 25
- # Check that multiplying by 1 works
- assert multiply(1, 5) == 5
- # Check that multiplying by 0 works
- assert multiply(0, 3) == 0
- # Check that negative numbers work
- assert multiply(-5, 2) == -10
+ # Check that positive numbers work
+ assert multiply(5, 5) == 25
+ # Check that multiplying by 1 works
+ assert multiply(1, 5) == 5
+ # Check that multiplying by 0 works
+ assert multiply(0, 3) == 0
+ # Check that negative numbers work
+ assert multiply(-5, 2) == -10
```
:::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::::::::::::::
+Run the test using `pytest ./` to check that it passes. If it doesn't, don't worry: that's the point of testing - to find bugs in code.
+
::::::::::::::::::::::::::::::::::::: keypoints
-- The `assert` keyword is used to check if a statement is true.
+- The `assert` keyword is used to check if a statement is true and is a shorthand for writing `if` statements in tests.
- Pytest is invoked by running the command `pytest ./` in the terminal.
- `pytest` will run all the tests in the current directory, found by looking for files that start with `test_`.
- The output of a test is displayed in the terminal, with **green** text indicating a successful test and **red** text indicating a failed test.
diff --git a/episodes/03-interacting-with-tests.Rmd b/episodes/03-interacting-with-tests.Rmd
index cc572770..c91b5668 100644
--- a/episodes/03-interacting-with-tests.Rmd
+++ b/episodes/03-interacting-with-tests.Rmd
@@ -4,7 +4,7 @@ teaching: 10
exercises: 2
---
-:::::::::::::::::::::::::::::::::::::: questions
+:::::::::::::::::::::::::::::::::::::: questions
- How do I use pytest to run my tests?
- What does the output of pytest look like and how do I interpret it?
@@ -84,19 +84,17 @@ Let's break down the successful output in more detail.
```
- The first line tells us that pytest has started running tests.
```
-platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0
-
+platform darwin -- Python 3.11.0, pytest-8.1.1, pluggy-1.4.0
```
- The next line just tells us the versions of several packages.
```
-rootdir: /home//.../python-testing-for-research/learners/files/03-interacting-with-tests
-
+rootdir: /Users/sylvi/Documents/GitKraken/python-testing-for-research/episodes/files/03-interacting-with-tests
```
- The next line tells us where the tests are being searched for. In this case, it is your project directory. So any file that starts or ends with `test` anywhere in this directory will be opened and searched for test functions.
```
-plugins: snaptol-0.0.2
+plugins: regtest-2.1.1
```
-- This tells us what plugins are being used. In my case, I have a plugin called `snaptol` that is being used, but you may not. This is fine and you can ignore it.
+- This tells us what plugins are being used. In my case, I have a plugin called `regtest` that is being used, but you may not. This is fine and you can ignore it.
```
collected 3 items
@@ -104,7 +102,7 @@ collected 3 items
- This simply tells us that 3 tests have been found and are ready to be run.
```
-advanced/test_advanced_calculator.py .
+advanced/test_advanced_calculator.py .
test_calculator.py .. [100%]
```
- These two lines tells us that the tests in `test_calculator.py` and `advanced/test_advanced_calculator.py` have passed. Each `.` means that a test has passed. There are two of them beside `test_calculator.py` because there are two tests in `test_calculator.py` If a test fails, it will show an `F` instead of a `.`.
@@ -117,15 +115,15 @@ test_calculator.py .. [100%]
- This tells us that the 3 tests have passed in 0.01 seconds.
### Case 2: Some or all tests fail
-Now let's look at the output when the tests fail. Edit a test in `test_calculator.py` to make it fail (for example switching a positive number to a negative number), then run `pytest` again.
+Now let's look at the output when the tests fail. Edit a test in `test_calculator.py` to make it fail (for example switching the `+` in `add` to a `-`), then run `pytest` again.
The start is much the same as before:
```
=== test session starts ===
-platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0
-rootdir: /home//.../python-testing-for-research/learners/files/03-interacting-with-tests
-plugins: snaptol-0.0.2
+platform darwin -- Python 3.11.0, pytest-8.1.1, pluggy-1.4.0
+rootdir: /Users/sylvi/Documents/GitKraken/python-testing-for-research/episodes/files/03-interacting-with-tests
+plugins: regtest-2.1.1
collected 3 items
```
@@ -133,7 +131,7 @@ But now we see that the tests have failed:
```
advanced/test_advanced_calculator.py . [ 33%]
-test_calculator.py F.
+test_calculator.py F.
```
These `F` tells us that a test has failed. The output then tells us which test has failed:
@@ -144,26 +142,26 @@ These `F` tells us that a test has failed. The output then tells us which test h
___ test_add ___
def test_add():
"""Test for the add function"""
-> assert add(1, 2) == -3
-E assert 3 == -3
-E + where 3 = add(1, 2)
+> assert add(1, 2) == 3
+E assert -1 == 3
+E + where -1 = add(1, 2)
-test_calculator.py:7: AssertionError
+test_calculator.py:21: AssertionError
```
This is where we get detailed information about what exactly broke in the test.
- The `>` chevron points to the line that failed in the test. In this case, the assertion `assert add(1, 2) == 3` failed.
-- The following line tells us what the assertion tried to do. In this case, it tried to assert that the number 3 was equal to -3. Which of course it isn't.
-- The next line goes into more detail about why it tried to equate 3 to -3. It tells us that 3 is the result of calling `add(1, 2)`.
-- The final line tells us where the test failed. In this case, it was on line 7 of `test_calculator.py`.
+- The following line tells us what the assertion tried to do. In this case, it tried to assert that the number -1 was equal to 3. Which of course it isn't.
+- The next line goes into more detail about why it tried to equate -1 to 3. It tells us that -1 is the result of calling `add(1, 2)`.
+- The final line tells us where the test failed. In this case, it was on line 21 of `test_calculator.py`.
Using this detailed output, we can quickly find the exact line that failed and know the inputs that caused the failure. From there, we can examine exactly what went wrong and fix it.
Finally, pytest prints out a short summary of all the failed tests:
```
=== short test summary info ===
-FAILED test_calculator.py::test_add - assert 3 == -3
+FAILED test_calculator.py::test_add - assert -1 == 3
=== 1 failed, 2 passed in 0.01s ===
```
@@ -179,14 +177,17 @@ For example, if you remove the `:` from the end of the `def test_multiply():` fu
```
=== test session starts ===
-platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0
-rootdir: /home//.../python-testing-for-research/learners/files/03-interacting-with-tests
-plugins: snaptol-0.0.2
-collected 1 item / 1 error
+platform darwin -- Python 3.11.0, pytest-8.1.1, pluggy-1.4.0
+Matplotlib: 3.9.0
+Freetype: 2.6.1
+rootdir: /Users/sylvi/Documents/GitKraken/python-testing-for-research/episodes/files/03-interacting-with-tests
+plugins: mpl-0.17.0, regtest-2.1.1
+collected 1 item / 1 error
+
=== ERRORS ===
___ ERROR collecting test_calculator.py ___
...
-E File "/home//.../python-testing-for-research/learners/files/03-interacting-with-tests/test_calculator.py", line 14
+E File "/Users/sylvi/Documents/GitKraken/python-testing-for-research/episodes/files/03-interacting-with-tests/test_calculator.py", line 14
E def test_multiply()
E ^
E SyntaxError: expected ':'
@@ -220,10 +221,6 @@ Alternatively you can call a specific test using this notation: `pytest test_cal
If you want to stop running tests after the first failure, you can use the `-x` flag. This will cause pytest to stop running tests after the first failure. This is useful when you have lots of tests that take a while to run.
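+
+For example, to get verbose output and stop at the first failure:
+
+```bash
+❯ pytest -x -v
+```
+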
-### Running tests that previously failed
-
-If you don't want to rerun your entire test suite after a single test failure, the `--lf` flag will run only the 'last failed' tests. Alternatively, `--ff` will run the tests that failed first.
-
::::::::::::::::::::::::::::::::::::: challenge
## Challenge - Experiment with pytest options
@@ -246,12 +243,12 @@ Try running pytest with the above options, editing the code to make the tests fa
:::::::::::::::::::::::::::::::::::::::::
-::::::::::::::::::::::::::::::::::::: keypoints
+::::::::::::::::::::::::::::::::::::: keypoints
- You can run multiple tests at once by running `pytest` in the terminal.
- Pytest searches for tests in files that start or end with 'test' in the current directory and subdirectories.
- The output of pytest tells you which tests have passed and which have failed and precisely why they failed.
-- Pytest accepts many additional flags to change which tests are run, give more detailed output, etc.
- Flags such as `-v`, `-q`, `-k`, and `-x` can be used to get more detailed output, show less detailed output, run specific tests, and stop after the first failure, respectively.
::::::::::::::::::::::::::::::::::::::::::::::::
diff --git a/episodes/04-unit-tests-best-practices.Rmd b/episodes/04-unit-tests-best-practices.Rmd
index 8733342a..1cbe4af3 100644
--- a/episodes/04-unit-tests-best-practices.Rmd
+++ b/episodes/04-unit-tests-best-practices.Rmd
@@ -4,7 +4,7 @@ teaching: 10
exercises: 2
---
-:::::::::::::::::::::::::::::::::::::: questions
+:::::::::::::::::::::::::::::::::::::: questions
- What to do about complex functions & tests?
- What are some testing best practices for testing?
@@ -40,7 +40,7 @@ def process_data(data: list, maximum_value: float):
for i in range(len(data_negative_removed)):
if data_negative_removed[i] <= maximum_value:
data_maximum_removed.append(data_negative_removed[i])
-
+
# Calculate the mean
mean = sum(data_maximum_removed) / len(data_maximum_removed)
@@ -63,17 +63,9 @@ def test_process_data():
```
-This test is hard to debug if it fails. Imagine if the calculation of the mean broke - the test would fail but it would not tell us what part of the function was broken, requiring us to
+This test is very complex and hard to debug if it fails. Imagine if the calculation of the mean broke - the test would fail but it would not tell us what part of the function was broken, requiring us to
check each function manually to find the bug. Not very efficient!
-:::::::::::::::::::::::::::: callout
-
-Asserting that the standard deviation is equal to 16 decimal
-places is also quite error prone. We'll see in a later lesson
-how to improve this test.
-
-::::::::::::::::::::::::::::::::::::
-
## Unit Testing
The process of unit testing is a fundamental part of software development. It is where you test individual units or components of a software instead of multiple things at once.
@@ -164,10 +156,10 @@ This makes your tests easier to read and understand for both yourself and others
def test_calculate_mean():
# Arrange
data = [1, 2, 3, 4, 5]
-
+
# Act
mean = calculate_mean(data)
-
+
# Assert
assert mean == 3
```
@@ -198,10 +190,10 @@ Here is an example of the TDD process:
def test_calculate_mean():
# Arrange
data = [1, 2, 3, 4, 5]
-
+
# Act
mean = calculate_mean(data)
-
+
# Assert
assert mean == 3.5
```
@@ -252,7 +244,7 @@ Random seeds work by setting the initial state of the random number generator.
This means that if you set the seed to the same value, you will get the same sequence of random numbers each time you run the function.
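+
+For example, here is a minimal sketch of seeding with Python's built-in `random`
+module (the same module used in the challenge below):
+
+```python
+import random
+
+def test_seeding_is_reproducible():
+    # Seeding with the same value restarts the generator from the same state,
+    # so the two sampled sequences are identical
+    random.seed(42)
+    first = [random.random() for _ in range(3)]
+    random.seed(42)
+    second = [random.random() for _ in range(3)]
+    assert first == second
+```
+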
-::::::::::::::::::::::::::::::::::::: challenge
+::::::::::::::::::::::::::::::::::::: challenge
## Challenge: Write your own unit tests
@@ -266,21 +258,21 @@ Take this complex function, break it down and write unit tests for it.
import random
def randomly_sample_and_filter_participants(
- participants: list,
- sample_size: int,
- min_age: int,
- max_age: int,
- min_height: int,
+ participants: list,
+ sample_size: int,
+ min_age: int,
+ max_age: int,
+ min_height: int,
max_height: int
):
- """Participants is a list of dicts, containing the age and height of each participant
+    """Participants is a list of dicts, containing the age and height of each participant
participants = [
- {age: 25, height: 180},
- {age: 30, height: 170},
- {age: 35, height: 160},
+        {"age": 25, "height": 180},
+        {"age": 30, "height": 170},
+        {"age": 35, "height": 160},
]
"""
-
+
# Get the indexes to sample
indexes = random.sample(range(len(participants)), sample_size)
@@ -288,13 +280,13 @@ def randomly_sample_and_filter_participants(
sampled_participants = []
for i in indexes:
sampled_participants.append(participants[i])
-
+
# Remove participants that are outside the age range
sampled_participants_age_filtered = []
for participant in sampled_participants:
if participant['age'] >= min_age and participant['age'] <= max_age:
sampled_participants_age_filtered.append(participant)
-
+
# Remove participants that are outside the height range
sampled_participants_height_filtered = []
for participant in sampled_participants_age_filtered:
@@ -307,7 +299,7 @@ def randomly_sample_and_filter_participants(
- Create a new file called `test_stats.py` in the `statistics` directory
- Write unit tests for the `randomly_sample_and_filter_participants` function in `test_stats.py`
-:::::::::::::::::::::::: solution
+:::::::::::::::::::::::: solution
The function can be broken down into smaller functions, each of which can be tested separately:
@@ -315,7 +307,7 @@ The function can be broken down into smaller functions, each of which can be tes
import random
def sample_participants(
- participants: list,
+ participants: list,
sample_size: int
):
indexes = random.sample(range(len(participants)), sample_size)
@@ -325,8 +317,8 @@ def sample_participants(
return sampled_participants
def filter_participants_by_age(
- participants: list,
- min_age: int,
+ participants: list,
+ min_age: int,
max_age: int
):
filtered_participants = []
@@ -336,8 +328,8 @@ def filter_participants_by_age(
return filtered_participants
def filter_participants_by_height(
- participants: list,
- min_height: int,
+ participants: list,
+ min_height: int,
max_height: int
):
filtered_participants = []
@@ -347,11 +339,11 @@ def filter_participants_by_height(
return filtered_participants
def randomly_sample_and_filter_participants(
- participants: list,
- sample_size: int,
- min_age: int,
- max_age: int,
- min_height: int,
+ participants: list,
+ sample_size: int,
+ min_age: int,
+ max_age: int,
+ min_height: int,
max_height: int
):
sampled_participants = sample_participants(participants, sample_size)
@@ -455,7 +447,7 @@ When time is limited, it's often better to only write tests for the most critica
You should discuss with your team how much of the code you think should be tested, and what the most critical parts of the code are in order to prioritize your time.
-::::::::::::::::::::::::::::::::::::: keypoints
+::::::::::::::::::::::::::::::::::::: keypoints
- Complex functions can be broken down into smaller, testable units.
- Testing each unit separately is called unit testing.
diff --git a/episodes/05-testing-exceptions.Rmd b/episodes/05-testing-exceptions.Rmd
index 07574b80..88c1a673 100644
--- a/episodes/05-testing-exceptions.Rmd
+++ b/episodes/05-testing-exceptions.Rmd
@@ -25,9 +25,10 @@ Take this example of the `square_root` function. We don't have time to implement
```python
def square_root(x):
- if x < 0:
- raise ValueError("Cannot compute square root of negative number yet!")
- return x ** 0.5
+ if x < 0:
+ raise ValueError("Cannot compute square root of negative number yet!")
+ return x ** 0.5
+
```
We can test that the function raises an exception using `pytest.raises` as follows:
@@ -51,6 +52,7 @@ def test_square_root():
with pytest.raises(ValueError) as e:
square_root(-1)
assert str(e.value) == "Cannot compute square root of negative number yet!"
+
```
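+
+As an aside, `pytest.raises` can also check the message for you via its `match`
+argument, which is treated as a regular expression searched for in the string
+form of the exception. A minimal sketch, assuming the same `square_root`
+function as above is in scope:
+
+```python
+import pytest
+
+def test_square_root_message():
+    with pytest.raises(ValueError, match="negative number"):
+        square_root(-1)
+```
+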
::::::::::::::::::::::::::::::::::::: challenge
diff --git a/episodes/06-floating-point-data.Rmd b/episodes/06-floating-point-data.Rmd
deleted file mode 100644
index 3e56e18c..00000000
--- a/episodes/06-floating-point-data.Rmd
+++ /dev/null
@@ -1,282 +0,0 @@
----
-title: 'Floating Point Data'
-teaching: 10
-exercises: 5
----
-
-:::::::::::::::::::::::::::::::::::::: questions
-
-- What are the best practices when working with floating point data?
-- How do you compare objects in libraries like `numpy`?
-
-::::::::::::::::::::::::::::::::::::::::::::::::
-
-::::::::::::::::::::::::::::::::::::: objectives
-
-- Learn how to test floating point data with tolerances.
-- Learn how to compare objects in libraries like `numpy`.
-
-::::::::::::::::::::::::::::::::::::::::::::::::
-
-## Floating Point Data
-
-Real numbers are encountered very frequently in research, but it's quite likely
-that they won't be 'nice' numbers like 2.0 or 0.0. Instead, the outcome of our
-code might be something like `2.34958124890e-31`, and we may only be confident
-in that answer to a certain precision.
-
-Computers typically represent real numbers using a 'floating point' representation,
-which truncates their precision to a certain number of decimal places. Floating point
-arithmetic errors can cause a significant amount of noise in the last few decimal
-places. This can be affected by:
-
-- Choice of algorithm.
-- Precise order of operations.
-- Order in which parallel processes finish.
-- Inherent randomness in the calculation.
-
-We could therefore test our code using `assert result == 2.34958124890e-31`,
-but it's possible that this test could erroneously fail in future for reasons
-outside our control. This lesson will teach best practices for handling this
-type of data.
-
-Libraries like NumPy, SciPy, and Pandas are commonly used to interact
-with large quantities of floating point numbers. NumPy provides special
-functions to assist with testing.
-
-### Relative and Absolute Tolerances
-
-Rather than testing that a floating point number is exactly equal to another,
-it is preferable to test that it is within a certain tolerance. In most cases,
-it is best to use a _relative_ tolerance:
-
-```python
-from math import fabs
-
-def test_float_rtol():
- actual = my_function()
- expected = 7.31926e12 # Reference solution
- rtol = 1e-3
- # Use fabs to ensure a positive result!
- assert fabs((actual - expected) / expected) < rtol
-```
-
-In some situations, such as testing a number is close to zero without caring
-about exactly how large it is, it is preferable to test within an _absolute_
-tolerance:
-
-```python
-from math import fabs
-
-def test_float_atol():
- actual = my_function()
- expected = 0.0 # Reference solution
- atol = 1e-5
- # Use fabs to ensure a positive result!
- assert fabs(actual - expected) < atol
-```
-
-
-Let's practice with a function that estimates the value of pi (very
-inefficiently!).
-
-::::::::::::::::::::::::::::::::::::: challenge
-
-## Testing with tolerances
-
-- Write this function to a file `estimate_pi.py`:
-
-```python
-import random
-
-def estimate_pi(iterations):
- num_inside = 0
- for _ in range(iterations):
- x = random.random()
- y = random.random()
- if x**2 + y**2 < 1:
- num_inside += 1
- return 4 * num_inside / iterations
-```
-
-- Add a file `test_estimate_pi.py`, and include a test for this function using
- both absolute and relative tolerances.
-- Find an appropriate number of iterations so that the test finishes quickly,
- but keep in mind that both `atol` and `rtol` will need to be modified accordingly!
-
-:::::::::::::::::::::::: solution
-
-```python
-import random
-from math import fabs
-
-from estimate_pi import estimate_pi
-
-def test_estimate_pi():
- random.seed(0)
- expected = 3.141592654
- actual = estimate_pi(iterations=10000)
- # Test absolute tolerance
- atol = 1e-2
- assert fabs(actual - expected) < atol
- # Test relative tolerance
- rtol = 5e-3
- assert fabs((actual - expected) / expected) < rtol
-```
-
-In this case the absolute and relative tolerances should be similar, as
-the expected result is close in magnitude to 1.0, but in principle they could
-be very different!
-
-:::::::::::::::::::::::::::::::::
-
-:::::::::::::::::::::::::::::::::::::::::::::::
-
-The built-in function `math.isclose` can be used to simplify these checks:
-
-```python
-assert math.isclose(a, b, rel_tol=rtol, abs_tol=atol)
-```
-
-Both `rel_tol` and `abs_tol` may be provided, and it will return `True`
-if either of the conditions are satisfied.
-
-::::::::::::::::::::::::::::::::::::: challenge
-
-## Using `math.isclose`
-
-- Adapt the test you wrote in the previous challenge to make use of
- the `math.isclose` function.
-
-:::::::::::::::::::::::: solution
-
-```python
-import math
-import random
-
-from estimate_pi import estimate_pi
-
-def test_estimate_pi():
- random.seed(0)
- expected = 3.141592654
- actual = estimate_pi(iterations=10000)
- atol = 1e-2
- rtol = 5e-3
- assert math.isclose(actual, expected, abs_tol=atol, rel_tol=rtol)
-```
-
-:::::::::::::::::::::::::::::::::
-
-:::::::::::::::::::::::::::::::::::::::::::::::
-
-### NumPy
-
-NumPy is a common library used in research. Instead of the usual `assert a ==
-b`, NumPy has its own testing functions that are more suitable for comparing
-NumPy arrays. These functions are the ones you are most likely to use:
-
-- `numpy.testing.assert_array_equal` is used to compare two NumPy arrays for
- equality -- best used for integer data.
-- `numpy.testing.assert_allclose` is used to compare two NumPy arrays with a
- tolerance for floating point numbers.
-
-Here are some examples of how to use these functions:
-
-```python
-
-def test_numpy_arrays():
- """Test that numpy arrays are equal"""
- # Create two numpy arrays
- array1 = np.array([1, 2, 3])
- array2 = np.array([1, 2, 3])
- # Check that the arrays are equal
- np.testing.assert_array_equal(array1, array2)
-
-# Note that np.testing.assert_array_equal even works with multidimensional numpy arrays!
-
-def test_2d_numpy_arrays():
- """Test that 2d numpy arrays are equal"""
- # Create two 2d numpy arrays
- array1 = np.array([[1, 2], [3, 4]])
- array2 = np.array([[1, 2], [3, 4]])
- # Check that the nested arrays are equal
- np.testing.assert_array_equal(array1, array2)
-
-def test_numpy_arrays_with_tolerance():
- """Test that numpy arrays are equal with tolerance"""
- # Create two numpy arrays
- array1 = np.array([1.0, 2.0, 3.0])
- array2 = np.array([1.00009, 2.0005, 3.0001])
- # Check that the arrays are equal with tolerance
- np.testing.assert_allclose(array1, array2, atol=1e-3)
-```
-
-The NumPy testing functions can be used on anything NumPy considers to be 'array-like'.
-This includes lists, tuples, and even individual floating point numbers if you choose.
-They can also be used for other objects in the scientific Python ecosystem, such
-as Pandas Series/DataFrames.
-
-:::::::::::::::::::::::: callout
-
-The Pandas library also provides its own testing functions:
-
-- `pandas.testing.assert_frame_equal`
-- `pandas.testing.assert_series_equal`
-
-These functions can also take `rtol` and `atol` arguments, so can fulfill the
-role of both `numpy.testing.assert_array_equal` and
-`numpy.testing.assert_allclose`.
-
-::::::::::::::::::::::::::::::::
-
-
-::::::::::::::::::::::::::::::::::::: challenge
-
-### Checking if NumPy arrays are equal
-
-In `statistics/stats.py` add this function to calculate the cumulative sum of a NumPy array:
-
-```python
-import numpy as np
-
-def calculate_cumulative_sum(array: np.ndarray) -> np.ndarray:
- """Calculate the cumulative sum of a numpy array"""
-
- # don't use the built-in numpy function
- result = np.zeros(array.shape)
- result[0] = array[0]
- for i in range(1, len(array)):
- result[i] = result[i-1] + array[i]
-
- return result
-```
-
-Then write a test for this function by comparing NumPy arrays.
-
-:::::::::::::::::::::::: solution
-
-```python
-import numpy as np
-from stats import calculate_cumulative_sum
-
-def test_calculate_cumulative_sum():
- """Test calculate_cumulative_sum function"""
- array = np.array([1, 2, 3, 4, 5])
- expected_result = np.array([1, 3, 6, 10, 15])
- np.testing.assert_array_equal(calculate_cumulative_sum(array), expected_result)
-```
-
-:::::::::::::::::::::::::::::::::
-
-::::::::::::::::::::::::::::::::::::::::::::::::
-
-
-::::::::::::::::::::::::::::::::::::: keypoints
-
-- When comparing floating point data, you should use relative/absolute
- tolerances instead of testing for equality.
-- Numpy arrays cannot be compared using the `==` operator. Instead, use
- `numpy.testing.assert_array_equal` and `numpy.testing.assert_allclose`.
-
-::::::::::::::::::::::::::::::::::::::::::::::::
-
diff --git a/episodes/06-testing-data-structures.Rmd b/episodes/06-testing-data-structures.Rmd
new file mode 100644
index 00000000..8b82784a
--- /dev/null
+++ b/episodes/06-testing-data-structures.Rmd
@@ -0,0 +1,471 @@
+---
+title: 'Testing Data Structures'
+teaching: 10
+exercises: 2
+---
+
+:::::::::::::::::::::::::::::::::::::: questions
+
+- How do you compare data structures such as lists and dictionaries?
+- How do you compare objects in libraries like `pandas` and `numpy`?
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: objectives
+
+- Learn how to compare lists and dictionaries in Python.
+- Learn how to compare objects in libraries like `pandas` and `numpy`.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Data structures
+
+When writing tests for your code, you often need to compare data structures such as lists, dictionaries, and objects from libraries like `numpy` and `pandas`.
+Here we will go over some of the more common data structures that you may use in research and how to test them.
+
+### Lists
+
+Python lists can be tested using the usual `==` operator as we do for numbers.
+
+```python
+
+def test_lists_equal():
+ """Test that lists are equal"""
+ # Create two lists
+ list1 = [1, 2, 3]
+ list2 = [1, 2, 3]
+ # Check that the lists are equal
+ assert list1 == list2
+
+ # Two lists, different order
+ list3 = [1, 2, 3]
+ list4 = [3, 2, 1]
+ assert list3 != list4
+
+ # Create two different lists
+ list5 = [1, 2, 3]
+ list6 = [1, 2, 4]
+ # Check that the lists are not equal
+ assert list5 != list6
+
+```
+
+Note that the order of elements in the list matters. If you want to check that two lists contain the same elements but in different order, you can use the `sorted` function.
+
+```python
+def test_sorted_lists_equal():
+    """Test that lists are equal regardless of order"""
+ # Create two lists
+ list1 = [1, 2, 3]
+ list2 = [1, 2, 3]
+ # Check that the lists are equal
+ assert sorted(list1) == sorted(list2)
+
+ # Two lists, different order
+ list3 = [1, 2, 3]
+ list4 = [3, 2, 1]
+ assert sorted(list3) == sorted(list4)
+
+ # Create two different lists
+ list5 = [1, 2, 3]
+ list6 = [1, 2, 4]
+ # Check that the lists are not equal
+ assert sorted(list5) != sorted(list6)
+
+```
+
+### Dictionaries
+
+Python dictionaries can also be tested using the `==` operator; however, the order of the keys does not matter.
+This means that if you have two dictionaries with the same keys and values, but in different order, they will still be considered equal.
+
+The reason for this is that dictionary equality compares only keys and values; although dictionaries preserve insertion order (since Python 3.7), `==` ignores it.
+(If you need an order-sensitive comparison, you can use the `collections.OrderedDict` class; see the sketch after the example below.)
+
+```python
+def test_dictionaries_equal():
+ """Test that dictionaries are equal"""
+ # Create two dictionaries
+ dict1 = {"a": 1, "b": 2, "c": 3}
+ dict2 = {"a": 1, "b": 2, "c": 3}
+ # Check that the dictionaries are equal
+ assert dict1 == dict2
+
+ # Create two dictionaries, different order
+ dict3 = {"a": 1, "b": 2, "c": 3}
+ dict4 = {"c": 3, "b": 2, "a": 1}
+ assert dict3 == dict4
+
+ # Create two different dictionaries
+ dict5 = {"a": 1, "b": 2, "c": 3}
+ dict6 = {"a": 1, "b": 2, "c": 4}
+ # Check that the dictionaries are not equal
+ assert dict5 != dict6
+```
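+
+By contrast, here is a short sketch of the order-sensitive comparison offered by
+`collections.OrderedDict`, as mentioned above:
+
+```python
+from collections import OrderedDict
+
+def test_ordered_dicts_compare_order():
+    # Same keys and values, different insertion order
+    od1 = OrderedDict([("a", 1), ("b", 2)])
+    od2 = OrderedDict([("b", 2), ("a", 1)])
+    # OrderedDict equality takes insertion order into account
+    assert od1 != od2
+    # Plain dicts with the same contents compare equal regardless of order
+    assert dict(od1) == dict(od2)
+```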
+
+### numpy
+
+Numpy is a common library used in research.
+Instead of the usual `assert a == b`, numpy has its own testing functions that are more suitable for comparing numpy arrays.
+These functions are the ones you are most likely to use:
+- `numpy.testing.assert_array_equal` is used to compare two numpy arrays.
+- `numpy.testing.assert_allclose` is used to compare two numpy arrays with a tolerance for floating point numbers.
+- `numpy.testing.assert_equal` is used to compare two objects such as lists or dictionaries that contain numpy arrays.
+
+Here are some examples of how to use these functions:
+
+```python
+import numpy as np
+
+def test_numpy_arrays():
+ """Test that numpy arrays are equal"""
+ # Create two numpy arrays
+ array1 = np.array([1, 2, 3])
+ array2 = np.array([1, 2, 3])
+ # Check that the arrays are equal
+ np.testing.assert_array_equal(array1, array2)
+
+# Note that np.testing.assert_array_equal even works with multidimensional numpy arrays!
+
+def test_2d_numpy_arrays():
+    """Test that 2D numpy arrays are equal"""
+    # Create two 2D numpy arrays
+    array1 = np.array([[1, 2], [3, 4]])
+    array2 = np.array([[1, 2], [3, 4]])
+    # Check that the 2D arrays are equal
+    np.testing.assert_array_equal(array1, array2)
+
+def test_numpy_arrays_with_tolerance():
+ """Test that numpy arrays are equal with tolerance"""
+ # Create two numpy arrays
+ array1 = np.array([1.0, 2.0, 3.0])
+ array2 = np.array([1.00009, 2.0005, 3.0001])
+ # Check that the arrays are equal with tolerance
+ np.testing.assert_allclose(array1, array2, atol=1e-3)
+```
+
+::::::::::::::::::::::::::::::::::::: callout
+
+### Data structures with numpy arrays
+
+When you have data structures that contain numpy arrays, such as lists or dictionaries, you cannot use `==` to compare them.
+Instead, you can use `numpy.testing.assert_equal` to compare the data structures.
+
+```python
+def test_dictionaries_with_numpy_arrays():
+ """Test that dictionaries with numpy arrays are equal"""
+ # Create two dictionaries with numpy arrays
+ dict1 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 6])}
+ dict2 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 6])}
+ # Check that the dictionaries are equal
+ np.testing.assert_equal(dict1, dict2)
+```
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+
+### pandas
+
+Pandas is another common library used in research for storing and manipulating datasets.
+Pandas provides its own testing functions that are better suited to comparing pandas objects.
+These two functions are the ones you are most likely to use:
+- `pandas.testing.assert_frame_equal` is used to compare two pandas DataFrames.
+- `pandas.testing.assert_series_equal` is used to compare two pandas Series.
+
+
+Here are some examples of how to use these functions:
+
+```python
+
+def test_pandas_dataframes():
+ """Test that pandas DataFrames are equal"""
+ # Create two pandas DataFrames
+ df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
+ df2 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
+ # Check that the DataFrames are equal
+ pd.testing.assert_frame_equal(df1, df2)
+
+def test_pandas_series():
+ """Test that pandas Series are equal"""
+ # Create two pandas Series
+ s1 = pd.Series([1, 2, 3])
+ s2 = pd.Series([1, 2, 3])
+ # Check that the Series are equal
+ pd.testing.assert_series_equal(s1, s2)
+```
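+
+If your DataFrames hold floating point results, `assert_frame_equal` also accepts `rtol` and `atol` arguments (in pandas 1.1 and later), so you can compare with a tolerance much like `assert_allclose`:
+
+```python
+import pandas as pd
+
+def test_pandas_dataframes_with_tolerance():
+    """Test that DataFrames are equal within a tolerance"""
+    df1 = pd.DataFrame({"A": [1.0, 2.0]})
+    df2 = pd.DataFrame({"A": [1.0001, 1.9999]})
+    # Differences of 1e-4 fall within the relative tolerance of 1e-3
+    pd.testing.assert_frame_equal(df1, df2, rtol=1e-3)
+```
+
+`assert_series_equal` accepts the same tolerance arguments.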
+
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+## Challenge: Comparing Data Structures
+
+### Checking if lists are equal
+
+In `statistics/stats.py` add this function to remove anomalies from a list:
+
+```python
+def remove_anomalies(data: list, maximum_value: float, minimum_value: float) -> list:
+ """Remove anomalies from a list of numbers"""
+
+ result = []
+
+ for i in data:
+ if i <= maximum_value and i >= minimum_value:
+ result.append(i)
+
+ return result
+```
+
+Then write a test for this function by comparing lists.
+
+:::::::::::::::::::::::: solution
+
+```python
+from stats import remove_anomalies
+
+def test_remove_anomalies():
+ """Test remove_anomalies function"""
+ data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ maximum_value = 5
+ minimum_value = 2
+ expected_result = [2, 3, 4, 5]
+ assert remove_anomalies(data, maximum_value, minimum_value) == expected_result
+```
+
+:::::::::::::::::::::::::::::::::
+
+### Checking if dictionaries are equal
+
+In `statistics/stats.py` add this function to calculate the frequency of each element in a list:
+
+```python
+def calculate_frequency(data: list) -> dict:
+ """Calculate the frequency of each element in a list"""
+
+ frequencies = {}
+
+ # Iterate over each value in the list
+ for value in data:
+ # If the value is already in the dictionary, increment the count
+ if value in frequencies:
+ frequencies[value] += 1
+ # Otherwise, add the value to the dictionary with a count of 1
+ else:
+ frequencies[value] = 1
+
+ return frequencies
+```
+
+Then write a test for this function by comparing dictionaries.
+
+:::::::::::::::::::::::: solution
+
+```python
+from stats import calculate_frequency
+
+def test_calculate_frequency():
+ """Test calculate_frequency function"""
+ data = [1, 2, 3, 1, 2, 1, 1, 3, 3, 3]
+ expected_result = {1: 4, 2: 2, 3: 4}
+ assert calculate_frequency(data) == expected_result
+```
+
+:::::::::::::::::::::::::::::::::
+
+### Checking if numpy arrays are equal
+
+In `statistics/stats.py` add this function to calculate the cumulative sum of a numpy array:
+
+```python
+import numpy as np
+
+def calculate_cumulative_sum(array: np.ndarray) -> np.ndarray:
+ """Calculate the cumulative sum of a numpy array"""
+
+ # Implemented manually, rather than using the built-in np.cumsum
+ result = np.zeros(array.shape)
+ result[0] = array[0]
+ for i in range(1, len(array)):
+ result[i] = result[i-1] + array[i]
+
+ return result
+```
+
+Then write a test for this function by comparing numpy arrays.
+
+:::::::::::::::::::::::: solution
+
+```python
+import numpy as np
+from stats import calculate_cumulative_sum
+
+def test_calculate_cumulative_sum():
+ """Test calculate_cumulative_sum function"""
+ array = np.array([1, 2, 3, 4, 5])
+ expected_result = np.array([1, 3, 6, 10, 15])
+ np.testing.assert_array_equal(calculate_cumulative_sum(array), expected_result)
+```
+
+:::::::::::::::::::::::::::::::::
+
+### Checking if data structures with numpy arrays are equal
+
+In `statistics/stats.py` add this function to calculate the total score of each player in a dictionary:
+
+```python
+
+def calculate_player_total_scores(participants: dict) -> dict:
+ """Calculate the total score of each player in a dictionary.
+
+ Example input:
+ {
+ "Alice": {
+ "scores": np.array([1, 2, 3])
+ },
+ "Bob": {
+ "scores": np.array([4, 5, 6])
+ },
+ "Charlie": {
+ "scores": np.array([7, 8, 9])
+ },
+ }
+
+ Example output:
+ {
+ "Alice": {
+ "scores": np.array([1, 2, 3]),
+ "total_score": 6
+ },
+ "Bob": {
+ "scores": np.array([4, 5, 6]),
+ "total_score": 15
+ },
+ "Charlie": {
+ "scores": np.array([7, 8, 9]),
+ "total_score": 24
+ },
+ }
+ """"
+
+ for player in participants:
+ participants[player]["total_score"] = np.sum(participants[player]["scores"])
+
+ return participants
+```
+
+Then write a test for this function by comparing dictionaries with numpy arrays.
+
+:::::::::::::::::::::::: solution
+
+```python
+import numpy as np
+from stats import calculate_player_total_scores
+
+def test_calculate_player_total_scores():
+ """Test calculate_player_total_scores function"""
+ participants = {
+ "Alice": {
+ "scores": np.array([1, 2, 3])
+ },
+ "Bob": {
+ "scores": np.array([4, 5, 6])
+ },
+ "Charlie": {
+ "scores": np.array([7, 8, 9])
+ },
+ }
+ expected_result = {
+ "Alice": {
+ "scores": np.array([1, 2, 3]),
+ "total_score": 6
+ },
+ "Bob": {
+ "scores": np.array([4, 5, 6]),
+ "total_score": 15
+ },
+ "Charlie": {
+ "scores": np.array([7, 8, 9]),
+ "total_score": 24
+ },
+ }
+ np.testing.assert_equal(calculate_player_total_scores(participants), expected_result)
+```
+
+:::::::::::::::::::::::::::::::::
+
+### Checking if pandas DataFrames are equal
+
+In `statistics/stats.py` add this function to calculate the average score of each player in a pandas DataFrame:
+
+```python
+import pandas as pd
+
+def calculate_player_average_scores(df: pd.DataFrame) -> pd.DataFrame:
+ """Calculate the average score of each player in a pandas DataFrame.
+
+ Example input:
+ | | player | score_1 | score_2 |
+ |---|---------|---------|---------|
+ | 0 | Alice | 1 | 2 |
+ | 1 | Bob | 3 | 4 |
+
+ Example output:
+ | | player | score_1 | score_2 | average_score |
+ |---|---------|---------|---------|---------------|
+ | 0 | Alice | 1 | 2 | 1.5 |
+ | 1 | Bob | 3 | 4 | 3.5 |
+ """
+
+ df["average_score"] = df[["score_1", "score_2"]].mean(axis=1)
+
+ return df
+```
+
+Then write a test for this function by comparing pandas DataFrames.
+
+Hint: You can create a dataframe like this:
+
+```python
+df = pd.DataFrame({
+ "player": ["Alice", "Bob"],
+ "score_1": [1, 3],
+ "score_2": [2, 4]
+})
+```
+
+:::::::::::::::::::::::: solution
+
+```python
+import pandas as pd
+from stats import calculate_player_average_scores
+
+def test_calculate_player_average_scores():
+ """Test calculate_player_average_scores function"""
+ df = pd.DataFrame({
+ "player": ["Alice", "Bob"],
+ "score_1": [1, 3],
+ "score_2": [2, 4]
+ })
+ expected_result = pd.DataFrame({
+ "player": ["Alice", "Bob"],
+ "score_1": [1, 3],
+ "score_2": [2, 4],
+ "average_score": [1.5, 3.5]
+ })
+ pd.testing.assert_frame_equal(calculate_player_average_scores(df), expected_result)
+```
+
+:::::::::::::::::::::::::::::::::
+
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
+
+::::::::::::::::::::::::::::::::::::: keypoints
+
+- You can test equality of lists and dictionaries using the `==` operator.
+- Numpy arrays should not be compared with a plain `assert a == b`, since `==` compares arrays element-wise. Instead, use `numpy.testing.assert_array_equal` and `numpy.testing.assert_allclose`.
+- Data structures that contain numpy arrays should be compared using `numpy.testing.assert_equal`.
+- Pandas DataFrames and Series should be compared using `pandas.testing.assert_frame_equal` and `pandas.testing.assert_series_equal`.
+
+::::::::::::::::::::::::::::::::::::::::::::::::
+
diff --git a/episodes/07-fixtures.Rmd b/episodes/07-fixtures.Rmd
index cbec8aab..4d08ad4e 100644
--- a/episodes/07-fixtures.Rmd
+++ b/episodes/07-fixtures.Rmd
@@ -27,105 +27,106 @@ Notice how we have to repeat the exact same setup code in each test.
```python
class Point:
- def __init__(self, x, y):
- self.x = x
- self.y = y
+ def __init__(self, x, y):
+ self.x = x
+ self.y = y
- def distance_from_origin(self):
- return (self.x ** 2 + self.y ** 2) ** 0.5
+ def distance_from_origin(self):
+ return (self.x ** 2 + self.y ** 2) ** 0.5
- def move(self, dx, dy):
- self.x += dx
- self.y += dy
+ def move(self, dx, dy):
+ self.x += dx
+ self.y += dy
- def reflect_over_x(self):
- self.y = -self.y
+ def reflect_over_x(self):
+ self.y = -self.y
- def reflect_over_y(self):
- self.x = -self.x
+ def reflect_over_y(self):
+ self.x = -self.x
```
```python
def test_distance_from_origin():
- # Positive coordinates
- point_positive_coords = Point(3, 4)
- # Negative coordinates
- point_negative_coords = Point(-3, -4)
- # Mix of positive and negative coordinates
- point_mixed_coords = Point(-3, 4)
+ # Positive coordinates
+ point_positive_coords = Point(3, 4)
+ # Negative coordinates
+ point_negative_coords = Point(-3, -4)
+ # Mix of positive and negative coordinates
+ point_mixed_coords = Point(-3, 4)
- assert point_positive_coords.distance_from_origin() == 5.0
- assert point_negative_coords.distance_from_origin() == 5.0
- assert point_mixed_coords.distance_from_origin() == 5.0
+ assert point_positive_coords.distance_from_origin() == 5.0
+ assert point_negative_coords.distance_from_origin() == 5.0
+ assert point_mixed_coords.distance_from_origin() == 5.0
def test_move():
- # Repeated setup again...
-
- # Positive coordinates
- point_positive_coords = Point(3, 4)
- # Negative coordinates
- point_negative_coords = Point(-3, -4)
- # Mix of positive and negative coordinates
- point_mixed_coords = Point(-3, 4)
-
- # Test logic
- point_positive_coords.move(2, -1)
- point_negative_coords.move(2, -1)
- point_mixed_coords.move(2, -1)
-
- assert point_positive_coords.x == 5
- assert point_positive_coords.y == 3
- assert point_negative_coords.x == -1
- assert point_negative_coords.y == -5
- assert point_mixed_coords.x == -1
- assert point_mixed_coords.y == 3
+ # Repeated setup again...
+
+ # Positive coordinates
+ point_positive_coords = Point(3, 4)
+ # Negative coordinates
+ point_negative_coords = Point(-3, -4)
+ # Mix of positive and negative coordinates
+ point_mixed_coords = Point(-3, 4)
+
+
+ # Test logic
+ point_positive_coords.move(2, -1)
+ point_negative_coords.move(2, -1)
+ point_mixed_coords.move(2, -1)
+
+ assert point_positive_coords.x == 5
+ assert point_positive_coords.y == 3
+ assert point_negative_coords.x == -1
+ assert point_negative_coords.y == -5
+ assert point_mixed_coords.x == -1
+ assert point_mixed_coords.y == 3
def test_reflect_over_x():
- # Yet another setup repetition
+ # Yet another setup repetition
- # Positive coordinates
- point_positive_coordinates = Point(3, 4)
- # Negative coordinates
- point_negative_coordinates = Point(-3, -4)
- # Mix of positive and negative coordinates
- point_mixed_coordinates = Point(-3, 4)
+ # Positive coordinates
+ point_positive_coordinates = Point(3, 4)
+ # Negative coordinates
+ point_negative_coordinates = Point(-3, -4)
+ # Mix of positive and negative coordinates
+ point_mixed_coordinates = Point(-3, 4)
- # Test logic
- point_positive_coordinates.reflect_over_x()
- point_negative_coordinates.reflect_over_x()
- point_mixed_coordinates.reflect_over_x()
+ # Test logic
+ point_positive_coordinates.reflect_over_x()
+ point_negative_coordinates.reflect_over_x()
+ point_mixed_coordinates.reflect_over_x()
- assert point_positive_coordinates.x == 3
- assert point_positive_coordinates.y == -4
- assert point_negative_coordinates.x == -3
- assert point_negative_coordinates.y == 4
- assert point_mixed_coordinates.x == -3
- assert point_mixed_coordinates.y == -4
+ assert point_positive_coordinates.x == 3
+ assert point_positive_coordinates.y == -4
+ assert point_negative_coordinates.x == -3
+ assert point_negative_coordinates.y == 4
+ assert point_mixed_coordinates.x == -3
+ assert point_mixed_coordinates.y == -4
def test_reflect_over_y():
- # One more time...
-
- # Positive coordinates
- point_positive_coordinates = Point(3, 4)
- # Negative coordinates
- point_negative_coordinates = Point(-3, -4)
- # Mix of positive and negative coordinates
- point_mixed_coordinates = Point(-3, 4)
-
- # Test logic
- point_positive_coordinates.reflect_over_y()
- point_negative_coordinates.reflect_over_y()
- point_mixed_coordinates.reflect_over_y()
-
- assert point_positive_coordinates.x == -3
- assert point_positive_coordinates.y == 4
- assert point_negative_coordinates.x == 3
- assert point_negative_coordinates.y == -4
- assert point_mixed_coordinates.x == 3
- assert point_mixed_coordinates.y == 4
+ # One more time...
+
+ # Positive coordinates
+ point_positive_coordinates = Point(3, 4)
+ # Negative coordinates
+ point_negative_coordinates = Point(-3, -4)
+ # Mix of positive and negative coordinates
+ point_mixed_coordinates = Point(-3, 4)
+
+ # Test logic
+ point_positive_coordinates.reflect_over_y()
+ point_negative_coordinates.reflect_over_y()
+ point_mixed_coordinates.reflect_over_y()
+
+ assert point_positive_coordinates.x == -3
+ assert point_positive_coordinates.y == 4
+ assert point_negative_coordinates.x == 3
+ assert point_negative_coordinates.y == -4
+ assert point_mixed_coordinates.x == 3
+ assert point_mixed_coordinates.y == 4
```
@@ -146,10 +147,10 @@ import pytest
@pytest.fixture
def my_fixture():
- return "Hello, world!"
+ return "Hello, world!"
def test_my_fixture(my_fixture):
- assert my_fixture == "Hello, world!"
+ assert my_fixture == "Hello, world!"
```
Here, Pytest will notice that `my_fixture` is a fixture due to the `@pytest.fixture` decorator, and will run `my_fixture`, then pass the result into `test_my_fixture`.
@@ -161,56 +162,56 @@ import pytest
@pytest.fixture
def point_positive_3_4():
- return Point(3, 4)
+ return Point(3, 4)
@pytest.fixture
def point_negative_3_4():
- return Point(-3, -4)
+ return Point(-3, -4)
@pytest.fixture
def point_mixed_3_4():
- return Point(-3, 4)
+ return Point(-3, 4)
def test_distance_from_origin(point_positive_3_4, point_negative_3_4, point_mixed_3_4):
- assert point_positive_3_4.distance_from_origin() == 5.0
- assert point_negative_3_4.distance_from_origin() == 5.0
- assert point_mixed_3_4.distance_from_origin() == 5.0
+ assert point_positive_3_4.distance_from_origin() == 5.0
+ assert point_negative_3_4.distance_from_origin() == 5.0
+ assert point_mixed_3_4.distance_from_origin() == 5.0
def test_move(point_positive_3_4, point_negative_3_4, point_mixed_3_4):
- point_positive_3_4.move(2, -1)
- point_negative_3_4.move(2, -1)
- point_mixed_3_4.move(2, -1)
+ point_positive_3_4.move(2, -1)
+ point_negative_3_4.move(2, -1)
+ point_mixed_3_4.move(2, -1)
- assert point_positive_3_4.x == 5
- assert point_positive_3_4.y == 3
- assert point_negative_3_4.x == -1
- assert point_negative_3_4.y == -5
- assert point_mixed_3_4.x == -1
- assert point_mixed_3_4.y == 3
+ assert point_positive_3_4.x == 5
+ assert point_positive_3_4.y == 3
+ assert point_negative_3_4.x == -1
+ assert point_negative_3_4.y == -5
+ assert point_mixed_3_4.x == -1
+ assert point_mixed_3_4.y == 3
def test_reflect_over_x(point_positive_3_4, point_negative_3_4, point_mixed_3_4):
- point_positive_3_4.reflect_over_x()
- point_negative_3_4.reflect_over_x()
- point_mixed_3_4.reflect_over_x()
+ point_positive_3_4.reflect_over_x()
+ point_negative_3_4.reflect_over_x()
+ point_mixed_3_4.reflect_over_x()
- assert point_positive_3_4.x == 3
- assert point_positive_3_4.y == -4
- assert point_negative_3_4.x == -3
- assert point_negative_3_4.y == 4
- assert point_mixed_3_4.x == -3
- assert point_mixed_3_4.y == -4
+ assert point_positive_3_4.x == 3
+ assert point_positive_3_4.y == -4
+ assert point_negative_3_4.x == -3
+ assert point_negative_3_4.y == 4
+ assert point_mixed_3_4.x == -3
+ assert point_mixed_3_4.y == -4
def test_reflect_over_y(point_positive_3_4, point_negative_3_4, point_mixed_3_4):
- point_positive_3_4.reflect_over_y()
- point_negative_3_4.reflect_over_y()
- point_mixed_3_4.reflect_over_y()
-
- assert point_positive_3_4.x == -3
- assert point_positive_3_4.y == 4
- assert point_negative_3_4.x == 3
- assert point_negative_3_4.y == -4
- assert point_mixed_3_4.x == 3
- assert point_mixed_3_4.y == 4
+ point_positive_3_4.reflect_over_y()
+ point_negative_3_4.reflect_over_y()
+ point_mixed_3_4.reflect_over_y()
+
+ assert point_positive_3_4.x == -3
+ assert point_positive_3_4.y == 4
+ assert point_negative_3_4.x == 3
+ assert point_negative_3_4.y == -4
+ assert point_mixed_3_4.x == 3
+ assert point_mixed_3_4.y == 4
```
With the setup code defined in the fixtures, the tests are more concise and it won't take as much effort to add more tests in the future.
@@ -359,42 +360,42 @@ def participants():
]
def test_sample_participants(participants):
- # set random seed
- random.seed(0)
+ # set random seed
+ random.seed(0)
- sample_size = 2
- sampled_participants = sample_participants(participants, sample_size)
- expected = [{"age": 38, "height": 165}, {"age": 45, "height": 200}]
- assert sampled_participants == expected
+ sample_size = 2
+ sampled_participants = sample_participants(participants, sample_size)
+ expected = [{"age": 38, "height": 165}, {"age": 45, "height": 200}]
+ assert sampled_participants == expected
def test_filter_participants_by_age(participants):
- min_age = 30
- max_age = 35
- filtered_participants = filter_participants_by_age(participants, min_age, max_age)
- expected = [{"age": 30, "height": 170}, {"age": 35, "height": 160}]
- assert filtered_participants == expected
+ min_age = 30
+ max_age = 35
+ filtered_participants = filter_participants_by_age(participants, min_age, max_age)
+ expected = [{"age": 30, "height": 170}, {"age": 35, "height": 160}]
+ assert filtered_participants == expected
def test_filter_participants_by_height(participants):
- min_height = 160
- max_height = 170
- filtered_participants = filter_participants_by_height(participants, min_height, max_height)
- expected = [{"age": 30, "height": 170}, {"age": 35, "height": 160}, {"age": 38, "height": 165}]
- assert filtered_participants == expected
+ min_height = 160
+ max_height = 170
+ filtered_participants = filter_participants_by_height(participants, min_height, max_height)
+ expected = [{"age": 30, "height": 170}, {"age": 35, "height": 160}, {"age": 38, "height": 165}]
+ assert filtered_participants == expected
def test_randomly_sample_and_filter_participants(participants):
- # set random seed
- random.seed(0)
-
- sample_size = 5
- min_age = 28
- max_age = 42
- min_height = 159
- max_height = 172
- filtered_participants = randomly_sample_and_filter_participants(
- participants, sample_size, min_age, max_age, min_height, max_height
- )
- expected = [{"age": 38, "height": 165}, {"age": 30, "height": 170}, {"age": 35, "height": 160}]
- assert filtered_participants == expected
+ # set random seed
+ random.seed(0)
+
+ sample_size = 5
+ min_age = 28
+ max_age = 42
+ min_height = 159
+ max_height = 172
+ filtered_participants = randomly_sample_and_filter_participants(
+ participants, sample_size, min_age, max_age, min_height, max_height
+ )
+ expected = [{"age": 38, "height": 165}, {"age": 30, "height": 170}, {"age": 35, "height": 160}]
+ assert filtered_participants == expected
```
diff --git a/episodes/08-parametrization.Rmd b/episodes/08-parametrization.Rmd
index a78cb1e0..d9e434f5 100644
--- a/episodes/08-parametrization.Rmd
+++ b/episodes/08-parametrization.Rmd
@@ -31,22 +31,22 @@ We have a Triangle class that has a function to calculate the triangle's area fr
```python
-class Point:
- def __init__(self, x, y):
- self.x = x
- self.y = y
+class Point:
+ def __init__(self, x, y):
+ self.x = x
+ self.y = y
class Triangle:
- def __init__(self, p1: Point, p2: Point, p3: Point):
- self.p1 = p1
- self.p2 = p2
- self.p3 = p3
+ def __init__(self, p1: Point, p2: Point, p3: Point):
+ self.p1 = p1
+ self.p2 = p2
+ self.p3 = p3
- def calculate_area(self):
- a = ((self.p1.x * (self.p2.y - self.p3.y)) +
- (self.p2.x * (self.p3.y - self.p1.y)) +
- (self.p3.x * (self.p1.y - self.p2.y))) / 2
- return abs(a)
+ def calculate_area(self):
+ a = ((self.p1.x * (self.p2.y - self.p3.y)) +
+ (self.p2.x * (self.p3.y - self.p1.y)) +
+ (self.p3.x * (self.p1.y - self.p2.y))) / 2
+ return abs(a)
```
@@ -54,42 +54,42 @@ If we want to test this function with different combinations of sides, we could
```python
def test_calculate_area():
- """Test the calculate_area function of the Triangle class"""
-
- # Equilateral triangle
- p11 = Point(0, 0)
- p12 = Point(2, 0)
- p13 = Point(1, 1.7320)
- t1 = Triangle(p11, p12, p13)
- assert t1.calculate_area() == 6
-
- # Right-angled triangle
- p21 = Point(0, 0)
- p22 = Point(3, 0)
- p23 = Point(0, 4)
- t2 = Triangle(p21, p22, p23)
- assert t2.calculate_area() == 6
-
- # Isosceles triangle
- p31 = Point(0, 0)
- p32 = Point(4, 0)
- p33 = Point(2, 8)
- t3 = Triangle(p31, p32, p33)
- assert t3.calculate_area() == 16
-
- # Scalene triangle
- p41 = Point(0, 0)
- p42 = Point(3, 0)
- p43 = Point(1, 4)
- t4 = Triangle(p41, p42, p43)
- assert t4.calculate_area() == 6
-
- # Negative values
- p51 = Point(0, 0)
- p52 = Point(-3, 0)
- p53 = Point(0, -4)
- t5 = Triangle(p51, p52, p53)
- assert t5.calculate_area() == 6
+ """Test the calculate_area function of the Triangle class"""
+
+ # Equilateral triangle
+ p11 = Point(0, 0)
+ p12 = Point(2, 0)
+ p13 = Point(1, 1.7320)
+ t1 = Triangle(p11, p12, p13)
+ assert t1.calculate_area() == 1.7320
+
+ # Right-angled triangle
+ p21 = Point(0, 0)
+ p22 = Point(3, 0)
+ p23 = Point(0, 4)
+ t2 = Triangle(p21, p22, p23)
+ assert t2.calculate_area() == 6
+
+ # Isosceles triangle
+ p31 = Point(0, 0)
+ p32 = Point(4, 0)
+ p33 = Point(2, 8)
+ t3 = Triangle(p31, p32, p33)
+ assert t3.calculate_area() == 16
+
+ # Scalene triangle
+ p41 = Point(0, 0)
+ p42 = Point(3, 0)
+ p43 = Point(1, 4)
+ t4 = Triangle(p41, p42, p43)
+ assert t4.calculate_area() == 6
+
+ # Negative values
+ p51 = Point(0, 0)
+ p52 = Point(-3, 0)
+ p53 = Point(0, -4)
+ t5 = Triangle(p51, p52, p53)
+ assert t5.calculate_area() == 6
```
This test is quite long and repetitive. We can use parametrization to make it more concise:
@@ -98,21 +98,21 @@ This test is quite long and repetitive. We can use parametrization to make it mo
import pytest
@pytest.mark.parametrize(
- ("p1x, p1y, p2x, p2y, p3x, p3y, expected"),
- [
- pytest.param(0, 0, 2, 0, 1, 1.7320, 6, id="Equilateral triangle"),
- pytest.param(0, 0, 3, 0, 0, 4, 6, id="Right-angled triangle"),
- pytest.param(0, 0, 4, 0, 2, 8, 16, id="Isosceles triangle"),
- pytest.param(0, 0, 3, 0, 1, 4, 6, id="Scalene triangle"),
- pytest.param(0, 0, -3, 0, 0, -4, 6, id="Negative values")
- ]
+ ("p1x, p1y, p2x, p2y, p3x, p3y, expected"),
+ [
+ pytest.param(0, 0, 2, 0, 1, 1.7320, 1.7320, id="Equilateral triangle"),
+ pytest.param(0, 0, 3, 0, 0, 4, 6, id="Right-angled triangle"),
+ pytest.param(0, 0, 4, 0, 2, 8, 16, id="Isosceles triangle"),
+ pytest.param(0, 0, 3, 0, 1, 4, 6, id="Scalene triangle"),
+ pytest.param(0, 0, -3, 0, 0, -4, 6, id="Negative values")
+ ]
)
def test_calculate_area(p1x, p1y, p2x, p2y, p3x, p3y, expected):
- p1 = Point(p1x, p1y)
- p2 = Point(p2x, p2y)
- p3 = Point(p3x, p3y)
- t = Triangle(p1, p2, p3)
- assert t.calculate_area() == expected
+ p1 = Point(p1x, p1y)
+ p2 = Point(p2x, p2y)
+ p3 = Point(p3x, p3y)
+ t = Triangle(p1, p2, p3)
+ assert t.calculate_area() == expected
```
Let's have a look at how this works.
@@ -150,6 +150,7 @@ def is_prime(n: int) -> bool:
if n % i == 0:
return False
return True
+
```
:::::::::::::::::::::::: solution
@@ -158,37 +159,37 @@ def is_prime(n: int) -> bool:
import pytest
@pytest.mark.parametrize(
- ("n, expected"),
- [
- pytest.param(0, False, id="0 is not prime"),
- pytest.param(1, False, id="1 is not prime"),
- pytest.param(2, True, id="2 is prime"),
- pytest.param(3, True, id="3 is prime"),
- pytest.param(4, False, id="4 is not prime"),
- pytest.param(5, True, id="5 is prime"),
- pytest.param(6, False, id="6 is not prime"),
- pytest.param(7, True, id="7 is prime"),
- pytest.param(8, False, id="8 is not prime"),
- pytest.param(9, False, id="9 is not prime"),
- pytest.param(10, False, id="10 is not prime"),
- pytest.param(11, True, id="11 is prime"),
- pytest.param(12, False, id="12 is not prime"),
- pytest.param(13, True, id="13 is prime"),
- pytest.param(14, False, id="14 is not prime"),
- pytest.param(15, False, id="15 is not prime"),
- pytest.param(16, False, id="16 is not prime"),
- pytest.param(17, True, id="17 is prime"),
- pytest.param(18, False, id="18 is not prime"),
- pytest.param(19, True, id="19 is prime"),
- pytest.param(20, False, id="20 is not prime"),
- pytest.param(21, False, id="21 is not prime"),
- pytest.param(22, False, id="22 is not prime"),
- pytest.param(23, True, id="23 is prime"),
- pytest.param(24, False, id="24 is not prime"),
- ]
+ ("n, expected"),
+ [
+ pytest.param(0, False, id="0 is not prime"),
+ pytest.param(1, False, id="1 is not prime"),
+ pytest.param(2, True, id="2 is prime"),
+ pytest.param(3, True, id="3 is prime"),
+ pytest.param(4, False, id="4 is not prime"),
+ pytest.param(5, True, id="5 is prime"),
+ pytest.param(6, False, id="6 is not prime"),
+ pytest.param(7, True, id="7 is prime"),
+ pytest.param(8, False, id="8 is not prime"),
+ pytest.param(9, False, id="9 is not prime"),
+ pytest.param(10, False, id="10 is not prime"),
+ pytest.param(11, True, id="11 is prime"),
+ pytest.param(12, False, id="12 is not prime"),
+ pytest.param(13, True, id="13 is prime"),
+ pytest.param(14, False, id="14 is not prime"),
+ pytest.param(15, False, id="15 is not prime"),
+ pytest.param(16, False, id="16 is not prime"),
+ pytest.param(17, True, id="17 is prime"),
+ pytest.param(18, False, id="18 is not prime"),
+ pytest.param(19, True, id="19 is prime"),
+ pytest.param(20, False, id="20 is not prime"),
+ pytest.param(21, False, id="21 is not prime"),
+ pytest.param(22, False, id="22 is not prime"),
+ pytest.param(23, True, id="23 is prime"),
+ pytest.param(24, False, id="24 is not prime"),
+ ]
)
def test_is_prime(n, expected):
- assert is_prime(n) == expected
+ assert is_prime(n) == expected
```
:::::::::::::::::::::::::::::::::
diff --git a/episodes/09-testing-output-files.Rmd b/episodes/09-testing-output-files.Rmd
index 56b81327..96c55c31 100644
--- a/episodes/09-testing-output-files.Rmd
+++ b/episodes/09-testing-output-files.Rmd
@@ -1,223 +1,438 @@
---
-title: 'Regression Testing and Plots'
+title: 'Regression Tests'
teaching: 10
-exercises: 2
+exercises: 3
---
:::::::::::::::::::::::::::::::::::::: questions
-- How to test for changes in program outputs?
-- How to test for changes in plots?
+- How can we detect changes in program outputs?
+- How can snapshots make this easier?
::::::::::::::::::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::: objectives
-- Learn how to test for changes in images & plots
+- Explain what regression tests are and when they’re useful
+- Write a manual regression test (save output and compare later)
+- Use Snaptol snapshots to simplify output/array regression testing
+- Use tolerances (rtol/atol) to handle numerical outputs safely
::::::::::::::::::::::::::::::::::::::::::::::::
-## Regression testing
-When you have a large processing pipeline or you are just starting out adding tests to an existing project, you might not have the
-time to carefully define exactly what each function should do, or your code may be so complex that it's hard to write unit tests for it all.
+## 1) Introduction
-In these cases, you can use regression testing. This is where you just test that the output of a function matches the output of a previous version of the function.
+In short, a regression test asks "this test used to produce X, does it still produce X?". This can help us detect
+unexpected or unwanted changes in the output of a program.
-The library `pytest-regtest` provides a simple way to do this. When writing a test, we pass the argument `regtest` to the test function and use `regtest.write()` to log the output of the function.
-This tells pytest-regtest to compare the output of the test to the output of the previous test run.
+They are particularly useful:
-To install `pytest-regtest`:
+- when beginning to add tests to an existing project,
-```bash
-pip install pytest-regtest
-```
+- when adding unit tests to all parts of a project is not feasible,
+
+- for quickly achieving good test coverage,
-::::::::::::::::::::::: callout
+- when you only need to detect changes in the output, even if you cannot verify that the output is correct.
-This `regtest` argument is actually a fixture that is provided by the `pytest-regtest` package. It captures
-the output of the test function and compares it to the output of the previous test run. If the output is
-different, the test will fail.
+These types of tests are not a substitute for unit tests, but rather are complementary.
-:::::::::::::::::::::::::::::::
-Let's make a regression test:
+## 2) Manual example
-- Create a new function in `statistics/stats.py` called `very_complex_processing()`:
+Let's make a regression test in a `test.py` file. It is going to utilise a "very complex" processing function to
+simulate the processing of data,
```python
+# test.py
def very_complex_processing(data: list):
+ return [x ** 2 - 10 * x + 42 for x in data]
+```
+
+Let's write the basic structure for a test with example input data, but for now we will simply print the output,
+
+```python
+# test.py continued
- # Do some very complex processing
- processed_data = [x * 2 for x in data]
+def test_something():
+ input_data = [i for i in range(8)]
- return processed_data
+ processed_data = very_complex_processing(input_data)
+
+ print(processed_data)
+```
+
+Let's run `pytest` with reduced verbosity (`-q`) and with print output shown (`-s`),
+
+```console
+$ pytest -qs test.py
+[42, 33, 26, 21, 18, 17, 18, 21]
+.
+1 passed in 0.00s
```
-- Then in `test_stats.py`, we can add a regression test for this function using the `regtest` argument.
+We get a list of output numbers that simulate the result of a complex function in our project. Let's save this data at
+the top of our `test.py` file so that we can `assert` that it is always equal to the output of the processing function,
```python
-import pytest
+# test.py
-from stats import very_complex_processing
+SNAPSHOT_DATA = [42, 33, 26, 21, 18, 17, 18, 21]
-def test_very_complex_processing(regtest):
+def very_complex_processing(data: list):
+ return [x ** 2 - 10 * x + 42 for x in data]
- data = [1, 2, 3]
- processed_data = very_complex_processing(data)
+def test_something():
+ input_data = [i for i in range(8)]
- regtest.write(str(processed_data))
+ processed_data = very_complex_processing(input_data)
+
+ assert SNAPSHOT_DATA == processed_data
```
-- Now because we haven't run the test yet, there is no reference output to compare against,
-so we need to generate it using the `--regtest-generate` flag:
+We call the saved version of the data a "snapshot".
+
+We can now be assured that any development of the code that erroneously alters the output of the function will cause the
+test to fail. For example, suppose we slightly altered the `very_complex_processing` function,
-```bash
-pytest --regtest-generate
+```python
+def very_complex_processing(data: list):
+ return [3 * x ** 2 - 10 * x + 42 for x in data]
+# ^^^^ small change
```
-This tells pytest to run the test but instead of comparing the result, it will save the result for use in future tests.
+Then, running the test causes it to fail,
+```console
+$ pytest -q test.py
+F
+__________________________________ FAILURES _________________________________
+_______________________________ test_something ______________________________
+
+ def test_something():
+ input_data = [i for i in range(8)]
+
+ processed_data = very_complex_processing(input_data)
+
+> assert SNAPSHOT_DATA == processed_data
+E assert [42, 33, 26, 21, 18, 17, ...] == [42, 35, 34, 39, 50, 67, ...]
+E At index 1 diff: 33 != 35
+
+test.py:12: AssertionError
+1 failed in 0.03s
+```
+
+If the change was intentional, then we could print the output again and update `SNAPSHOT_DATA`. Otherwise, we would want
+to investigate the cause of the change and fix it.
+
-- Try running pytest and since we haven't changed how the function works, the test should pass.
+## 3) Snaptol
-- Then change the function to break the test and re-run pytest. The test will fail and show you the difference between the expected and actual output.
+So far, performing a regression test manually has been a bit tedious. Storing the output data at the top of our test
+file,
-```bash
+- adds clutter,
-=== FAILURES ===
-___ test_very_complex_processing ___
+- is laborious,
-regression test output differences for statistics/test_stats.py::test_very_complex_processing:
-(recorded output from statistics/_regtest_outputs/test_stats.test_very_complex_processing.out)
+- is prone to errors.
-> --- current
-> +++ expected
-> @@ -1 +1 @@
-> -[3, 6, 9]
-> +[2, 4, 6]
+We could move the data to a separate file, but once again we would have to handle its contents manually.
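+
+For illustration, a hand-rolled file-based snapshot might look like the sketch below (the file name `snapshot.json` is our own choice here); the data is out of the test file, but we still have to create, read, and update the file ourselves:
+
+```python
+# test_manual_file.py
+import json
+from pathlib import Path
+
+SNAPSHOT_FILE = Path("snapshot.json")
+
+def very_complex_processing(data: list):
+    return [x ** 2 - 10 * x + 42 for x in data]
+
+def test_something():
+    processed_data = very_complex_processing(list(range(8)))
+
+    # Create the snapshot file on the first run...
+    if not SNAPSHOT_FILE.exists():
+        SNAPSHOT_FILE.write_text(json.dumps(processed_data))
+
+    # ...and compare against it on every run
+    assert json.loads(SNAPSHOT_FILE.read_text()) == processed_data
+```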
+
+There are tools out there that can handle this for us; one widely known example is Syrupy. Here we will use a newer
+tool called Snaptol.
+
+Let's use the original `very_complex_processing` function, and introduce the `snaptolshot` fixture,
+
+```python
+# test.py
+
+def very_complex_processing(data: list):
+ return [x ** 2 - 10 * x + 42 for x in data]
+
+def test_something(snaptolshot):
+ input_data = [i for i in range(8)]
+
+ processed_data = very_complex_processing(input_data)
+
+ assert snaptolshot == processed_data
```
-Here we can see that it has picked up on the difference between the expected and actual output, and displayed it for us to see.
+Notice that we have replaced the `SNAPSHOT_DATA` variable with `snaptolshot`, an object provided by
+Snaptol that handles the snapshot file management for us, amongst other smart features.
-Regression tests, while not as powerful as unit tests, are a great way to quickly add tests to a project and ensure that changes to the code don't break existing functionality.
-It is also a good idea to add regression tests to your main processing pipelines just in case your unit tests don't cover all the edge cases, this will
-ensure that the output of your program remains consistent between versions.
+When we run the test for the first time, we will be met with a `FileNotFoundError`,
-## Testing plots
+```console
+$ pytest -q test.py
+F
+================================== FAILURES =================================
+_______________________________ test_something ______________________________
-When you are working with plots, you may want to test that the output is as expected. This can be done by comparing the output to a reference image or plot.
-The `pytest-mpl` package provides a simple way to do this, automating the comparison of the output of a test function to a reference image.
+ def test_something(snaptolshot):
+ input_data = [i for i in range(8)]
-To install `pytest-mpl`:
+ processed_data = very_complex_processing(input_data)
-```bash
-pip install pytest-mpl
+> assert snaptolshot == processed_data
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+test.py:10:
+_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
+.../snapshot.py:167: FileNotFoundError
+========================== short test summary info ==========================
+FAILED test.py::test_something - FileNotFoundError: Snapshot file not found.
+1 failed in 0.03s
```
-- Create a new folder called `plotting` and add a file `plotting.py` with the following function:
+This is because we have not yet created the snapshot file. Let's run `snaptol` in update mode so that it knows to create
+the snapshot file for us. This is similar to the print, copy and paste step in the manual approach above,
-```python
-import matplotlib.pyplot as plt
+```console
+$ pytest -q test.py --snaptol-update
+.
+1 passed in 0.00s
+```
+
+This tells us that the test passed successfully and, because we were in update mode, an associated snapshot file was
+created with the name format `<test file>.<test name>.json` in a dedicated directory,
+
+```console
+$ tree
+.
+├── __snapshots__
+│ └── test.test_something.json
+└── test.py
+```
-def plot_data(data: list):
- fig, ax = plt.subplots()
- ax.plot(data)
- return fig
+The contents of the JSON file are the same as in the manual example,
+```json
+[
+ 42,
+ 33,
+ 26,
+ 21,
+ 18,
+ 17,
+ 18,
+ 21
+]
```
-This function takes a list of points to plot, plots them and returns the figure produced.
+As the data is saved in JSON format, snapshot tests are not limited to integers and lists: dictionaries, strings, and
+nested combinations of these can all be used.
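+
+For example, a dictionary snapshot follows exactly the same pattern (a small sketch; the `summarise` function is our own invention):
+
+```python
+def summarise(data: list) -> dict:
+    return {"count": len(data), "total": sum(data), "maximum": max(data)}
+
+def test_summarise(snaptolshot):
+    result = summarise([3, 1, 4, 1, 5])
+
+    # On the first (update-mode) run this writes the dictionary to a
+    # JSON snapshot file; on later runs it compares against that file
+    assert snaptolshot == result
+```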
+
+Just as previously, if we alter the function then the test will fail. We can similarly update the snapshot file with
+the new output with the `--snaptol-update` flag as above.
+
+::::::::::::::::::::::::::::::::::::: callout
-In order to test that this funciton produces the correct plots, we will need to store the correct plots to compare against.
-- Create a new folder called `test_plots` inside the `plotting` folder. This is where we will store the reference images.
+**Note:** `--snaptol-update` will only update snapshot files for tests that failed in the previous run of `pytest`. This
+is because the expected workflow is 1) run `pytest`, 2) observe a test failure, 3) if happy with the change then run
+the update, `--snaptol-update`. This stops the unnecessary rewrite of snapshot files in tests that pass – which is
+particularly important when we allow for tolerance as explained in the next section.
-`pytest-mpl` adds the `@pytest.mark.mpl_image_compare` decorator that is used to compare the output of a test function to a reference image.
-It takes a `baseline_dir` argument that specifies the directory where the reference images are stored.
+:::::::::::::::::::::::::::::::::::::::::::::
-- Create a new file called `test_plotting.py` in the `plotting` folder with the following content:
+
+### Floating point numbers
+
+Consider a simulation code that uses algorithms that depend on convergence – perhaps a complicated equation that does
+not have an exact answer but can be approximated numerically within a given tolerance. This, along with the common use
+of controlled randomised initial conditions, can lead to results that differ slightly between runs.
+
+In the example below, we use the `estimate_pi` function from the "Floating Point Data" module. It relies on the use of
+randomised input and as a result the determined value will vary slightly between runs.
```python
-import pytest
-from plotting import plot_data
+# test_tol.py
+import random
+
+def estimate_pi(iterations):
+ num_inside = 0
+ for _ in range(iterations):
+ x = random.random()
+ y = random.random()
+ if x**2 + y**2 < 1:
+ num_inside += 1
+ return 4 * num_inside / iterations
-@pytest.mark.mpl_image_compare(baseline_dir="test_plots/")
-def test_plot_data():
- data = [1, 3, 2]
- fig = plot_data(data)
- return fig
+def test_something(snaptolshot):
+ result = estimate_pi(10000000)
+
+ print(result)
+
+ snaptolshot.assert_allclose(result, rtol=1e-03, atol=0.0)
```
-Here we have told pytest that we want it to compare the output of the `test_plot_data` function to the images in the `test_plots` directory.
+Notice that here we use a method of the `snaptolshot` object called `assert_allclose`. This is a wrapper around the
+`numpy.testing.assert_allclose` function, as discussed in the "Floating Point Data" module, and allows us to specify
+tolerances for the comparison rather than asserting an exact equality.
-- Run the following command to generate the reference image:
-(make sure you are in the base directory in your project and not in the plotting folder)
+Let's run the test initially like before but create the snapshot file straight away by running in update mode,
-```bash
-pytest --mpl-generate-path=plotting/test_plots
+```console
+$ pytest -qs test_tol.py --snaptol-update-all
+3.1423884
+.
+1 passed in 0.30s
```
-This tells pytest to run the test but instead of comparing the result, it will save the result into the `test_plots` directory for use in future tests.
+Even with ten million data points, the approximation of pi, 3.1423884, isn't great!
+
+::::::::::::::::::::::::::::::::::::: callout
+
+**Note:** remember that the exact result of a regression test is not the important part, but rather how that result changes
+in future runs. We want to focus on whether our code reproduces the result in future runs – in this case within a given
+tolerance to account for the randomness.
+
+:::::::::::::::::::::::::::::::::::::::::::::
+
+In the test above, we supplied `rtol` and `atol` arguments to the function in the assertion. These are used to control
+the tolerance of the comparison between the snapshot and the actual output. This means on future runs of the test, the
+computed value will not be required to match the snapshot exactly, only to fall within the given tolerance. Remember,
-Now we have the reference image, we can run the test to ensure that the output of `plot_data` matches the reference image.
-Pytest doesn't check the images by default, so we need to pass it the `--mpl` flag to tell it to check the images.
+- `rtol` is the relative tolerance, useful for handling large numbers (e.g. magnitudes much greater than 1),
+- `atol` is the absolute tolerance, useful for numbers "near zero" (e.g. magnitudes much less than 1).
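+
+To make the distinction concrete, here is a small standalone sketch using `numpy.testing.assert_allclose` directly (the same comparison that `snaptolshot.assert_allclose` wraps):
+
+```python
+import numpy as np
+
+# rtol does the work for large magnitudes: 1000.5 and 1000.0 differ
+# by 0.5, which is within rtol * |expected| = 1e-3 * 1000.0 = 1.0
+np.testing.assert_allclose(1000.5, 1000.0, rtol=1e-3, atol=0.0)
+
+# atol does the work near zero: any nonzero value is "infinitely"
+# far from 0.0 in relative terms, so an absolute tolerance is needed
+np.testing.assert_allclose(1e-7, 0.0, rtol=0.0, atol=1e-6)
+```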
-```bash
-pytest --mpl
+If we run the test again, we see the printed output is different to that saved to file, but the test still passes,
+
+```console
+$ pytest -qs test_tol.py
+3.1408724
+.
+1 passed in 0.24s
```
-Since we just generated the reference image, the test should pass.
-Now let's edit the `plot_data` function to plot a different set of points by adding a 4 to the data:
+## Exercises
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+## Create your own regression test
+
+- Add the below code to a new file and add your own code to the `...` sections.
+
+- On the first run, capture the output of your implemented `very_complex_processing` function and store it
+appropriately.
+
+- Afterwards, ensure the test compares the stored data to the result, and passes successfully. Avoid using `float`s for now.
```python
-import matplotlib.pyplot as plt
+def very_complex_processing(data):
+ return ...
+
+def test_something():
+ input_data = ...
+
+ processed_data = very_complex_processing(input_data)
-def plot_data(data: list):
- fig, ax = plt.subplots()
- # Add 4 to the data
- data.append(4)
- ax.plot(data)
- return fig
+ assert ...
```
-- Now re-run the test. You should see that it fails.
+:::::::::::::::::::::::: solution
-```bash
-=== FAILURES ===
-___ test_plot_data ___
-Error: Image files did not match.
- RMS Value: 15.740441786649093
- Expected:
- /var/folders/sr/wjtfqr9s6x3bw1s647t649x80000gn/T/tmp6d0p4yvm/test_plotting.test_plot_data/baseline.png
- Actual:
- /var/folders/sr/wjtfqr9s6x3bw1s647t649x80000gn/T/tmp6d0p4yvm/test_plotting.test_plot_data/result.png
- Difference:
- /var/folders/sr/wjtfqr9s6x3bw1s647t649x80000gn/T/tmp6d0p4yvm/test_plotting.test_plot_data/result-failed-diff.png
- Tolerance:
- 2
+```python
+SNAPSHOT_DATA = [42, 33, 26, 21, 18, 17, 18, 21]
+
+def very_complex_processing(data: list):
+ return [x ** 2 - 10 * x + 42 for x in data]
+
+def test_something():
+ input_data = [i for i in range(8)]
+
+ processed_data = very_complex_processing(input_data)
+
+ assert SNAPSHOT_DATA == processed_data
```
-Notice that the test shows you three image files.
-(All of these files are stored in a temporary directory that pytest creates when running the test.
-Depending on your system, you may be able to click on the paths to view the images. Try holding down CTRL or Command and clicking on the path.)
+:::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::::::::::::::::::::::::::
-- The first, "Expected" is the reference image that the test is comparing against.
-- The second, "Actual" is the image that was produced by the test.
-- And the third is a difference image that shows the differences between the two images. This is very useful as it enables us to cleraly see
-what went wrong with the plotting, allowing us to fix the issue more easily. In this example, we can clearly see that the axes ticks are different, and
-the line plot is a completely different shape.
+::::::::::::::::::::::::::::::::::::: challenge
-This doesn't just work with line plots, but with any type of plot that matplotlib can produce.
+## Implement a regression test with Snaptol
-Testing your plots can be very useful especially if your project allows users to define their own plots.
+- Using the `estimate_pi` function above, implement a regression test using the `snaptolshot` object.
+- Be sure to use the `assert_allclose` method to compare the result to the snapshot carefully.
-::::::::::::::::::::::::::::::::::::: keypoints
+- On the first pass, ensure that it fails due to a `FileNotFoundError`.
-- Regression testing ensures that the output of a function remains consistent between changes and are a great first step in adding tests to an existing project.
-- `pytest-regtest` provides a simple way to do regression testing.
-- `pytest-mpl` provides a simple way to test plots by comparing the output of a test function to a reference image.
+- Run it in update mode to save the snapshot, and ensure it passes successfully on future runs.
-::::::::::::::::::::::::::::::::::::::::::::::::
+:::::::::::::::::::::::: solution
+
+```python
+import random
+
+def estimate_pi(iterations):
+ num_inside = 0
+ for _ in range(iterations):
+ x = random.random()
+ y = random.random()
+ if x**2 + y**2 < 1:
+ num_inside += 1
+ return 4 * num_inside / iterations
+
+def test_something(snaptolshot):
+ result = estimate_pi(10000000)
+
+ snaptolshot.assert_allclose(result, rtol=1e-03, atol=0.0)
+```
+
+:::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::::::::::::::::::::::::::
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+## More complex regression tests
+
+- Create two separate tests that both utilise the `estimate_pi` function as a fixture.
+
+- Using different tolerances for each test, assert that the first passes successfully, and assert that the second raises
+an `AssertionError`. Hints: 1) remember to look back at the "Testing for Exceptions" and "Fixtures" modules, 2) the
+error in the pi calculation algorithm is $\frac{1}{\sqrt{N}}$ where $N$ is the number of points used.
+
+:::::::::::::::::::::::: solution
+
+```python
+import random
+import pytest
+
+@pytest.fixture
+def estimate_pi():
+ iterations = 10000000
+ num_inside = 0
+ for _ in range(iterations):
+ x = random.random()
+ y = random.random()
+ if x**2 + y**2 < 1:
+ num_inside += 1
+ return 4 * num_inside / iterations
+
+def test_pi_passes(snaptolshot, estimate_pi):
+ # Passes due to loose tolerance.
+ snaptolshot.assert_allclose(estimate_pi, rtol=1e-03, atol=0.0)
+
+def test_pi_fails(snaptolshot, estimate_pi):
+ # Fails due to tight tolerance.
+ with pytest.raises(AssertionError):
+ snaptolshot.assert_allclose(estimate_pi, rtol=1e-04, atol=0.0)
+```
+
+:::::::::::::::::::::::::::::::::
+
+:::::::::::::::::::::::::::::::::::::::::::::::
+
+
+::::::::::::::::::::::::::::::::::::: keypoints
+
+- Regression testing ensures that the output of a function remains consistent between test runs.
+- The `pytest` plugin, `snaptol`, can be used to simplify this process and cater for floating point numbers that may
+need tolerances on assertion checks.
+:::::::::::::::::::::::::::::::::::::::::::::::
diff --git a/episodes/10-CI.Rmd b/episodes/10-CI.Rmd
index 24839e34..0dcab072 100644
--- a/episodes/10-CI.Rmd
+++ b/episodes/10-CI.Rmd
@@ -1,7 +1,7 @@
---
title: "Continuous Integration with GitHub Actions"
-teaching: 20
-exercises: 25
+teaching: 10
+exercises: 2
---
:::::::::::::::::::::::::::::::::::::: questions
@@ -20,61 +20,39 @@ exercises: 25
## Continuous Integration
-Continuous Integration (CI) is the practice of automating the merging of code
-changes into a project. In the context of software testing, CI is the practice
-of running tests on every code change to ensure that the code is working as
-expected. GitHub provides a feature called GitHub Actions that allows you to
-integrate this into your projects.
+Continuous Integration (CI) is the practice of automating the merging of code changes into a project.
+In the context of software testing, CI is the practice of running tests on every code change to ensure that the code is working as expected.
+GitHub provides a feature called GitHub Actions that allows you to integrate this into your projects.
-In this lesson we will go over the basics of how to set up a GitHub Action
-to run tests on your code.
-
-:::::: prereq
-
-This lesson assumes a working knowledge of Git and GitHub. If you get stuck,
-you may find it helpful to review the Research Coding Course's
-[material on version control](https://researchcodingclub.github.io/course/#version-control-introduction-to-git-and-github)
-
-:::::::::::::
+In this lesson we will go over the very basics of how to set up a GitHub Action to run tests on your code.
## Setting up your project repository
-- Create a new repository on GitHub for this lesson called
- "python-testing-course" (whatever you like really). We
- recommended making it public for now.
-- Clone the repository into your local machine using `git clone
- ` or via Github Desktop.
+- Create a new repository on GitHub for this lesson called "python-testing-course" (or whatever you like, really).
+- Clone the repository onto your local machine using `git clone <repository-url>`, or with GitKraken if you use that.
- Move over all your code from the previous lessons into this repository.
- Commit the changes using `git add .` and `git commit -m "Add all the project code"`
-- Create a new file called `requirements.txt` in the root of your repository
- and add the following contents:
+- Create a new file called `requirements.txt` in the root of your repository and add the following contents:
```
pytest
numpy
-snaptol
+pandas
+pytest-mpl
+pytest-regtest
+matplotlib
```
-This is just a list of all the packages that your project uses and will be
-needed later. Recall that each of these are used in various lessons in this
-course.
-
-:::::: callout
+This is just a list of all the packages that your project uses and will be needed later.
+Recall that each of these is used in various lessons in this course.
-Nowadays it is usually preferable to list dependencies in a file called
-`pyproject.toml`, which also allows Python packages to be installed and
-published. Look out for our upcoming course on reproducible environments to
-learn more!
-
-::::::::::::::
Now we have a repository with all our code in it online on GitHub.
## Creating a GitHub Action
-GitHub Actions are defined in `yaml` files -- a structured text file which is
-commonly used to pass settings to programs. They are stored in the
-`.github/workflows` directory in your repository.
+GitHub Actions are defined in `yaml` files (these are just simple text files that contain a list of instructions). They are stored
+in the `.github/workflows` directory in your repository.
- Create a new directory in your repository called `.github`
- Inside the `.github` directory, create a new directory called `workflows`
@@ -88,264 +66,93 @@ Let's add some instructions to the `tests.yaml` file:
# This is just the name of the action, you can call it whatever you like.
name: Tests (pytest)
-# This sets the events that trigger the action. In this case, we are telling
-# GitHub to run the tests whenever a push is made to the repository.
-# The trailing colon is intentional!
+# This is the event that triggers the action. In this case, we are telling GitHub to run the tests whenever a pull request is made to the main branch.
on:
- push:
+ pull_request:
+ branches:
+ - main
-# This is a list of jobs that the action will run. In this case, we have only
-# one job called test.
+# This is a list of jobs that the action will run. In this case, we have only one job called build.
jobs:
-
- # This is the name of the job
- test:
-
- # This is the environment that the job will run on. In this case, we are
- # using the latest version of Ubuntu, however you can use other operating
- # systems like Windows or MacOS if you like!
- runs-on: ubuntu-latest
-
- # This is a list of steps that the job will run. Each step is a command
- # that will be executed on the environment.
- steps:
-
- # This command tells GitHub to use a pre-built action. In this case, we
- # are using the actions/checkout action to check out the repository. This
- # just means that GitHub will clone this repository to the current
- # working directory.
- - uses: actions/checkout@v6
-
- # This is the name of the step. This is just a label that will be
- # displayed in the GitHub UI.
- - name: Set up Python 3.12
- # This command tells GitHub to use a pre-built action. In this case, we
- # are using the actions/setup-python action to set up Python 3.12.
- uses: actions/setup-python@v6
- with:
- python-version: "3.12"
-
- # This step installs the dependencies for the project such as pytest,
- # numpy, pandas, etc using the requirements.txt file we created earlier.
- - name: Install dependencies
- run: |
- python -m pip install --upgrade pip
- pip install -r requirements.txt
-
- # This step runs the tests using the pytest command.
- - name: Run tests
- run: |
- pytest
+ build:
+ # This is the environment that the job will run on. In this case, we are using the latest version of Ubuntu; however, you can use other operating systems like Windows or MacOS if you like!
+ runs-on: ubuntu-latest
+
+ # This is a list of steps that the job will run. Each step is a command that will be executed on the environment.
+ steps:
+ # This command tells GitHub to use a pre-built action. In this case, we are using the actions/checkout action to check out the repository. This just means that GitHub will use this repository's code to run the tests.
+ - uses: actions/checkout@v3 # Check out the repository on GitHub
+ # This is the name of the step. This is just a label that will be displayed in the GitHub UI.
+ - name: Set up Python 3.10
+ # This command tells GitHub to use a pre-built action. In this case, we are using the actions/setup-python action to set up Python 3.10.
+ uses: actions/setup-python@v3
+ with:
+ python-version: "3.10"
+
+ # This step installs the dependencies for the project such as pytest, numpy, pandas, etc using the requirements.txt file we created earlier.
+ - name: Install dependencies
+ run: |
+ python -m pip install --upgrade pip
+ pip install -r requirements.txt
+
+ # This step runs the tests using the pytest command. Remember to use the --mpl and --regtest flags to run the tests that use matplotlib and pytest-regtest.
+ - name: Run tests
+ run: |
+ pytest --mpl --regtest
```
-This is a simple GitHub Action that runs the tests for your code whenever code
-is pushed to the repository, regardless of what was changed in the repository
-or which branch you push too. We'll see later how to run tests only when
-certain criteria are fulfilled.
+This is a simple GitHub Action that runs the tests for your code whenever a pull request is made to the main branch.
## Upload the workflow to GitHub
Now that you have created the `tests.yaml` file, you need to upload it to GitHub.
- Commit the changes using `git add .` and `git commit -m "Add GitHub Action to run tests"`
-- Push the changes to GitHub using `git push`
-
-This should trigger a workflow on the repository. While it's running, you'll see an orange
-circle next to your profile name at the top of the repo. When it's done, it'll change to
-a green tick if it finished successfully, or a red cross if it didn't.
-
-{alt="GitHub repository view with a green tick indicating a successful workflow run"}
-
-You can view all previous workflow runs by clicking the 'Actions' button on the
-top bar of your repository.
-
-{alt="GitHub Actions Button"}
-
-If you click on the orange circle/green tick/red cross, you can also view the
-individual stages of the workflow and inspect the terminal output.
-
-{alt="Detailed view of a GitHub workflow run"}
-
-
-## Testing across multiple platforms
-
-A very useful feature of GitHub Actions is the ability to test over a wider
-range of platforms than just your own machine:
-
-- Operating systems
-- Python versions
-- Compiler versions (for those writing C/C++/Fortran/etc)
-
-We can achieve this by setting `jobs..strategy.matrix` in our workflow:
-
-```yaml
-jobs:
- test:
- strategy:
- matrix:
- python_version: ["3.12", "3.13", "3.14"]
- os: ["ubuntu-latest", "windows-latest"]
- runs-on: ${{ matrix.os }}
- steps:
- ...
-```
-
-Later in the file, the `setup-python` step should be changed to:
-
-```yaml
- - name: Set up Python ${{ matrix.python_version }}
- uses: actions/setup-python@v6
- with:
- python-version: ${{ matrix.python_version }}
-```
-
-By default, all combinations in the matrix will be run in separate jobs. The
-syntax `${{ matrix.x }}` inserts the text from the `x` list for the given matrix job.
-
-::::::::::::::::::::::::::::::::::::: challenge
-
-## Upgrade the workflow to run across multiple platforms
-
-- Make the changes above to your workflow file, being careful to get the indentation right!
-- Commit the changes and push to GitHub.
-- Check the latest jobs in the Actions panel.
-
-:::::::::::::::::::::::: solution
+- Push the changes to GitHub using `git push origin main`
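+If your workflow is set to trigger on pushes, this commit will itself start a run, which you can watch from the "Actions" tab at the top of your repository page.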
-You should see that a total of 6 jobs have run, and hopefully all will have passed!
+## Enable running the tests on a Pull Request
-{alt="Completed matrix tests."}
+The typical use case for a CI system is to run the tests whenever a pull request is opened against the main branch, for example to add a new feature.
-:::::::::::::::::::::::::::::::::
+
+- Go to your GitHub repository
+- Click on the "Settings" tab
+- Scroll down to "Branches"
+- Under "Branch protection rules" / "Branch name pattern" type "main"
+- Select the checkbox for "Require status checks to pass before merging"
+- Select the checkbox for "Require branches to be up to date before merging"
-::::::::::::::::::::::::::::::::::::::::::::::::
-
-
-This ensures that code that runs on your machine should, in theory, run on many
-other peoples' machines too. However, it's best to restrict the matrix to the
-minimum number of necessary platforms to ensure you don't waste resources. You
-can do so with a list of exclusions:
-
-```yaml
- strategy:
- matrix:
- python_version: ["3.12", "3.13", "3.14"]
- os: ["ubuntu-latest", "windows-latest"]
- # Only run windows on latest Python version
- exclude:
- - os: "windows-latest"
- python_version: "3.12"
- - os: "windows-latest"
- python_version: "3.13"
-````
-
-## Running on other events
-
-You may have wondered why there is a trailing colon when we specify `push:` at
-the top of the file. The reason is that we can optionally set additional
-conditions on when CI jobs will run. For example:
-
-```yaml
-on:
- push:
- # Only check when Python files are changed.
- # Don't need to check when the README is updated!
- paths:
- - '**.py'
- - 'pyproject.toml'
- # Only check when somebody raises a push to main.
- # (not recommended in general!)
- branches: [main]
-```
-
-Doing this can prevent pointless CI jobs from running and save resources.
-
-You can also run on events other than a push. For example:
-
-```yaml
-on:
- push:
- paths:
- - '**.py'
- - 'pyproject.toml'
- # Run on code in pull requests.
- pull_request:
- paths:
- - '**.py'
- - 'pyproject.toml'
- # This allows you to launch the job manually
- workflow_dispatch:
-```
+With this rule in place, when a pull request tries to merge code into main, all of its tests must pass
+before the code can be merged.
-There is an important subtlety to running on `pull_request` versus
-`push`:
+Let's test it out.
-- `push` runs directly on the commits you push to GitHub.
-- `pull_request` runs on the code that would result _after_ the pull request
- has been merged into its target branch.
-
-In collaborative coding projects, it is entirely possible that `main` will have
-diverged from your branch while you were working on it, and tests that pass on
-your branch will fail after the merge. For this reason, it's recommended to
-always include both `push` and `pull_request` in your testing workflows.
-
-::::::::::::::::::::::::::::::::::::: challenge
-
-## Running on pull requests (advanced)
-
-Can you engineer a situation where a CI job passes on `push` but
-fails on `pull_request`?
-
-- Write a function to a new file, commit the changes, and push it to your `main`
- branch. It can be something as simple as:
+- Create a new branch in your repository called `subtract` using `git checkout -b subtract`
+- Add a new function in your `calculator.py` file that subtracts two numbers, but make it wrong on purpose:
```python
-# file: message.py
-
-def message():
- return "foo"
-````
-
-- Switch to a new branch `my_branch` with `git switch -c my_branch`,
- and write a test for that function in a new file:
-
-```python
-# file: test_message.py
-from message import message
-
-def test_message():
- assert message() == "foo"
+def subtract(a, b):
+ return a + b
```
-- Check that the test passes, and commit it.
-- Push `my_branch` to GitHub with `git push -u origin my_branch`,
- but don't raise a pull request yet.
-- Return to your `main` branch, and modify the function being tested:
+- Then add a test for this function in your `test_calculator.py` file:
```python
-# file: message.py
-
-def message():
- return "bar"
+def test_subtract():
+ assert subtract(5, 3) == 2
```
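+You can confirm locally that this test fails before pushing, for example by running `pytest -k test_subtract`.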
-- Push the changes to `main`.
-- Now raise a pull request from `my_branch` into `main`.
-
-:::::::::::::::::::::::: solution
+- Commit the changes using `git add .` and `git commit -m "Add subtract function"`
+- Push the changes to GitHub using `git push origin subtract`
-The code on the new branch will be testing the old implementation,
-and should pass. However, following the merge, the test would fail.
-This results in the `push` test passing, and the `pull_request` test
-failing.
+- Now go to your GitHub repository and create a new Pull Request to merge the `subtract` branch into `main`
-{alt="Example of tests failing on pull requests."}
-
-:::::::::::::::::::::::::::::::::
-
-::::::::::::::::::::::::::::::::::::::::::::::::
+You should see that the GitHub Action runs the tests and fails: `test_subtract` expects `subtract(5, 3)` to return `2`, but the broken function returns `8`.
-## Keypoints
+- Let's now fix the function (see the one-line fix after this list) and commit the changes: `git add .` and `git commit -m "Fix subtract function"`
+- Push the changes to GitHub using `git push origin subtract` again
+- Go back to the Pull Request on GitHub: the tests should now pass, and you can merge the code into the main branch.
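+For reference, the corrected function in `calculator.py` is simply:
+
+```python
+def subtract(a, b):
+    return a - b
+```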
So now, whenever you or your team want to add a feature or simply update the code, the workflow is as follows:
@@ -364,7 +171,7 @@ This will greatly improve the quality of your code and make it easier to collabo
- Continuous Integration (CI) is the practice of automating the merging of code changes into a project.
- GitHub Actions is a feature of GitHub that allows you to automate the testing of your code.
- GitHub Actions are defined in `yaml` files and are stored in the `.github/workflows` directory in your repository.
-- You can use GitHub Actions to ensure your tests pass before merging new code into your `main` branch.
+- You can use GitHub Actions to allow code to be merged into the main branch only if the tests pass.
::::::::::::::::::::::::::::::::::::::::::::::::
diff --git a/episodes/fig/github_action.png b/episodes/fig/github_action.png
deleted file mode 100644
index 9653d3db..00000000
Binary files a/episodes/fig/github_action.png and /dev/null differ
diff --git a/episodes/fig/github_actions_button.png b/episodes/fig/github_actions_button.png
deleted file mode 100644
index ca6bc515..00000000
Binary files a/episodes/fig/github_actions_button.png and /dev/null differ
diff --git a/episodes/fig/github_repo_view.png b/episodes/fig/github_repo_view.png
deleted file mode 100644
index a314c6a8..00000000
Binary files a/episodes/fig/github_repo_view.png and /dev/null differ
diff --git a/episodes/fig/matrix_tests.png b/episodes/fig/matrix_tests.png
deleted file mode 100644
index d2f47f93..00000000
Binary files a/episodes/fig/matrix_tests.png and /dev/null differ
diff --git a/episodes/fig/pull_request_test_failed.png b/episodes/fig/pull_request_test_failed.png
deleted file mode 100644
index 97e385e0..00000000
Binary files a/episodes/fig/pull_request_test_failed.png and /dev/null differ
diff --git a/learners/files/03-interacting-with-tests/advanced/advanced_calculator.py b/learners/files/03-interacting-with-tests.Rmd copy/advanced/advanced_calculator.py
similarity index 100%
rename from learners/files/03-interacting-with-tests/advanced/advanced_calculator.py
rename to learners/files/03-interacting-with-tests.Rmd copy/advanced/advanced_calculator.py
diff --git a/learners/files/03-interacting-with-tests/advanced/test_advanced_calculator.py b/learners/files/03-interacting-with-tests.Rmd copy/advanced/test_advanced_calculator.py
similarity index 100%
rename from learners/files/03-interacting-with-tests/advanced/test_advanced_calculator.py
rename to learners/files/03-interacting-with-tests.Rmd copy/advanced/test_advanced_calculator.py
diff --git a/learners/files/03-interacting-with-tests/calculator.py b/learners/files/03-interacting-with-tests.Rmd copy/calculator.py
similarity index 100%
rename from learners/files/03-interacting-with-tests/calculator.py
rename to learners/files/03-interacting-with-tests.Rmd copy/calculator.py
diff --git a/learners/files/03-interacting-with-tests/test_calculator.py b/learners/files/03-interacting-with-tests.Rmd copy/test_calculator.py
similarity index 100%
rename from learners/files/03-interacting-with-tests/test_calculator.py
rename to learners/files/03-interacting-with-tests.Rmd copy/test_calculator.py
diff --git a/learners/files/06-floating-point-data/advanced/advanced_calculator.py b/learners/files/06-data-structures/advanced/advanced_calculator.py
similarity index 100%
rename from learners/files/06-floating-point-data/advanced/advanced_calculator.py
rename to learners/files/06-data-structures/advanced/advanced_calculator.py
diff --git a/learners/files/06-floating-point-data/advanced/test_advanced_calculator.py b/learners/files/06-data-structures/advanced/test_advanced_calculator.py
similarity index 100%
rename from learners/files/06-floating-point-data/advanced/test_advanced_calculator.py
rename to learners/files/06-data-structures/advanced/test_advanced_calculator.py
diff --git a/learners/files/06-floating-point-data/calculator.py b/learners/files/06-data-structures/calculator.py
similarity index 100%
rename from learners/files/06-floating-point-data/calculator.py
rename to learners/files/06-data-structures/calculator.py
diff --git a/learners/files/06-data-structures/data_structures.py b/learners/files/06-data-structures/data_structures.py
new file mode 100644
index 00000000..df39e65e
--- /dev/null
+++ b/learners/files/06-data-structures/data_structures.py
@@ -0,0 +1,2 @@
+import numpy as np
+import pandas as pd
diff --git a/learners/files/06-floating-point-data/scripts.py b/learners/files/06-data-structures/scripts.py
similarity index 100%
rename from learners/files/06-floating-point-data/scripts.py
rename to learners/files/06-data-structures/scripts.py
diff --git a/learners/files/06-data-structures/statistics/stats.py b/learners/files/06-data-structures/statistics/stats.py
new file mode 100644
index 00000000..93eea5d3
--- /dev/null
+++ b/learners/files/06-data-structures/statistics/stats.py
@@ -0,0 +1,138 @@
+import numpy as np
+import pandas as pd
+
+import random
+
+
+def sample_participants(participants: list, sample_size: int):
+ indexes = random.sample(range(len(participants)), sample_size)
+ sampled_participants = []
+ for i in indexes:
+ sampled_participants.append(participants[i])
+ return sampled_participants
+
+
+def filter_participants_by_age(participants: list, min_age: int, max_age: int):
+ filtered_participants = []
+ for participant in participants:
+ if participant["age"] >= min_age and participant["age"] <= max_age:
+ filtered_participants.append(participant)
+ return filtered_participants
+
+
+def filter_participants_by_height(participants: list, min_height: int, max_height: int):
+ filtered_participants = []
+ for participant in participants:
+ if participant["height"] >= min_height and participant["height"] <= max_height:
+ filtered_participants.append(participant)
+ return filtered_participants
+
+
+def randomly_sample_and_filter_participants(
+ participants: list, sample_size: int, min_age: int, max_age: int, min_height: int, max_height: int
+):
+ sampled_participants = sample_participants(participants, sample_size)
+ age_filtered_participants = filter_participants_by_age(sampled_participants, min_age, max_age)
+ height_filtered_participants = filter_participants_by_height(age_filtered_participants, min_height, max_height)
+ return height_filtered_participants
+
+
+def remove_anomalies(data: list, maximum_value: float, minimum_value: float) -> list:
+ """Remove anomalies from a list of numbers"""
+
+ result = []
+
+ for value in data:
+ if minimum_value <= value <= maximum_value:
+ result.append(value)
+
+ return result
+
+
+def calculate_frequency(data: list) -> dict:
+ """Calculate the frequency of each element in a list"""
+
+ frequencies = {}
+
+ # Iterate over each value in the list
+ for value in data:
+ # If the value is already in the dictionary, increment the count
+ if value in frequencies:
+ frequencies[value] += 1
+ # Otherwise, add the value to the dictionary with a count of 1
+ else:
+ frequencies[value] = 1
+
+ return frequencies
+
+
+def calculate_cumulative_sum(array: np.ndarray) -> np.ndarray:
+ """Calculate the cumulative sum of a numpy array"""
+
+ # don't use the built-in numpy function
+ result = np.zeros(array.shape)
+ result[0] = array[0]
+ for i in range(1, len(array)):
+ result[i] = result[i - 1] + array[i]
+
+ return result
+
+
+def calculate_player_total_scores(participants: dict):
+ """Calculate the total score of each player in a dictionary.
+
+ Example input:
+ {
+ "Alice": {
+ "scores": np.array([1, 2, 3])
+ },
+ "Bob": {
+ "scores": np.array([4, 5, 6])
+ },
+ "Charlie": {
+ "scores": np.array([7, 8, 9])
+ },
+ }
+
+ Example output:
+ {
+ "Alice": {
+ "scores": np.array([1, 2, 3]),
+ "total_score": 6
+ },
+ "Bob": {
+ "scores": np.array([4, 5, 6]),
+ "total_score": 15
+ },
+ "Charlie": {
+ "scores": np.array([7, 8, 9]),
+ "total_score": 24
+ },
+ }
+ """
+
+ for player in participants:
+ participants[player]["total_score"] = np.sum(participants[player]["scores"])
+
+ return participants
+
+
+def calculate_player_average_scores(df: pd.DataFrame) -> pd.DataFrame:
+ """Calculate the average score of each player in a pandas DataFrame.
+
+ Example input:
+ | | player | score_1 | score_2 |
+ |---|---------|---------|---------|
+ | 0 | Alice | 1 | 2 |
+ | 1 | Bob | 3 | 4 |
+
+ Example output:
+ | | player | score_1 | score_2 | average_score |
+ |---|---------|---------|---------|---------------|
+ | 0 | Alice | 1 | 2 | 1.5 |
+ | 1 | Bob | 3 | 4 | 3.5 |
+ """
+
+ df["average_score"] = df[["score_1", "score_2"]].mean(axis=1)
+
+ return df
diff --git a/learners/files/06-floating-point-data/statistics/test_stats.py b/learners/files/06-data-structures/statistics/test_stats.py
similarity index 56%
rename from learners/files/06-floating-point-data/statistics/test_stats.py
rename to learners/files/06-data-structures/statistics/test_stats.py
index fd761486..573902a9 100644
--- a/learners/files/06-floating-point-data/statistics/test_stats.py
+++ b/learners/files/06-data-structures/statistics/test_stats.py
@@ -1,8 +1,16 @@
+import numpy as np
+import pandas as pd
+
from stats import (
sample_participants,
filter_participants_by_age,
filter_participants_by_height,
randomly_sample_and_filter_participants,
+ remove_anomalies,
+ calculate_frequency,
+ calculate_cumulative_sum,
+ calculate_player_total_scores,
+ calculate_player_average_scores,
)
import random
@@ -80,3 +88,50 @@ def test_randomly_sample_and_filter_participants():
)
expected = [{"age": 38, "height": 165}, {"age": 30, "height": 170}, {"age": 35, "height": 160}]
assert filtered_participants == expected
+
+
+def test_remove_anomalies():
+ """Test remove_anomalies function"""
+ data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ maximum_value = 5
+ minimum_value = 2
+ expected_result = [2, 3, 4, 5]
+ assert remove_anomalies(data, maximum_value, minimum_value) == expected_result
+
+
+def test_calculate_frequency():
+ """Test calculate_frequency function"""
+ data = [1, 2, 3, 1, 2, 1, 1, 3, 3, 3]
+ expected_result = {1: 4, 2: 2, 3: 4}
+ assert calculate_frequency(data) == expected_result
+
+
+def test_calculate_cumulative_sum():
+ """Test calculate_cumulative_sum function"""
+ array = np.array([1, 2, 3, 4, 5])
+ expected_result = np.array([1, 3, 6, 10, 15])
+ np.testing.assert_array_equal(calculate_cumulative_sum(array), expected_result)
+
+
+def test_calculate_player_total_scores():
+ """Test calculate_player_total_scores function"""
+ participants = {
+ "Alice": {"scores": np.array([1, 2, 3])},
+ "Bob": {"scores": np.array([4, 5, 6])},
+ "Charlie": {"scores": np.array([7, 8, 9])},
+ }
+ expected_result = {
+ "Alice": {"scores": np.array([1, 2, 3]), "total_score": 6},
+ "Bob": {"scores": np.array([4, 5, 6]), "total_score": 15},
+ "Charlie": {"scores": np.array([7, 8, 9]), "total_score": 24},
+ }
+ np.testing.assert_equal(calculate_player_total_scores(participants), expected_result)
+
+
+def test_calculate_player_average_scores():
+ """Test calculate_player_average_scores function"""
+ df = pd.DataFrame({"player": ["Alice", "Bob"], "score_1": [1, 3], "score_2": [2, 4]})
+ expected_result = pd.DataFrame(
+ {"player": ["Alice", "Bob"], "score_1": [1, 3], "score_2": [2, 4], "average_score": [1.5, 3.5]}
+ )
+ pd.testing.assert_frame_equal(calculate_player_average_scores(df), expected_result)
diff --git a/learners/files/06-floating-point-data/test_calculator.py b/learners/files/06-data-structures/test_calculator.py
similarity index 100%
rename from learners/files/06-floating-point-data/test_calculator.py
rename to learners/files/06-data-structures/test_calculator.py
diff --git a/learners/files/06-data-structures/test_data_structures.py b/learners/files/06-data-structures/test_data_structures.py
new file mode 100644
index 00000000..00f3cd2d
--- /dev/null
+++ b/learners/files/06-data-structures/test_data_structures.py
@@ -0,0 +1,123 @@
+import numpy as np
+import pandas as pd
+
+
+def test_lists_equal():
+ """Test that lists are equal"""
+ # Create two lists
+ list1 = [1, 2, 3]
+ list2 = [1, 2, 3]
+ # Check that the lists are equal
+ assert list1 == list2
+
+ # Two lists, different order
+ list3 = [1, 2, 3]
+ list4 = [3, 2, 1]
+ assert list3 != list4
+
+ # Create two different lists
+ list5 = [1, 2, 3]
+ list6 = [1, 2, 4]
+ # Check that the lists are not equal
+ assert list5 != list6
+
+
+def test_sorted_lists_equal():
+ """Test that lists are equal"""
+ # Create two lists
+ list1 = [1, 2, 3]
+ list2 = [1, 2, 3]
+ # Check that the lists are equal
+ assert sorted(list1) == sorted(list2)
+
+ # Two lists, different order
+ list3 = [1, 2, 3]
+ list4 = [3, 2, 1]
+ assert sorted(list3) == sorted(list4)
+
+ # Create two different lists
+ list5 = [1, 2, 3]
+ list6 = [1, 2, 4]
+ # Check that the lists are not equal
+ assert sorted(list5) != sorted(list6)
+
+
+def test_dictionaries_equal():
+ """Test that dictionaries are equal"""
+ # Create two dictionaries
+ dict1 = {"a": 1, "b": 2, "c": 3}
+ dict2 = {"a": 1, "b": 2, "c": 3}
+ # Check that the dictionaries are equal
+ assert dict1 == dict2
+
+ # Create two dictionaries, different order
+ dict3 = {"a": 1, "b": 2, "c": 3}
+ dict4 = {"c": 3, "b": 2, "a": 1}
+ assert dict3 == dict4
+
+ # Create two different dictionaries
+ dict5 = {"a": 1, "b": 2, "c": 3}
+ dict6 = {"a": 1, "b": 2, "c": 4}
+ # Check that the dictionaries are not equal
+ assert dict5 != dict6
+
+
+def test_numpy_arrays():
+ """Test that numpy arrays are equal"""
+ # Create two numpy arrays
+ array1 = np.array([1, 2, 3])
+ array2 = np.array([1, 2, 3])
+ # Check that the arrays are equal
+ np.testing.assert_array_equal(array1, array2)
+
+
+def test_nested_numpy_arrays():
+ """Test that nested numpy arrays are equal"""
+ # Create two nested numpy arrays
+ array1 = np.array([[1, 2], [3, 4]])
+ array2 = np.array([[1, 2], [3, 4]])
+ # Check that the nested arrays are equal
+ np.testing.assert_array_equal(array1, array2)
+
+
+def test_numpy_arrays_with_tolerance():
+ """Test that numpy arrays are equal with tolerance"""
+ # Create two numpy arrays
+ array1 = np.array([1.0, 2.0, 3.0])
+ array2 = np.array([1.00009, 2.0005, 3.0001])
+ # Check that the arrays are equal with tolerance
+ np.testing.assert_allclose(array1, array2, atol=1e-3)
+
+
+def test_dictionaries_with_numpy_arrays():
+ """Test that dictionaries with numpy arrays are equal"""
+ # Create two dictionaries with numpy arrays
+ dict1 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 6])}
+ dict2 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 6])}
+ # Check that the dictionaries are equal
+ np.testing.assert_equal(dict1, dict2)
+
+ # Create two dictionaries with different numpy arrays
+ dict3 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 6])}
+ dict4 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 7])}
+ # Check that the dictionaries are not equal
+ with np.testing.assert_raises(AssertionError):
+ np.testing.assert_equal(dict3, dict4)
+
+
+def test_pandas_dataframes():
+ """Test that pandas DataFrames are equal"""
+ # Create two pandas DataFrames
+ df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
+ df2 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
+ # Check that the DataFrames are equal
+ pd.testing.assert_frame_equal(df1, df2)
+
+
+def test_pandas_series():
+ """Test that pandas Series are equal"""
+ # Create two pandas Series
+ s1 = pd.Series([1, 2, 3])
+ s2 = pd.Series([1, 2, 3])
+ # Check that the Series are equal
+ pd.testing.assert_series_equal(s1, s2)
diff --git a/learners/files/06-floating-point-data/estimate_pi.py b/learners/files/06-floating-point-data/estimate_pi.py
deleted file mode 100644
index 4f1bd6ba..00000000
--- a/learners/files/06-floating-point-data/estimate_pi.py
+++ /dev/null
@@ -1,10 +0,0 @@
-import random
-
-def estimate_pi(iterations):
- num_inside = 0
- for _ in range(iterations):
- x = random.random()
- y = random.random()
- if x**2 + y**2 < 1:
- num_inside += 1
- return 4 * num_inside / iterations
diff --git a/learners/files/06-floating-point-data/statistics/stats.py b/learners/files/06-floating-point-data/statistics/stats.py
deleted file mode 100644
index 581a3791..00000000
--- a/learners/files/06-floating-point-data/statistics/stats.py
+++ /dev/null
@@ -1,34 +0,0 @@
-import random
-
-
-def sample_participants(participants: list, sample_size: int):
- indexes = random.sample(range(len(participants)), sample_size)
- sampled_participants = []
- for i in indexes:
- sampled_participants.append(participants[i])
- return sampled_participants
-
-
-def filter_participants_by_age(participants: list, min_age: int, max_age: int):
- filtered_participants = []
- for participant in participants:
- if participant["age"] >= min_age and participant["age"] <= max_age:
- filtered_participants.append(participant)
- return filtered_participants
-
-
-def filter_participants_by_height(participants: list, min_height: int, max_height: int):
- filtered_participants = []
- for participant in participants:
- if participant["height"] >= min_height and participant["height"] <= max_height:
- filtered_participants.append(participant)
- return filtered_participants
-
-
-def randomly_sample_and_filter_participants(
- participants: list, sample_size: int, min_age: int, max_age: int, min_height: int, max_height: int
-):
- sampled_participants = sample_participants(participants, sample_size)
- age_filtered_participants = filter_participants_by_age(sampled_participants, min_age, max_age)
- height_filtered_participants = filter_participants_by_height(age_filtered_participants, min_height, max_height)
- return height_filtered_participants
diff --git a/learners/files/06-floating-point-data/test_estimate_pi.py b/learners/files/06-floating-point-data/test_estimate_pi.py
deleted file mode 100644
index a40b018d..00000000
--- a/learners/files/06-floating-point-data/test_estimate_pi.py
+++ /dev/null
@@ -1,12 +0,0 @@
-import math
-import random
-
-from estimate_pi import estimate_pi
-
-def test_estimate_pi():
- random.seed(0)
- expected = 3.141592654
- actual = estimate_pi(iterations=10000)
- atol = 1e-2
- rtol = 5e-3
- assert math.isclose(actual, expected, abs_tol=atol, rel_tol=rtol)
diff --git a/learners/files/06-floating-point-data/test_floating_point.py b/learners/files/06-floating-point-data/test_floating_point.py
deleted file mode 100644
index c11c0349..00000000
--- a/learners/files/06-floating-point-data/test_floating_point.py
+++ /dev/null
@@ -1,12 +0,0 @@
-from math import fabs
-from random import random
-
-def estimate_pi(iterations):
- num_inside = 0
- for _ in range(iterations):
- x = random()
- y = random()
- if x**2 + y**2 < 1:
- num_inside += 1
- return 4 * num_inside / iterations
-
diff --git a/learners/files/06-floating-point-data/test_numpy.py b/learners/files/06-floating-point-data/test_numpy.py
deleted file mode 100644
index 0eab737a..00000000
--- a/learners/files/06-floating-point-data/test_numpy.py
+++ /dev/null
@@ -1,27 +0,0 @@
-import numpy as np
-
-def test_numpy_arrays():
- """Test that numpy arrays are equal"""
- # Create two numpy arrays
- array1 = np.array([1, 2, 3])
- array2 = np.array([1, 2, 3])
- # Check that the arrays are equal
- np.testing.assert_array_equal(array1, array2)
-
-
-def test_2d_numpy_arrays():
- """Test that 2d numpy arrays are equal"""
- # Create two 2d numpy arrays
- array1 = np.array([[1, 2], [3, 4]])
- array2 = np.array([[1, 2], [3, 4]])
- # Check that the nested arrays are equal
- np.testing.assert_array_equal(array1, array2)
-
-
-def test_numpy_arrays_with_tolerance():
- """Test that numpy arrays are equal with tolerance"""
- # Create two numpy arrays
- array1 = np.array([1.0, 2.0, 3.0])
- array2 = np.array([1.00009, 2.0005, 3.0001])
- # Check that the arrays are equal with tolerance
- np.testing.assert_allclose(array1, array2, atol=1e-3)
diff --git a/learners/files/07-fixtures/data_structures.py b/learners/files/07-fixtures/data_structures.py
new file mode 100644
index 00000000..df39e65e
--- /dev/null
+++ b/learners/files/07-fixtures/data_structures.py
@@ -0,0 +1,2 @@
+import numpy as np
+import pandas as pd
diff --git a/learners/files/07-fixtures/estimate_pi.py b/learners/files/07-fixtures/estimate_pi.py
deleted file mode 100644
index 4f1bd6ba..00000000
--- a/learners/files/07-fixtures/estimate_pi.py
+++ /dev/null
@@ -1,10 +0,0 @@
-import random
-
-def estimate_pi(iterations):
- num_inside = 0
- for _ in range(iterations):
- x = random.random()
- y = random.random()
- if x**2 + y**2 < 1:
- num_inside += 1
- return 4 * num_inside / iterations
diff --git a/learners/files/07-fixtures/statistics/stats.py b/learners/files/07-fixtures/statistics/stats.py
index 581a3791..93eea5d3 100644
--- a/learners/files/07-fixtures/statistics/stats.py
+++ b/learners/files/07-fixtures/statistics/stats.py
@@ -1,3 +1,6 @@
+import numpy as np
+import pandas as pd
+
import random
@@ -32,3 +35,104 @@ def randomly_sample_and_filter_participants(
age_filtered_participants = filter_participants_by_age(sampled_participants, min_age, max_age)
height_filtered_participants = filter_participants_by_height(age_filtered_participants, min_height, max_height)
return height_filtered_participants
+
+
+def remove_anomalies(data: list, maximum_value: float, minimum_value: float) -> list:
+ """Remove anomalies from a list of numbers"""
+
+ result = []
+
+ for value in data:
+ if minimum_value <= value <= maximum_value:
+ result.append(value)
+
+ return result
+
+
+def calculate_frequency(data: list) -> dict:
+ """Calculate the frequency of each element in a list"""
+
+ frequencies = {}
+
+ # Iterate over each value in the list
+ for value in data:
+ # If the value is already in the dictionary, increment the count
+ if value in frequencies:
+ frequencies[value] += 1
+ # Otherwise, add the value to the dictionary with a count of 1
+ else:
+ frequencies[value] = 1
+
+ return frequencies
+
+
+def calculate_cumulative_sum(array: np.ndarray) -> np.ndarray:
+ """Calculate the cumulative sum of a numpy array"""
+
+ # don't use the built-in numpy function
+ result = np.zeros(array.shape)
+ result[0] = array[0]
+ for i in range(1, len(array)):
+ result[i] = result[i - 1] + array[i]
+
+ return result
+
+
+def calculate_player_total_scores(participants: dict):
+ """Calculate the total score of each player in a dictionary.
+
+ Example input:
+ {
+ "Alice": {
+ "scores": np.array([1, 2, 3])
+ },
+ "Bob": {
+ "scores": np.array([4, 5, 6])
+ },
+ "Charlie": {
+ "scores": np.array([7, 8, 9])
+ },
+ }
+
+ Example output:
+ {
+ "Alice": {
+ "scores": np.array([1, 2, 3]),
+ "total_score": 6
+ },
+ "Bob": {
+ "scores": np.array([4, 5, 6]),
+ "total_score": 15
+ },
+ "Charlie": {
+ "scores": np.array([7, 8, 9]),
+ "total_score": 24
+ },
+ }
+ """
+
+ for player in participants:
+ participants[player]["total_score"] = np.sum(participants[player]["scores"])
+
+ return participants
+
+
+def calculate_player_average_scores(df: pd.DataFrame) -> pd.DataFrame:
+ """Calculate the average score of each player in a pandas DataFrame.
+
+ Example input:
+ | | player | score_1 | score_2 |
+ |---|---------|---------|---------|
+ | 0 | Alice | 1 | 2 |
+ | 1 | Bob | 3 | 4 |
+
+ Example output:
+ | | player | score_1 | score_2 | average_score |
+ |---|---------|---------|---------|---------------|
+ | 0 | Alice | 1 | 2 | 1.5 |
+ | 1 | Bob | 3 | 4 | 3.5 |
+ """
+
+ df["average_score"] = df[["score_1", "score_2"]].mean(axis=1)
+
+ return df
diff --git a/learners/files/07-fixtures/statistics/test_stats.py b/learners/files/07-fixtures/statistics/test_stats.py
index 806c3539..fda2fc89 100644
--- a/learners/files/07-fixtures/statistics/test_stats.py
+++ b/learners/files/07-fixtures/statistics/test_stats.py
@@ -1,3 +1,5 @@
+import numpy as np
+import pandas as pd
import pytest
from stats import (
@@ -5,6 +7,11 @@
filter_participants_by_age,
filter_participants_by_height,
randomly_sample_and_filter_participants,
+ remove_anomalies,
+ calculate_frequency,
+ calculate_cumulative_sum,
+ calculate_player_total_scores,
+ calculate_player_average_scores,
)
import random
@@ -62,3 +69,50 @@ def test_randomly_sample_and_filter_participants(participants):
)
expected = [{"age": 38, "height": 165}, {"age": 30, "height": 170}, {"age": 35, "height": 160}]
assert filtered_participants == expected
+
+
+def test_remove_anomalies():
+ """Test remove_anomalies function"""
+ data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ maximum_value = 5
+ minimum_value = 2
+ expected_result = [2, 3, 4, 5]
+ assert remove_anomalies(data, maximum_value, minimum_value) == expected_result
+
+
+def test_calculate_frequency():
+ """Test calculate_frequency function"""
+ data = [1, 2, 3, 1, 2, 1, 1, 3, 3, 3]
+ expected_result = {1: 4, 2: 2, 3: 4}
+ assert calculate_frequency(data) == expected_result
+
+
+def test_calculate_cumulative_sum():
+ """Test calculate_cumulative_sum function"""
+ array = np.array([1, 2, 3, 4, 5])
+ expected_result = np.array([1, 3, 6, 10, 15])
+ np.testing.assert_array_equal(calculate_cumulative_sum(array), expected_result)
+
+
+def test_calculate_player_total_scores():
+ """Test calculate_player_total_scores function"""
+ participants = {
+ "Alice": {"scores": np.array([1, 2, 3])},
+ "Bob": {"scores": np.array([4, 5, 6])},
+ "Charlie": {"scores": np.array([7, 8, 9])},
+ }
+ expected_result = {
+ "Alice": {"scores": np.array([1, 2, 3]), "total_score": 6},
+ "Bob": {"scores": np.array([4, 5, 6]), "total_score": 15},
+ "Charlie": {"scores": np.array([7, 8, 9]), "total_score": 24},
+ }
+ np.testing.assert_equal(calculate_player_total_scores(participants), expected_result)
+
+
+def test_calculate_player_average_scores():
+ """Test calculate_player_average_scores function"""
+ df = pd.DataFrame({"player": ["Alice", "Bob"], "score_1": [1, 3], "score_2": [2, 4]})
+ expected_result = pd.DataFrame(
+ {"player": ["Alice", "Bob"], "score_1": [1, 3], "score_2": [2, 4], "average_score": [1.5, 3.5]}
+ )
+ pd.testing.assert_frame_equal(calculate_player_average_scores(df), expected_result)
diff --git a/learners/files/07-fixtures/test_data_structures.py b/learners/files/07-fixtures/test_data_structures.py
new file mode 100644
index 00000000..00f3cd2d
--- /dev/null
+++ b/learners/files/07-fixtures/test_data_structures.py
@@ -0,0 +1,123 @@
+import numpy as np
+import pandas as pd
+
+
+def test_lists_equal():
+ """Test that lists are equal"""
+ # Create two lists
+ list1 = [1, 2, 3]
+ list2 = [1, 2, 3]
+ # Check that the lists are equal
+ assert list1 == list2
+
+ # Two lists, different order
+ list3 = [1, 2, 3]
+ list4 = [3, 2, 1]
+ assert list3 != list4
+
+ # Create two different lists
+ list5 = [1, 2, 3]
+ list6 = [1, 2, 4]
+ # Check that the lists are not equal
+ assert list5 != list6
+
+
+def test_sorted_lists_equal():
+ """Test that lists are equal"""
+ # Create two lists
+ list1 = [1, 2, 3]
+ list2 = [1, 2, 3]
+ # Check that the lists are equal
+ assert sorted(list1) == sorted(list2)
+
+ # Two lists, different order
+ list3 = [1, 2, 3]
+ list4 = [3, 2, 1]
+ assert sorted(list3) == sorted(list4)
+
+ # Create two different lists
+ list5 = [1, 2, 3]
+ list6 = [1, 2, 4]
+ # Check that the lists are not equal
+ assert sorted(list5) != sorted(list6)
+
+
+def test_dictionaries_equal():
+ """Test that dictionaries are equal"""
+ # Create two dictionaries
+ dict1 = {"a": 1, "b": 2, "c": 3}
+ dict2 = {"a": 1, "b": 2, "c": 3}
+ # Check that the dictionaries are equal
+ assert dict1 == dict2
+
+ # Create two dictionaries, different order
+ dict3 = {"a": 1, "b": 2, "c": 3}
+ dict4 = {"c": 3, "b": 2, "a": 1}
+ assert dict3 == dict4
+
+ # Create two different dictionaries
+ dict5 = {"a": 1, "b": 2, "c": 3}
+ dict6 = {"a": 1, "b": 2, "c": 4}
+ # Check that the dictionaries are not equal
+ assert dict5 != dict6
+
+
+def test_numpy_arrays():
+ """Test that numpy arrays are equal"""
+ # Create two numpy arrays
+ array1 = np.array([1, 2, 3])
+ array2 = np.array([1, 2, 3])
+ # Check that the arrays are equal
+ np.testing.assert_array_equal(array1, array2)
+
+
+def test_nested_numpy_arrays():
+ """Test that nested numpy arrays are equal"""
+ # Create two nested numpy arrays
+ array1 = np.array([[1, 2], [3, 4]])
+ array2 = np.array([[1, 2], [3, 4]])
+ # Check that the nested arrays are equal
+ np.testing.assert_array_equal(array1, array2)
+
+
+def test_numpy_arrays_with_tolerance():
+ """Test that numpy arrays are equal with tolerance"""
+ # Create two numpy arrays
+ array1 = np.array([1.0, 2.0, 3.0])
+ array2 = np.array([1.00009, 2.0005, 3.0001])
+ # Check that the arrays are equal with tolerance
+ np.testing.assert_allclose(array1, array2, atol=1e-3)
+
+
+def test_dictionaries_with_numpy_arrays():
+ """Test that dictionaries with numpy arrays are equal"""
+ # Create two dictionaries with numpy arrays
+ dict1 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 6])}
+ dict2 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 6])}
+ # Check that the dictionaries are equal
+ np.testing.assert_equal(dict1, dict2)
+
+ # Create two dictionaries with different numpy arrays
+ dict3 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 6])}
+ dict4 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 7])}
+ # Check that the dictionaries are not equal
+ with np.testing.assert_raises(AssertionError):
+ np.testing.assert_equal(dict3, dict4)
+
+
+def test_pandas_dataframes():
+ """Test that pandas DataFrames are equal"""
+ # Create two pandas DataFrames
+ df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
+ df2 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
+ # Check that the DataFrames are equal
+ pd.testing.assert_frame_equal(df1, df2)
+
+
+def test_pandas_series():
+ """Test that pandas Series are equal"""
+ # Create two pandas Series
+ s1 = pd.Series([1, 2, 3])
+ s2 = pd.Series([1, 2, 3])
+ # Check that the Series are equal
+ pd.testing.assert_series_equal(s1, s2)
diff --git a/learners/files/07-fixtures/test_estimate_pi.py b/learners/files/07-fixtures/test_estimate_pi.py
deleted file mode 100644
index a40b018d..00000000
--- a/learners/files/07-fixtures/test_estimate_pi.py
+++ /dev/null
@@ -1,12 +0,0 @@
-import math
-import random
-
-from estimate_pi import estimate_pi
-
-def test_estimate_pi():
- random.seed(0)
- expected = 3.141592654
- actual = estimate_pi(iterations=10000)
- atol = 1e-2
- rtol = 5e-3
- assert math.isclose(actual, expected, abs_tol=atol, rel_tol=rtol)
diff --git a/learners/files/07-fixtures/test_numpy.py b/learners/files/07-fixtures/test_numpy.py
deleted file mode 100644
index 0eab737a..00000000
--- a/learners/files/07-fixtures/test_numpy.py
+++ /dev/null
@@ -1,27 +0,0 @@
-import numpy as np
-
-def test_numpy_arrays():
- """Test that numpy arrays are equal"""
- # Create two numpy arrays
- array1 = np.array([1, 2, 3])
- array2 = np.array([1, 2, 3])
- # Check that the arrays are equal
- np.testing.assert_array_equal(array1, array2)
-
-
-def test_2d_numpy_arrays():
- """Test that 2d numpy arrays are equal"""
- # Create two 2d numpy arrays
- array1 = np.array([[1, 2], [3, 4]])
- array2 = np.array([[1, 2], [3, 4]])
- # Check that the nested arrays are equal
- np.testing.assert_array_equal(array1, array2)
-
-
-def test_numpy_arrays_with_tolerance():
- """Test that numpy arrays are equal with tolerance"""
- # Create two numpy arrays
- array1 = np.array([1.0, 2.0, 3.0])
- array2 = np.array([1.00009, 2.0005, 3.0001])
- # Check that the arrays are equal with tolerance
- np.testing.assert_allclose(array1, array2, atol=1e-3)
diff --git a/learners/files/08-parametrization/data_structures.py b/learners/files/08-parametrization/data_structures.py
new file mode 100644
index 00000000..df39e65e
--- /dev/null
+++ b/learners/files/08-parametrization/data_structures.py
@@ -0,0 +1,2 @@
+import numpy as np
+import pandas as pd
diff --git a/learners/files/08-parametrization/estimate_pi.py b/learners/files/08-parametrization/estimate_pi.py
deleted file mode 100644
index 4f1bd6ba..00000000
--- a/learners/files/08-parametrization/estimate_pi.py
+++ /dev/null
@@ -1,10 +0,0 @@
-import random
-
-def estimate_pi(iterations):
- num_inside = 0
- for _ in range(iterations):
- x = random.random()
- y = random.random()
- if x**2 + y**2 < 1:
- num_inside += 1
- return 4 * num_inside / iterations
diff --git a/learners/files/08-parametrization/statistics/stats.py b/learners/files/08-parametrization/statistics/stats.py
index 581a3791..93eea5d3 100644
--- a/learners/files/08-parametrization/statistics/stats.py
+++ b/learners/files/08-parametrization/statistics/stats.py
@@ -1,3 +1,6 @@
+import numpy as np
+import pandas as pd
+
import random
@@ -32,3 +35,104 @@ def randomly_sample_and_filter_participants(
age_filtered_participants = filter_participants_by_age(sampled_participants, min_age, max_age)
height_filtered_participants = filter_participants_by_height(age_filtered_participants, min_height, max_height)
return height_filtered_participants
+
+
+def remove_anomalies(data: list, maximum_value: float, minimum_value: float) -> list:
+ """Remove anomalies from a list of numbers"""
+
+ result = []
+
+ for value in data:
+ if minimum_value <= value <= maximum_value:
+ result.append(value)
+
+ return result
+
+
+def calculate_frequency(data: list) -> dict:
+ """Calculate the frequency of each element in a list"""
+
+ frequencies = {}
+
+ # Iterate over each value in the list
+ for value in data:
+ # If the value is already in the dictionary, increment the count
+ if value in frequencies:
+ frequencies[value] += 1
+ # Otherwise, add the value to the dictionary with a count of 1
+ else:
+ frequencies[value] = 1
+
+ return frequencies
+
+
+def calculate_cumulative_sum(array: np.ndarray) -> np.ndarray:
+ """Calculate the cumulative sum of a numpy array"""
+
+ # don't use the built-in numpy function
+ result = np.zeros(array.shape)
+ result[0] = array[0]
+ for i in range(1, len(array)):
+ result[i] = result[i - 1] + array[i]
+
+ return result
+
+
+def calculate_player_total_scores(participants: dict):
+ """Calculate the total score of each player in a dictionary.
+
+ Example input:
+ {
+ "Alice": {
+ "scores": np.array([1, 2, 3])
+ },
+ "Bob": {
+ "scores": np.array([4, 5, 6])
+ },
+ "Charlie": {
+ "scores": np.array([7, 8, 9])
+ },
+ }
+
+ Example output:
+ {
+ "Alice": {
+ "scores": np.array([1, 2, 3]),
+ "total_score": 6
+ },
+ "Bob": {
+ "scores": np.array([4, 5, 6]),
+ "total_score": 15
+ },
+ "Charlie": {
+ "scores": np.array([7, 8, 9]),
+ "total_score": 24
+ },
+ }
+ """
+
+ for player in participants:
+ participants[player]["total_score"] = np.sum(participants[player]["scores"])
+
+ return participants
+
+
+def calculate_player_average_scores(df: pd.DataFrame) -> pd.DataFrame:
+ """Calculate the average score of each player in a pandas DataFrame.
+
+ Example input:
+ | | player | score_1 | score_2 |
+ |---|---------|---------|---------|
+ | 0 | Alice | 1 | 2 |
+ | 1 | Bob | 3 | 4 |
+
+ Example output:
+ | | player | score_1 | score_2 | average_score |
+ |---|---------|---------|---------|---------------|
+ | 0 | Alice | 1 | 2 | 1.5 |
+ | 1 | Bob | 3 | 4 | 3.5 |
+ """
+
+ df["average_score"] = df[["score_1", "score_2"]].mean(axis=1)
+
+ return df
diff --git a/learners/files/08-parametrization/statistics/test_stats.py b/learners/files/08-parametrization/statistics/test_stats.py
index 806c3539..fda2fc89 100644
--- a/learners/files/08-parametrization/statistics/test_stats.py
+++ b/learners/files/08-parametrization/statistics/test_stats.py
@@ -1,3 +1,5 @@
+import numpy as np
+import pandas as pd
import pytest
from stats import (
@@ -5,6 +7,11 @@
filter_participants_by_age,
filter_participants_by_height,
randomly_sample_and_filter_participants,
+ remove_anomalies,
+ calculate_frequency,
+ calculate_cumulative_sum,
+ calculate_player_total_scores,
+ calculate_player_average_scores,
)
import random
@@ -62,3 +69,50 @@ def test_randomly_sample_and_filter_participants(participants):
)
expected = [{"age": 38, "height": 165}, {"age": 30, "height": 170}, {"age": 35, "height": 160}]
assert filtered_participants == expected
+
+
+def test_remove_anomalies():
+ """Test remove_anomalies function"""
+ data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ maximum_value = 5
+ minimum_value = 2
+ expected_result = [2, 3, 4, 5]
+ assert remove_anomalies(data, maximum_value, minimum_value) == expected_result
+
+
+def test_calculate_frequency():
+ """Test calculate_frequency function"""
+ data = [1, 2, 3, 1, 2, 1, 1, 3, 3, 3]
+ expected_result = {1: 4, 2: 2, 3: 4}
+ assert calculate_frequency(data) == expected_result
+
+
+def test_calculate_cumulative_sum():
+ """Test calculate_cumulative_sum function"""
+ array = np.array([1, 2, 3, 4, 5])
+ expected_result = np.array([1, 3, 6, 10, 15])
+ np.testing.assert_array_equal(calculate_cumulative_sum(array), expected_result)
+
+
+def test_calculate_player_total_scores():
+ """Test calculate_player_total_scores function"""
+ participants = {
+ "Alice": {"scores": np.array([1, 2, 3])},
+ "Bob": {"scores": np.array([4, 5, 6])},
+ "Charlie": {"scores": np.array([7, 8, 9])},
+ }
+ expected_result = {
+ "Alice": {"scores": np.array([1, 2, 3]), "total_score": 6},
+ "Bob": {"scores": np.array([4, 5, 6]), "total_score": 15},
+ "Charlie": {"scores": np.array([7, 8, 9]), "total_score": 24},
+ }
+ np.testing.assert_equal(calculate_player_total_scores(participants), expected_result)
+
+
+def test_calculate_player_average_scores():
+ """Test calculate_player_average_scores function"""
+ df = pd.DataFrame({"player": ["Alice", "Bob"], "score_1": [1, 3], "score_2": [2, 4]})
+ expected_result = pd.DataFrame(
+ {"player": ["Alice", "Bob"], "score_1": [1, 3], "score_2": [2, 4], "average_score": [1.5, 3.5]}
+ )
+ pd.testing.assert_frame_equal(calculate_player_average_scores(df), expected_result)
diff --git a/learners/files/08-parametrization/test_data_structures.py b/learners/files/08-parametrization/test_data_structures.py
new file mode 100644
index 00000000..00f3cd2d
--- /dev/null
+++ b/learners/files/08-parametrization/test_data_structures.py
@@ -0,0 +1,123 @@
+import numpy as np
+import pandas as pd
+
+
+def test_lists_equal():
+ """Test that lists are equal"""
+ # Create two lists
+ list1 = [1, 2, 3]
+ list2 = [1, 2, 3]
+ # Check that the lists are equal
+ assert list1 == list2
+
+ # Two lists, different order
+ list3 = [1, 2, 3]
+ list4 = [3, 2, 1]
+ assert list3 != list4
+
+ # Create two different lists
+ list5 = [1, 2, 3]
+ list6 = [1, 2, 4]
+ # Check that the lists are not equal
+ assert list5 != list6
+
+
+def test_sorted_lists_equal():
+ """Test that lists are equal"""
+ # Create two lists
+ list1 = [1, 2, 3]
+ list2 = [1, 2, 3]
+ # Check that the lists are equal
+ assert sorted(list1) == sorted(list2)
+
+ # Two lists, different order
+ list3 = [1, 2, 3]
+ list4 = [3, 2, 1]
+ assert sorted(list3) == sorted(list4)
+
+ # Create two different lists
+ list5 = [1, 2, 3]
+ list6 = [1, 2, 4]
+ # Check that the lists are not equal
+ assert sorted(list5) != sorted(list6)
+
+
+def test_dictionaries_equal():
+ """Test that dictionaries are equal"""
+ # Create two dictionaries
+ dict1 = {"a": 1, "b": 2, "c": 3}
+ dict2 = {"a": 1, "b": 2, "c": 3}
+ # Check that the dictionaries are equal
+ assert dict1 == dict2
+
+ # Create two dictionaries, different order
+ dict3 = {"a": 1, "b": 2, "c": 3}
+ dict4 = {"c": 3, "b": 2, "a": 1}
+ assert dict3 == dict4
+
+ # Create two different dictionaries
+ dict5 = {"a": 1, "b": 2, "c": 3}
+ dict6 = {"a": 1, "b": 2, "c": 4}
+ # Check that the dictionaries are not equal
+ assert dict5 != dict6
+
+
+def test_numpy_arrays():
+ """Test that numpy arrays are equal"""
+ # Create two numpy arrays
+ array1 = np.array([1, 2, 3])
+ array2 = np.array([1, 2, 3])
+ # Check that the arrays are equal
+ np.testing.assert_array_equal(array1, array2)
+
+
+def test_nested_numpy_arrays():
+ """Test that nested numpy arrays are equal"""
+ # Create two nested numpy arrays
+ array1 = np.array([[1, 2], [3, 4]])
+ array2 = np.array([[1, 2], [3, 4]])
+ # Check that the nested arrays are equal
+ np.testing.assert_array_equal(array1, array2)
+
+
+def test_numpy_arrays_with_tolerance():
+ """Test that numpy arrays are equal with tolerance"""
+ # Create two numpy arrays
+ array1 = np.array([1.0, 2.0, 3.0])
+ array2 = np.array([1.00009, 2.0005, 3.0001])
+ # Check that the arrays are equal with tolerance
+ np.testing.assert_allclose(array1, array2, atol=1e-3)
+
+
+def test_dictionaries_with_numpy_arrays():
+ """Test that dictionaries with numpy arrays are equal"""
+ # Create two dictionaries with numpy arrays
+ dict1 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 6])}
+ dict2 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 6])}
+ # Check that the dictionaries are equal
+ np.testing.assert_equal(dict1, dict2)
+
+ # Create two dictionaries with different numpy arrays
+ dict3 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 6])}
+ dict4 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 7])}
+ # Check that the dictionaries are not equal
+ with np.testing.assert_raises(AssertionError):
+ np.testing.assert_equal(dict3, dict4)
+
+
+def test_pandas_dataframes():
+ """Test that pandas DataFrames are equal"""
+ # Create two pandas DataFrames
+ df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
+ df2 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
+ # Check that the DataFrames are equal
+ pd.testing.assert_frame_equal(df1, df2)
+
+
+def test_pandas_series():
+ """Test that pandas Series are equal"""
+ # Create two pandas Series
+ s1 = pd.Series([1, 2, 3])
+ s2 = pd.Series([1, 2, 3])
+ # Check that the Series are equal
+ pd.testing.assert_series_equal(s1, s2)
diff --git a/learners/files/08-parametrization/test_estimate_pi.py b/learners/files/08-parametrization/test_estimate_pi.py
deleted file mode 100644
index a40b018d..00000000
--- a/learners/files/08-parametrization/test_estimate_pi.py
+++ /dev/null
@@ -1,12 +0,0 @@
-import math
-import random
-
-from estimate_pi import estimate_pi
-
-def test_estimate_pi():
- random.seed(0)
- expected = 3.141592654
- actual = estimate_pi(iterations=10000)
- atol = 1e-2
- rtol = 5e-3
- assert math.isclose(actual, expected, abs_tol=atol, rel_tol=rtol)
diff --git a/learners/files/08-parametrization/test_numpy.py b/learners/files/08-parametrization/test_numpy.py
deleted file mode 100644
index 0eab737a..00000000
--- a/learners/files/08-parametrization/test_numpy.py
+++ /dev/null
@@ -1,27 +0,0 @@
-import numpy as np
-
-def test_numpy_arrays():
- """Test that numpy arrays are equal"""
- # Create two numpy arrays
- array1 = np.array([1, 2, 3])
- array2 = np.array([1, 2, 3])
- # Check that the arrays are equal
- np.testing.assert_array_equal(array1, array2)
-
-
-def test_2d_numpy_arrays():
- """Test that 2d numpy arrays are equal"""
- # Create two 2d numpy arrays
- array1 = np.array([[1, 2], [3, 4]])
- array2 = np.array([[1, 2], [3, 4]])
- # Check that the nested arrays are equal
- np.testing.assert_array_equal(array1, array2)
-
-
-def test_numpy_arrays_with_tolerance():
- """Test that numpy arrays are equal with tolerance"""
- # Create two numpy arrays
- array1 = np.array([1.0, 2.0, 3.0])
- array2 = np.array([1.00009, 2.0005, 3.0001])
- # Check that the arrays are equal with tolerance
- np.testing.assert_allclose(array1, array2, atol=1e-3)
diff --git a/learners/files/09-testing-output-files/data_structures.py b/learners/files/09-testing-output-files/data_structures.py
new file mode 100644
index 00000000..df39e65e
--- /dev/null
+++ b/learners/files/09-testing-output-files/data_structures.py
@@ -0,0 +1,2 @@
+import numpy as np
+import pandas as pd
diff --git a/learners/files/09-testing-output-files/estimate_pi.py b/learners/files/09-testing-output-files/estimate_pi.py
deleted file mode 100644
index 4f1bd6ba..00000000
--- a/learners/files/09-testing-output-files/estimate_pi.py
+++ /dev/null
@@ -1,10 +0,0 @@
-import random
-
-def estimate_pi(iterations):
- num_inside = 0
- for _ in range(iterations):
- x = random.random()
- y = random.random()
- if x**2 + y**2 < 1:
- num_inside += 1
- return 4 * num_inside / iterations
diff --git a/learners/files/09-testing-output-files/statistics/stats.py b/learners/files/09-testing-output-files/statistics/stats.py
index 8cf18ecb..d6d8ffc7 100644
--- a/learners/files/09-testing-output-files/statistics/stats.py
+++ b/learners/files/09-testing-output-files/statistics/stats.py
@@ -1,3 +1,6 @@
+import numpy as np
+import pandas as pd
+
import random
@@ -34,6 +37,107 @@ def randomly_sample_and_filter_participants(
return height_filtered_participants
+def remove_anomalies(data: list, maximum_value: float, minimum_value: float) -> list:
+ """Remove anomalies from a list of numbers"""
+
+ result = []
+
+ for value in data:
+ if minimum_value <= value <= maximum_value:
+ result.append(value)
+
+ return result
+
+
+def calculate_frequency(data: list) -> dict:
+ """Calculate the frequency of each element in a list"""
+
+ frequencies = {}
+
+ # Iterate over each value in the list
+ for value in data:
+ # If the value is already in the dictionary, increment the count
+ if value in frequencies:
+ frequencies[value] += 1
+ # Otherwise, add the value to the dictionary with a count of 1
+ else:
+ frequencies[value] = 1
+
+ return frequencies
+
+
+def calculate_cumulative_sum(array: np.ndarray) -> np.ndarray:
+ """Calculate the cumulative sum of a numpy array"""
+
+ # don't use the built-in numpy function
+ result = np.zeros(array.shape)
+ result[0] = array[0]
+ for i in range(1, len(array)):
+ result[i] = result[i - 1] + array[i]
+
+ return result
+
+
+def calculate_player_total_scores(participants: dict):
+ """Calculate the total score of each player in a dictionary.
+
+ Example input:
+ {
+ "Alice": {
+ "scores": np.array([1, 2, 3])
+ },
+ "Bob": {
+ "scores": np.array([4, 5, 6])
+ },
+ "Charlie": {
+ "scores": np.array([7, 8, 9])
+ },
+ }
+
+ Example output:
+ {
+ "Alice": {
+ "scores": np.array([1, 2, 3]),
+ "total_score": 6
+ },
+ "Bob": {
+ "scores": np.array([4, 5, 6]),
+ "total_score": 15
+ },
+ "Charlie": {
+ "scores": np.array([7, 8, 9]),
+ "total_score": 24
+ },
+ }
+ """
+
+ for player in participants:
+ participants[player]["total_score"] = np.sum(participants[player]["scores"])
+
+ return participants
+
+
+def calculate_player_average_scores(df: pd.DataFrame) -> pd.DataFrame:
+ """Calculate the average score of each player in a pandas DataFrame.
+
+ Example input:
+ | | player | score_1 | score_2 |
+ |---|---------|---------|---------|
+ | 0 | Alice | 1 | 2 |
+ | 1 | Bob | 3 | 4 |
+
+ Example output:
+ | | player | score_1 | score_2 | average_score |
+ |---|---------|---------|---------|---------------|
+ | 0 | Alice | 1 | 2 | 1.5 |
+ | 1 | Bob | 3 | 4 | 3.5 |
+ """
+
+ df["average_score"] = df[["score_1", "score_2"]].mean(axis=1)
+
+ return df
+
+
def very_complex_processing(data: list):
# Do some very complex processing
diff --git a/learners/files/09-testing-output-files/statistics/test_stats.py b/learners/files/09-testing-output-files/statistics/test_stats.py
index 5c7ab195..56e8ba05 100644
--- a/learners/files/09-testing-output-files/statistics/test_stats.py
+++ b/learners/files/09-testing-output-files/statistics/test_stats.py
@@ -1,3 +1,5 @@
+import numpy as np
+import pandas as pd
import pytest
from stats import (
@@ -5,6 +7,11 @@
filter_participants_by_age,
filter_participants_by_height,
randomly_sample_and_filter_participants,
+ remove_anomalies,
+ calculate_frequency,
+ calculate_cumulative_sum,
+ calculate_player_total_scores,
+ calculate_player_average_scores,
very_complex_processing,
)
@@ -64,6 +71,54 @@ def test_randomly_sample_and_filter_participants(participants):
expected = [{"age": 38, "height": 165}, {"age": 30, "height": 170}, {"age": 35, "height": 160}]
assert filtered_participants == expected
+
+def test_remove_anomalies():
+ """Test remove_anomalies function"""
+ data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+ maximum_value = 5
+ minimum_value = 2
+ expected_result = [2, 3, 4, 5]
+ assert remove_anomalies(data, maximum_value, minimum_value) == expected_result
+
+
+def test_calculate_frequency():
+ """Test calculate_frequency function"""
+ data = [1, 2, 3, 1, 2, 1, 1, 3, 3, 3]
+ expected_result = {1: 4, 2: 2, 3: 4}
+ assert calculate_frequency(data) == expected_result
+
+
+def test_calculate_cumulative_sum():
+ """Test calculate_cumulative_sum function"""
+ array = np.array([1, 2, 3, 4, 5])
+ expected_result = np.array([1, 3, 6, 10, 15])
+ np.testing.assert_array_equal(calculate_cumulative_sum(array), expected_result)
+
+
+def test_calculate_player_total_scores():
+ """Test calculate_player_total_scores function"""
+ participants = {
+ "Alice": {"scores": np.array([1, 2, 3])},
+ "Bob": {"scores": np.array([4, 5, 6])},
+ "Charlie": {"scores": np.array([7, 8, 9])},
+ }
+ expected_result = {
+ "Alice": {"scores": np.array([1, 2, 3]), "total_score": 6},
+ "Bob": {"scores": np.array([4, 5, 6]), "total_score": 15},
+ "Charlie": {"scores": np.array([7, 8, 9]), "total_score": 24},
+ }
+ np.testing.assert_equal(calculate_player_total_scores(participants), expected_result)
+
+
+def test_calculate_player_average_scores():
+ """Test calculate_player_average_scores function"""
+ df = pd.DataFrame({"player": ["Alice", "Bob"], "score_1": [1, 3], "score_2": [2, 4]})
+ expected_result = pd.DataFrame(
+ {"player": ["Alice", "Bob"], "score_1": [1, 3], "score_2": [2, 4], "average_score": [1.5, 3.5]}
+ )
+ pd.testing.assert_frame_equal(calculate_player_average_scores(df), expected_result)
+
+
def test_very_complex_processing(regtest):
data = [1, 2, 3]
diff --git a/learners/files/09-testing-output-files/test_data_structures.py b/learners/files/09-testing-output-files/test_data_structures.py
new file mode 100644
index 00000000..00f3cd2d
--- /dev/null
+++ b/learners/files/09-testing-output-files/test_data_structures.py
@@ -0,0 +1,123 @@
+import numpy as np
+import pandas as pd
+
+
+def test_lists_equal():
+ """Test that lists are equal"""
+ # Create two lists
+ list1 = [1, 2, 3]
+ list2 = [1, 2, 3]
+ # Check that the lists are equal
+ assert list1 == list2
+
+ # Two lists, different order
+ list3 = [1, 2, 3]
+ list4 = [3, 2, 1]
+ assert list3 != list4
+
+ # Create two different lists
+ list5 = [1, 2, 3]
+ list6 = [1, 2, 4]
+ # Check that the lists are not equal
+ assert list5 != list6
+
+
+def test_sorted_lists_equal():
+    """Test that sorted lists are equal"""
+ # Create two lists
+ list1 = [1, 2, 3]
+ list2 = [1, 2, 3]
+ # Check that the lists are equal
+ assert sorted(list1) == sorted(list2)
+
+ # Two lists, different order
+ list3 = [1, 2, 3]
+ list4 = [3, 2, 1]
+ assert sorted(list3) == sorted(list4)
+
+ # Create two different lists
+ list5 = [1, 2, 3]
+ list6 = [1, 2, 4]
+ # Check that the lists are not equal
+ assert sorted(list5) != sorted(list6)
+
+
+def test_dictionaries_equal():
+ """Test that dictionaries are equal"""
+ # Create two dictionaries
+ dict1 = {"a": 1, "b": 2, "c": 3}
+ dict2 = {"a": 1, "b": 2, "c": 3}
+ # Check that the dictionaries are equal
+ assert dict1 == dict2
+
+ # Create two dictionaries, different order
+ dict3 = {"a": 1, "b": 2, "c": 3}
+ dict4 = {"c": 3, "b": 2, "a": 1}
+ assert dict3 == dict4
+
+ # Create two different dictionaries
+ dict5 = {"a": 1, "b": 2, "c": 3}
+ dict6 = {"a": 1, "b": 2, "c": 4}
+ # Check that the dictionaries are not equal
+ assert dict5 != dict6
+
+
+def test_numpy_arrays():
+ """Test that numpy arrays are equal"""
+ # Create two numpy arrays
+ array1 = np.array([1, 2, 3])
+ array2 = np.array([1, 2, 3])
+ # Check that the arrays are equal
+ np.testing.assert_array_equal(array1, array2)
+
+
+def test_nested_numpy_arrays():
+ """Test that nested numpy arrays are equal"""
+ # Create two nested numpy arrays
+ array1 = np.array([[1, 2], [3, 4]])
+ array2 = np.array([[1, 2], [3, 4]])
+ # Check that the nested arrays are equal
+ np.testing.assert_array_equal(array1, array2)
+
+
+def test_numpy_arrays_with_tolerance():
+ """Test that numpy arrays are equal with tolerance"""
+ # Create two numpy arrays
+ array1 = np.array([1.0, 2.0, 3.0])
+ array2 = np.array([1.00009, 2.0005, 3.0001])
+ # Check that the arrays are equal with tolerance
+ np.testing.assert_allclose(array1, array2, atol=1e-3)
+
+
+def test_dictionaries_with_numpy_arrays():
+ """Test that dictionaries with numpy arrays are equal"""
+ # Create two dictionaries with numpy arrays
+ dict1 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 6])}
+ dict2 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 6])}
+ # Check that the dictionaries are equal
+ np.testing.assert_equal(dict1, dict2)
+
+ # Create two dictionaries with different numpy arrays
+ dict3 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 6])}
+ dict4 = {"a": np.array([1, 2, 3]), "b": np.array([4, 5, 7])}
+ # Check that the dictionaries are not equal
+ with np.testing.assert_raises(AssertionError):
+ np.testing.assert_equal(dict3, dict4)
+
+
+def test_pandas_dataframes():
+ """Test that pandas DataFrames are equal"""
+ # Create two pandas DataFrames
+ df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
+ df2 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
+ # Check that the DataFrames are equal
+ pd.testing.assert_frame_equal(df1, df2)
+
+
+def test_pandas_series():
+ """Test that pandas Series are equal"""
+ # Create two pandas Series
+ s1 = pd.Series([1, 2, 3])
+ s2 = pd.Series([1, 2, 3])
+ # Check that the Series are equal
+ pd.testing.assert_series_equal(s1, s2)
diff --git a/learners/files/09-testing-output-files/test_estimate_pi.py b/learners/files/09-testing-output-files/test_estimate_pi.py
deleted file mode 100644
index a40b018d..00000000
--- a/learners/files/09-testing-output-files/test_estimate_pi.py
+++ /dev/null
@@ -1,12 +0,0 @@
-import math
-import random
-
-from estimate_pi import estimate_pi
-
-def test_estimate_pi():
- random.seed(0)
- expected = 3.141592654
- actual = estimate_pi(iterations=10000)
- atol = 1e-2
- rtol = 5e-3
- assert math.isclose(actual, expected, abs_tol=atol, rel_tol=rtol)
diff --git a/learners/files/09-testing-output-files/test_numpy.py b/learners/files/09-testing-output-files/test_numpy.py
deleted file mode 100644
index 0eab737a..00000000
--- a/learners/files/09-testing-output-files/test_numpy.py
+++ /dev/null
@@ -1,27 +0,0 @@
-import numpy as np
-
-def test_numpy_arrays():
- """Test that numpy arrays are equal"""
- # Create two numpy arrays
- array1 = np.array([1, 2, 3])
- array2 = np.array([1, 2, 3])
- # Check that the arrays are equal
- np.testing.assert_array_equal(array1, array2)
-
-
-def test_2d_numpy_arrays():
- """Test that 2d numpy arrays are equal"""
- # Create two 2d numpy arrays
- array1 = np.array([[1, 2], [3, 4]])
- array2 = np.array([[1, 2], [3, 4]])
- # Check that the nested arrays are equal
- np.testing.assert_array_equal(array1, array2)
-
-
-def test_numpy_arrays_with_tolerance():
- """Test that numpy arrays are equal with tolerance"""
- # Create two numpy arrays
- array1 = np.array([1.0, 2.0, 3.0])
- array2 = np.array([1.00009, 2.0005, 3.0001])
- # Check that the arrays are equal with tolerance
- np.testing.assert_allclose(array1, array2, atol=1e-3)
diff --git a/learners/files/10-CI/tests.yaml b/learners/files/10-CI/tests.yaml
deleted file mode 100644
index 8de39b0a..00000000
--- a/learners/files/10-CI/tests.yaml
+++ /dev/null
@@ -1,72 +0,0 @@
-# This is just the name of the action, you can call it whatever you like.
-name: Tests (pytest)
-
-# This sets the events that trigger the action. In this case, we are telling
-# GitHub to run the tests whenever a push is made to the repository.
-# The trailing colon is intentional!
-on:
- push:
- # Only check when Python files are changed.
- # Don't need to check when the README is updated!
- paths:
- - '**.py'
- - 'pyproject.toml'
- pull_request:
- paths:
- - '**.py'
- - 'pyproject.toml'
- # Only check when somebody raises a pull_request to main.
- branches: [main]
- # This allows you to run the tests manually if you choose
- workflow_dispatch:
-
-
-# This is a list of jobs that the action will run. In this case, we have only
-# one job called build.
-jobs:
-
- build:
-
- strategy:
- matrix:
- python_version: ["3.12", "3.13", "3.14"]
- os: ["ubuntu-latest", "windows-latest"]
- exclude:
- - os: "windows-latest"
- python_version: "3.12"
- - os: "windows-latest"
- python_version: "3.13"
-
- # This is the environment that the job will run on.
- runs-on: ${{ matrix.os }}
-
- # This is a list of steps that the job will run. Each step is a command
- # that will be executed on the environment.
- steps:
-
- # This command tells GitHub to use a pre-built action. In this case, we
- # are using the actions/checkout action to check out the repository. This
- # just means that GitHub will clone this repository to the current
- # working directory.
- - uses: actions/checkout@v6
-
- # This is the name of the step. This is just a label that will be
- # displayed in the GitHub UI.
- - name: Set up Python ${{ matrix.python_version }}
- # This command tells GitHub to use a pre-built action. In this case, we
- # are using the actions/setup-python action to set up Python 3.12.
- uses: actions/setup-python@v6
- with:
- python-version: ${{ matrix.python_version }}
-
- # This step installs the dependencies for the project such as pytest,
- # numpy, pandas, etc using the requirements.txt file we created earlier.
- - name: Install dependencies
- run: |
- python -m pip install --upgrade pip
- pip install -r requirements.txt
-
- # This step runs the tests using the pytest command.
- - name: Run tests
- run: |
- pytest
diff --git a/learners/setup.md b/learners/setup.md
index e8f9cec2..03624df2 100644
--- a/learners/setup.md
+++ b/learners/setup.md
@@ -2,73 +2,49 @@
title: Setup
---
-## Testing and Continuous Integration
+## Python testing for research
-This course aims to equip you with the tools and knowledge required to get
-started with software testing. It assumes no prior knowledge of testing, just
-basic familiarity with Python programming. Over the course of these lessons,
-you will learn what software testing entails, how to write tests, best
-practices, some more niche & powerful functionality and finally how to
-incorporate tests in a GitHub repository.
+This course aims to equip you with the tools and knowledge required to get started with software testing. It assumes no prior knowledge of testing, just basic familiarity with Python programming. Over the course of these lessons, you will learn what software testing entails, how to write tests, best practices, some more niche and powerful functionality, and finally how to incorporate tests into a GitHub repository.
## Software Setup
-Please complete these setup instructions before the course starts. This is to
-ensure that the course can start on time and all of the content can be covered.
-If you have any issues with the setup instructions, please reach out to a
-course instructor / coordinator.
+Please complete these setup instructions before the course starts. This ensures that the course can start on time and all of the content can be covered. If you have any issues with the setup instructions, please reach out to a course instructor or coordinator.
For this course, you will need:
-### A Text Editor
-Preferably a code editor like Visual Studio Code but any text editor will do,
-such as notepad. This is so that you can write and edit Python scripts. A code
-editor will provide a better experience for writing code in this course. We
-recommend Visual Studio Code as it is free and very popular with minimal setup
-required.
-
### A Terminal
-Such as Terminal on MacOS / Linux or command prompt on Windows. This is so that
-you can run Python scripts and commit code to GitHub. Note that Visual Studio
-Code provides both a terminal and Git integration.
+Such as Terminal on macOS/Linux or Command Prompt on Windows. This is so that you can run Python scripts and commit code to GitHub.
+### A Text Editor
+Preferably a code editor like Visual Studio Code, but any text editor will do, such as Notepad. This is so that you can write and edit Python scripts. A code editor will provide a better experience for writing code in this course. We recommend Visual Studio Code as it is free, very popular, and requires minimal setup.
### Python
-Preferably Python 3.12 or higher. You can download Python from [Python's
-official website](https://www.python.org/downloads/).
+Preferably Python 3.10 or 3.11. You can download Python from [Python's official website](https://www.python.org/downloads/).
-It is recommended that you use a virtual environment for this course. This can
-be a standard Python virtual environment or a conda environment. You can create
-a virtual environment using the following commands:
+It is recommended that you use a virtual environment for this course. This can be a standard Python virtual environment or a conda environment. You can create and activate a virtual environment using the following commands:
```bash
# For a standard Python virtual environment
-python -m venv venv
-# Linux
-source venv/bin/activate
-# Windows (powershell)
-.\venv\Scripts\Activate.ps1
+python -m venv myenv
+source myenv/bin/activate
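+# On Windows (PowerShell), activate with: .\myenv\Scripts\Activate.ps1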
# For a conda environment
conda create --name myenv
conda activate myenv
```
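+
+To check that the virtual environment is active, you can ask which Python interpreter your shell will use (a quick sanity check, using the `myenv` name from the example above):
+
+```bash
+# Should print a path inside myenv/ (on Windows, use `where python`)
+which python
+python --version
+```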
-There are some python packages that will be needed in this course, you can
-install them using the following command:
+There are some Python packages that will be needed in this course; you can install them using the following command:
```bash
-pip install numpy pytest snaptol
+pip install numpy pandas matplotlib pytest snaptol
```
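+
+To confirm the packages installed correctly, you can try importing them and check that pytest is available (a minimal check; the import command prints nothing if all is well):
+
+```bash
+python -c "import numpy, pandas, matplotlib"
+pytest --version
+```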
### Git
-This course touches on some features of GitHub and requires Git to be installed. You
-may find it helpful to view the material from our course [Introduction to Git
-and GitHub](https://researchcodingclub.github.io/course/#version-control-introduction-to-git-and-github).
+This course touches on some features of GitHub and requires Git to be installed. You can download Git from the [official Git website](https://git-scm.com/downloads). If this is your first time using Git, you may want to check out the [Git Handbook](https://guides.github.com/introduction/git-handbook/).
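+
+Once installed, you can check that Git is available from your terminal (the exact version reported will vary):
+
+```bash
+git --version
+```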
### A GitHub account
A GitHub account is required for the Continuous Integration section of this course.
-You can sign up for a GitHub account on the [GitHub Website](https://github.com/)
+You can sign up for a GitHub account on the [GitHub Website](https://github.com/).