Mendel's Genetics by danielle-pinto · Pull Request #16 · BioJulia/BioTutorials

danielle-pinto · 2026-01-30T21:27:44Z

Making a draft PR here. There are multiple ways to solve the problem, and I added a first approach. I'm thinking the second could be a more statistical/simulation approach: based on the values of k, m, and n, we can make a vector containing all of the possible organisms (e.g. [HH, Hh, hh, HH, etc.]). Then we can calculate the percentage of dominant individuals / total individuals.
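A minimal sketch of the population-vector idea, in Julia like the rest of the tutorial. The function name `dominant_prob_enumerate` and the string encoding of genotypes are assumptions for illustration, not the PR's code. Instead of sampling, it enumerates every ordered pair of distinct organisms and every allele each parent could pass on, so the result is an exact fraction:

```julia
# Hypothetical helper (not from the PR): exact probability that a random mating
# between two distinct organisms yields an offspring with the dominant phenotype.
function dominant_prob_enumerate(k::Integer, m::Integer, n::Integer)
    # One entry per organism: k homozygous dominant, m heterozygous, n homozygous recessive
    population = [fill("HH", k); fill("Hh", m); fill("hh", n)]
    dominant = 0
    total = 0
    for i in eachindex(population), j in eachindex(population)
        i == j && continue                          # the two mates must be different organisms
        for a in population[i], b in population[j]  # each parent passes one of its two alleles
            total += 1
            dominant += (a == 'H' || b == 'H')      # one H allele gives the dominant phenotype
        end
    end
    return dominant // total                        # exact Rational result
end

dominant_prob_enumerate(2, 2, 2)  # 47//60, about 0.7833
```

It needs at least two organisms, and because it allocates a vector of length k + m + n and visits every ordered pair, it only suits small inputs.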

Wanted to run this by you first and see if you had any suggestions on packages to use.

github-actions · 2026-01-30T21:28:07Z

Once the build has completed, you can preview your PR at this URL: https://biojulia.dev/BiojuliaDocs/previews/PR16/

kescobo · 2026-02-02T15:35:26Z

> Once the build has completed, you can preview your PR at this URL: https://biojulia.dev/BiojuliaDocs/previews/PR16/

Just noting that the comment is being made, but the link doesn't actually work.

Probably unrelated to the above: your pull request is, for some reason, requesting to merge into another branch rather than into main.

kescobo

Another solution would be to use StatsBase.jl and do a weighted probability.

One other thing that would be nice to include here is a bit more didactic discussion of how we often write narrowly tailored algorithms, and then either repeat ourselves or make them more complicated as additional requirements get tacked on. E.g., for this problem, your solution works for the specific question, but we'd have to derive a new equation if the question were something like "What's the probability of a heterozygous offspring?" It also doesn't scale if we add another trait, etc.
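One way to keep the exact answer while making it generic is to compute the full distribution of offspring genotypes once and read different questions off it. This is a sketch under assumptions: `offspring_distribution` is a hypothetical name, genotypes are encoded by their number of dominant alleles (2 = HH, 1 = Hh, 0 = hh), and there are at least two organisms.

```julia
# Hypothetical sketch: exact distribution of offspring genotypes for a random
# mating between two distinct organisms. Genotypes are encoded by the number
# of dominant alleles they carry: 2 = HH, 1 = Hh, 0 = hh.
function offspring_distribution(k::Integer, m::Integer, n::Integer)
    counts = Dict(2 => k, 1 => m, 0 => n)
    total = k + m + n
    dist = Dict(g => 0//1 for g in 0:2)        # offspring genotype => probability
    for (g1, c1) in counts, (g2, c2) in counts
        # probability of drawing a g1 parent first and a g2 parent second, without replacement
        c2_left = g1 == g2 ? c2 - 1 : c2
        pair_p = (c1 // total) * (c2_left // (total - 1))
        pair_p == 0 && continue
        # Punnett square: each parent passes one of its two alleles (true = H)
        alleles1 = [i <= g1 for i in 1:2]
        alleles2 = [i <= g2 for i in 1:2]
        for a in alleles1, b in alleles2
            dist[a + b] += pair_p * (1//4)    # each allele combination has probability 1/4
        end
    end
    return dist
end

dist = offspring_distribution(2, 2, 2)
p_dominant = dist[2] + dist[1]   # HH or Hh offspring: 47//60
p_heterozygous = dist[1]         # Hh offspring only: 17//30
```

The heterozygous question from the comment above then costs one dictionary lookup rather than a new derivation.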

The nice thing about the StatsBase.jl solution, or even a simulation, is that they can be made generic and then used to ask more types of questions. I'm not necessarily demanding we add this to a first draft, but maybe open an issue as a potential enhancement.

docs/src/rosalind/07-iprb.md

kescobo · 2026-02-02T15:51:55Z

I like the idea of a simulation, though it will generally not give a precisely correct answer for Rosalind. I think that's fine if it's explained.
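To put a rough number on it: if each simulated mating is scored as 0 or 1 (dominant offspring or not), the estimate after N iterations has standard error sqrt(p(1 - p)/N). Taking k = m = n = 2, whose exact answer is 47/60, and N = 100000 (the default `iterations` in the PR's `mendel_sim`) purely as an illustration:

$$
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{N}} = \sqrt{\frac{\frac{47}{60}\cdot\frac{13}{60}}{100000}} \approx 0.0013
$$

So the third decimal place can differ between runs, and quadrupling N only halves the error. Averaging the exact offspring probability for each sampled pair, as the PR's code does, can only lower the variance, so this is roughly an upper bound.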

danielle-pinto · 2026-02-03T18:41:18Z

> Probably unrelated to the above, your pull request is for some reason requesting to merge into another branch, rather than into main

I did this so it wouldn't also show the changes from the Hamming Distance problem. I branched off the Hamming distance branch, but in hindsight I should have branched off main. Will keep that in mind for the future.

docs/src/rosalind/07-iprb.md

danielle-pinto · 2026-02-06T17:15:55Z

@kescobo Ready for a final review! I think you've reviewed most of the first part (algorithm piece), so the main thing to focus on here is the statistical/sampling method.

docs/src/rosalind/07-iprb.md

kescobo · 2026-02-10T17:38:25Z

docs/src/rosalind/07-iprb.md

+
+For instance, we can use a simulation that can broadly calculate the likelihood of a given offspring based on a set of given probabilities.
+
+This solution is generic and can be used to ask more types of questions. 


The generic solution I was thinking of was actually not to simulate, but rather to be generic with the exact statistics. I like the simulation too, but, e.g., outputting the probability matrix you generated would then let you count other outcomes.

Ah, maybe I can make this function more general by having the probability matrix as an input as well. Is that what you meant here?
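A minimal sketch of the matrix-as-input idea. `pair_probs` and `outcome_prob` are hypothetical names, and the genotype order 1 = HH, 2 = Hh, 3 = hh follows the comment in the PR's `mendel_sim`. One matrix holds the probability of drawing each ordered pair of parent genotypes without replacement; the caller supplies a second matrix of P(outcome | parents), so swapping that matrix answers a different question:

```julia
# Hypothetical sketch. Genotype order: 1 = HH, 2 = Hh, 3 = hh.
# P[i, j] = probability that the first mate has genotype i and the second has
# genotype j, drawing two distinct organisms without replacement.
function pair_probs(k::Integer, m::Integer, n::Integer)
    counts = [k, m, n]
    total = k + m + n
    P = zeros(Rational{Int}, 3, 3)
    for i in 1:3, j in 1:3
        second = i == j ? counts[j] - 1 : counts[j]   # one organism of genotype i is already taken
        P[i, j] = (counts[i] // total) * (second // (total - 1))
    end
    return P
end

# Probability of an outcome, given a matrix of P(outcome | parent genotypes i, j)
outcome_prob(P, outcome_given_parents) = sum(P .* outcome_given_parents)

# P(dominant phenotype | parents) and P(heterozygous offspring | parents)
dominant = [1 1 1; 1 3//4 1//2; 1 1//2 0]
heterozygous = [0 1//2 1; 1//2 1//2 1//2; 1 1//2 0]

P = pair_probs(2, 2, 2)
outcome_prob(P, dominant)       # 47//60
outcome_prob(P, heterozygous)   # 17//30
```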

Sort of. If you're strictly in Mendelian land, you can think of things in terms of allele frequencies and multiplication of probabilities. I also wonder if it would be worth introducing something about Julia types here... but we can save that for later.
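A small sketch of the "allele frequencies and multiplication of probabilities" view, which also touches on the Julia-types idea. The `Genotype` enum and the function names are hypothetical. Each parent passes the dominant allele with a fixed probability (1 for HH, 1/2 for Hh, 0 for hh), and the offspring genotype probabilities are products of the two parents' probabilities:

```julia
# Hypothetical enum for genotypes instead of bare integers 1, 2, 3
@enum Genotype HH Hh hh

# Probability that a parent of genotype g passes on the dominant allele H
p_dominant_allele(g::Genotype) = g == HH ? 1//1 : g == Hh ? 1//2 : 0//1

# Offspring genotype probabilities for one mating, by multiplying allele probabilities
function offspring_probs(g1::Genotype, g2::Genotype)
    p1, p2 = p_dominant_allele(g1), p_dominant_allele(g2)
    return Dict(
        HH => p1 * p2,                        # both parents pass H
        Hh => p1 * (1 - p2) + (1 - p1) * p2,  # exactly one parent passes H
        hh => (1 - p1) * (1 - p2),            # neither parent passes H
    )
end

offspring_probs(Hh, Hh)  # probabilities 1//4 (HH), 1//2 (Hh), 1//4 (hh)
```

For a fixed pair of parents, a second, unlinked gene could be handled by multiplying the per-gene probabilities.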

kescobo · 2026-02-10T17:38:33Z

docs/src/rosalind/07-iprb.md

+
+function mendel_sim(k, m, n; iterations=100000)
+    # Genotypes: 1=HH, 2=Hh, 3=hh
+    population = [fill(1, k); fill(2, m); fill(3, n)]


I think using a weight vector here makes more sense: if you have millions of organisms, you're gonna allocate a giant array. Instead, you can do something like:

    using StatsBase

    total_pop = k + m + n
    wts = [k/total_pop, m/total_pop, n/total_pop]
    sample([1, 2, 3], weights(wts), 2)  # samples from the vector [1,2,3] with probability weights given by wts
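One thing worth checking with the weight-vector approach: StatsBase's `sample(a, wv, n)` draws with replacement by default, whereas the original code drew two distinct organisms (`replace=false`). For large populations the difference is negligible, but for small ones it shifts the answer. Worked out for k = m = n = 2 (the probability of a recessive offspring, then its complement):

$$
P_{\text{rec}}^{\text{with}} = \left(\tfrac{1}{3}\right)^2 \cdot 1 + 2\cdot\tfrac{1}{3}\cdot\tfrac{1}{3}\cdot\tfrac{1}{2} + \left(\tfrac{1}{3}\right)^2\cdot\tfrac{1}{4} = \tfrac{1}{4}, \qquad P_{\text{dom}}^{\text{with}} = \tfrac{3}{4} = 0.75
$$

$$
P_{\text{rec}}^{\text{without}} = \tfrac{2}{6}\cdot\tfrac{1}{5}\cdot 1 + 2\cdot\tfrac{2}{6}\cdot\tfrac{2}{5}\cdot\tfrac{1}{2} + \tfrac{2}{6}\cdot\tfrac{1}{5}\cdot\tfrac{1}{4} = \tfrac{13}{60}, \qquad P_{\text{dom}}^{\text{without}} = \tfrac{47}{60} \approx 0.7833
$$

The three terms are hh × hh, Hh × hh in either order, and Hh × Hh. One way to stay without replacement while still avoiding the big array is to draw the second mate from weights with the first mate's genotype count reduced by one.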

kescobo · 2026-02-10T17:43:45Z

docs/src/rosalind/07-iprb.md

+    dominant_count = sum(
+      offspring_prob[sample(population, 2; replace=false)...]
+      for i in 1:iterations
+  )


I think this is going to allocate a lot. The canonical way to do this is probably something like:

    sum(1:iterations) do _
        (i, j) = sample([1, 2, 3], weights(wts), 2)
        return offspring_prob[i, j]
    end
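Putting the two review suggestions together, a runnable sketch might look like the following. It assumes StatsBase.jl is installed; `mendel_sim_weighted` and `offspring_dominant` are hypothetical names, and the genotype order 1 = HH, 2 = Hh, 3 = hh mirrors the PR. It never builds the population array, and it draws the second mate from weights with the first mate removed, so the two mates are distinct organisms as in the original `replace=false` code:

```julia
using StatsBase, Random

# P(dominant offspring | parent genotypes i, j), with 1 = HH, 2 = Hh, 3 = hh
const offspring_dominant = [1.0 1.0 1.0; 1.0 0.75 0.5; 1.0 0.5 0.0]

function mendel_sim_weighted(k, m, n; iterations=100_000, rng=MersenneTwister(1))
    first_wts = weights([k, m, n])
    # weights for the second draw once one organism of genotype i has been removed
    second_wts = [weights([max(k - 1, 0), m, n]),
                  weights([k, max(m - 1, 0), n]),
                  weights([k, m, max(n - 1, 0)])]
    # sum(f, itr) written as a do-block, as suggested in review
    total = sum(1:iterations) do _
        i = sample(rng, 1:3, first_wts)       # first mate
        j = sample(rng, 1:3, second_wts[i])   # second mate, drawn without replacement
        return offspring_dominant[i, j]
    end
    return total / iterations
end

mendel_sim_weighted(2, 2, 2)  # close to, but generally not exactly, 47/60
```

Because each iteration adds the exact offspring probability for the sampled pair instead of simulating alleles, the only randomness is in choosing the mates.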

danielle-pinto · 2026-02-11T02:23:46Z

Made some edits based on your last comments! @kescobo I think we are close to being able to merge in?

danielle-pinto added 2 commits January 30, 2026 15:15

initial commit

0ba1396

rough draft of first solution

7e03286

fix problem name

7e0b97b

kescobo reviewed Feb 2, 2026

docs/src/rosalind/07-iprb.md (outdated, resolved)

docs/src/rosalind/07-iprb.md (outdated, resolved)

danielle-pinto added 2 commits February 3, 2026 14:33

adding semantic line breaks

a21c022

add note about downsides of algorithm approach

5872398

kescobo reviewed Feb 4, 2026

docs/src/rosalind/07-iprb.md (outdated, resolved)

docs/src/rosalind/07-iprb.md (resolved)

docs/src/rosalind/07-iprb.md (outdated, resolved)

danielle-pinto added 2 commits February 5, 2026 21:39

add statistical approach

c8b908b

implement Kevin's minor changes

04887e4

danielle-pinto marked this pull request as ready for review February 6, 2026 17:14

fix typos

8e1d7e2

Base automatically changed from 2026-01-27-hamming-distance to main February 8, 2026 01:41

kescobo reviewed Feb 10, 2026

make edits according to Kevin's suggestions

af56621

danielle-pinto and others added 2 commits February 10, 2026 21:29

add Project.toml back

7a10364

Merge branch 'main' into 2026-01-30-iprb

c126cc5

danielle-pinto requested a review from kescobo February 12, 2026 00:31