Once the build has completed, you can preview your PR at this URL: https://biojulia.dev/BiojuliaDocs/previews/PR16/
Just noting that the comment is being made, but the link doesn't actually work. Probably unrelated to the above, your pull request is for some reason requesting to merge into another branch, rather than into
kescobo left a comment:
Another solution would be to use StatsBase.jl and sample with weighted probabilities.
One other thing that would be nice to include here is a bit more didactic discussion of how we often write algorithms that are narrowly tailored to one problem, and then either repeat ourselves or make them more complicated as additional requirements get tacked on. E.g., for this problem your solution works for the specific question, but we'd have to derive a new equation if the question were something like "What's the probability of a heterozygous offspring?" It also doesn't scale if we add another trait, etc.
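To make the contrast concrete, here is a minimal sketch of the kind of narrowly tailored closed-form solution being described (the function name `prob_dominant_closed_form` is hypothetical, not the PR's code). It counts ordered parent pairs drawn without replacement and hard-codes which pairings can produce a recessive (hh) offspring, so a heterozygous question would need a new formula.

```julia
# Hypothetical narrowly tailored solution: it answers only the
# "dominant phenotype" question for k HH, m Hh and n hh organisms.
function prob_dominant_closed_form(k, m, n)
    N = k + m + n
    n_pairs = N * (N - 1)   # ordered parent pairs, drawn without replacement
    # Weighted count of ordered pairs that yield an hh offspring:
    #   hh x hh always, Hh x hh (either order) half of the time,
    #   Hh x Hh a quarter of the time.
    recessive = n * (n - 1) + m * n + m * (m - 1) / 4
    return 1 - recessive / n_pairs
end
```

For k = m = n = 2 this returns 23.5/30 ≈ 0.78333.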
The nice thing about the StatsBase.jl solution, and even about a simulation, is that they can be made generic and then used to ask more types of questions. I'm not demanding that we add this to a first draft, but maybe open an issue for it as a potential enhancement.
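One way to read "generic" here is a simulation that draws two distinct parents, lets each pass on a random allele, and takes the question as a function argument. This sketch uses only the `Random` standard library instead of StatsBase.jl; the names `simulate_offspring`, `is_dominant`, and `is_heterozygous` are hypothetical.

```julia
using Random

# Genotype code = number of recessive alleles: 0 = HH, 1 = Hh, 2 = hh.
recessive_alleles_at(r, k, m) = r <= k ? 0 : (r <= k + m ? 1 : 2)

# A parent with g recessive alleles passes on h with probability g/2.
passes_recessive(rng, g) = rand(rng) < g / 2

# Estimate P(question(child)); `question` maps a genotype code to true/false.
# Assumes at least two organisms in total.
function simulate_offspring(question, k, m, n; iterations=100_000,
                            rng=Random.default_rng())
    N = k + m + n
    hits = 0
    for _ in 1:iterations
        a = rand(rng, 1:N)
        b = rand(rng, 1:N-1)
        b += b >= a                      # two distinct organisms
        child = passes_recessive(rng, recessive_alleles_at(a, k, m)) +
                passes_recessive(rng, recessive_alleles_at(b, k, m))
        hits += question(child)
    end
    return hits / iterations
end

is_dominant(child) = child < 2           # HH or Hh
is_heterozygous(child) = child == 1      # Hh
```

Asking about heterozygous offspring is then `simulate_offspring(is_heterozygous, k, m, n)` rather than a new derivation.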
I like the idea of a simulation, though it will generally not give an exactly correct answer for Rosalind. I think that's fine as long as it's explained.
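To put a number on "not exactly correct": a Monte Carlo estimate of a probability p from `iterations` independent trials has a standard error of about sqrt(p(1 - p)/iterations). A minimal sketch (the helper name `mc_standard_error` is hypothetical):

```julia
# Standard error of a Monte Carlo estimate of a probability p
# computed from `iterations` independent yes/no trials.
mc_standard_error(p, iterations) = sqrt(p * (1 - p) / iterations)

# For p near 0.78333 and 100_000 iterations this is roughly 0.0013,
# so the third decimal place of a simulated answer will often differ.
mc_standard_error(0.78333, 100_000)
```

Because the error shrinks as 1/sqrt(iterations), halving it takes four times as many iterations.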
@kescobo Ready for a final review! I think you've reviewed most of the first part (the algorithm piece), so the main thing to focus on here is the statistical/sampling method.
For instance, we can use a simulation to estimate the likelihood of a given offspring type from a set of given probabilities.
This solution is generic and can be used to ask more types of questions.
The generic solution I was thinking of was actually not to simulate, but rather to be generic with the exact statistics. I like the simulation too, but, e.g., outputting the probability matrix you generated would then allow you to compute the probabilities of other outcomes.
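A minimal sketch of an exact, generic version, assuming the PR's genotype indexing (1 = HH, 2 = Hh, 3 = hh): compute the 3×3 matrix of probabilities for each ordered parent pair (drawn without replacement), then combine it with any 3×3 matrix of per-pair outcome probabilities. The names `parent_pair_probs`, `exact_prob`, `DOMINANT`, and `HETEROZYGOUS` are hypothetical.

```julia
# P(first parent has genotype i, second has genotype j), drawn without
# replacement from k HH, m Hh and n hh organisms (1=HH, 2=Hh, 3=hh).
function parent_pair_probs(k, m, n)
    counts = [k, m, n]
    N = k + m + n
    P = zeros(3, 3)
    for i in 1:3, j in 1:3
        second = counts[j] - (i == j)    # one fewer organism of genotype i left
        P[i, j] = counts[i] / N * second / (N - 1)
    end
    return P
end

# Per-pair probability that the offspring shows the dominant phenotype.
const DOMINANT = [1.0 1.0  1.0;
                  1.0 0.75 0.5;
                  1.0 0.5  0.0]

# Per-pair probability that the offspring is heterozygous (Hh).
const HETEROZYGOUS = [0.0 0.5 1.0;
                      0.5 0.5 0.5;
                      1.0 0.5 0.0]

# Any single-offspring question becomes a weighted sum over parent pairs.
exact_prob(k, m, n, outcome) = sum(parent_pair_probs(k, m, n) .* outcome)
```

`exact_prob(2, 2, 2, DOMINANT)` gives 23.5/30 ≈ 0.78333 and `exact_prob(2, 2, 2, HETEROZYGOUS)` gives 17/30 ≈ 0.56667, with no new derivation for the second question.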
Ah, maybe I can make this function more general by having the probability matrix as an input as well. Is that what you meant here?
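A sketch of that suggestion, assuming the PR's indexing (1 = HH, 2 = Hh, 3 = hh): the simulation takes the per-pair probability matrix (like the PR's `offspring_prob`) as an argument, so one function answers different questions. This is a hypothetical rewrite, not the PR's code; it draws parents from the counts using only `Random`, so it never builds a population vector.

```julia
using Random

# Map a position r in 1:(k+m+n) to a genotype index (1=HH, 2=Hh, 3=hh).
genotype_at(r, k, m) = r <= k ? 1 : (r <= k + m ? 2 : 3)

# Estimate an outcome's probability, given a 3x3 matrix whose entry [i, j]
# is that outcome's probability for parents with genotypes i and j.
function mendel_sim(k, m, n, outcome_prob; iterations=100_000,
                    rng=Random.default_rng())
    N = k + m + n
    total = sum(1:iterations) do _
        a = rand(rng, 1:N)
        b = rand(rng, 1:N-1)
        b += b >= a                      # second parent differs from the first
        outcome_prob[genotype_at(a, k, m), genotype_at(b, k, m)]
    end
    return total / iterations
end
```

Averaging the matrix entries (rather than flipping a coin per offspring) keeps the estimate unbiased while lowering its variance.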
Sort of. If you're strictly in Mendelian land, you can think of things in terms of allele frequencies and multiplication of probabilities. I also wonder if it would be worth introducing something about Julia types here... but we can save that for later.
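In allele terms, a minimal sketch (hypothetical names; genotypes indexed 1 = HH, 2 = Hh, 3 = hh): each parent passes on h with probability 0, 1/2, or 1, and since the two parents transmit independently, offspring genotype probabilities are products of those numbers. The same idea can generate per-parent-pair outcome matrices instead of typing them by hand.

```julia
# Probability that a parent of genotype g (1=HH, 2=Hh, 3=hh) passes on h.
const P_RECESSIVE_ALLELE = (0.0, 0.5, 1.0)

# Offspring genotype distribution (P(HH), P(Hh), P(hh)) for parents i and j,
# multiplying the independent per-parent allele probabilities.
function offspring_genotypes(i, j)
    p, q = P_RECESSIVE_ALLELE[i], P_RECESSIVE_ALLELE[j]
    return ((1 - p) * (1 - q),           # HH: both pass on H
            p * (1 - q) + (1 - p) * q,   # Hh: exactly one passes on h
            p * q)                       # hh: both pass on h
end

# 3x3 matrices of per-pair outcome probabilities, built from alleles.
dominant_matrix() = [1 - offspring_genotypes(i, j)[3] for i in 1:3, j in 1:3]
heterozygous_matrix() = [offspring_genotypes(i, j)[2] for i in 1:3, j in 1:3]
```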
docs/src/rosalind/07-iprb.md (outdated)
function mendel_sim(k, m, n; iterations=100000)
    # Genotypes: 1=HH, 2=Hh, 3=hh
    population = [fill(1, k); fill(2, m); fill(3, n)]
I think using a weight vector here makes more sense: if you have millions of organisms, you're going to allocate a giant array. Instead, you can do something like:
using StatsBase  # provides sample and weights
total_pop = k + m + n
wts = [k/total_pop, m/total_pop, n/total_pop]
sample([1,2,3], weights(wts), 2)  # samples 2 values from [1,2,3] with probability weights wts (with replacement by default)
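One caveat: StatsBase's `sample(a, wv, n)` uses `replace=true` by default, so this draws the two parents independently and the same organism can be picked twice, whereas the Rosalind problem (and the PR's `sample(population, 2; replace=false)`) mates two distinct organisms. For k = m = n = 2 that shifts the dominant probability from 0.78333 to 0.75. A minimal sketch that still avoids a population-sized array but stays without replacement; it swaps StatsBase for plain counts and `Random`, and `draw_genotype` and `draw_pair` are hypothetical helpers.

```julia
using Random

# Draw the genotype index (1=HH, 2=Hh, 3=hh) of one organism, with
# probability proportional to the counts in the tuple `counts`.
function draw_genotype(rng, counts)
    r = rand(rng, 1:sum(counts))
    return r <= counts[1] ? 1 : (r <= counts[1] + counts[2] ? 2 : 3)
end

# Draw two distinct organisms: remove the first one from the counts
# so that the second draw uses the updated weights.
function draw_pair(rng, k, m, n)
    counts = (k, m, n)
    i = draw_genotype(rng, counts)
    remaining = ntuple(g -> counts[g] - (g == i), 3)
    j = draw_genotype(rng, remaining)
    return i, j
end
```

Returning a tuple instead of a 2-element vector also avoids a heap allocation on every draw.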
docs/src/rosalind/07-iprb.md (outdated)
dominant_count = sum(
    offspring_prob[sample(population, 2; replace=false)...]
    for i in 1:iterations
)
This is going to allocate a lot, I think. The canonical way to do this is probably something like:
sum(1:iterations) do _  # do-block form of sum(f, 1:iterations)
    (i, j) = sample([1,2,3], weights(wts), 2)  # genotype indices of the two parents
    return offspring_prob[i, j]
end
Made some edits based on your last comments! @kescobo, I think we're close to being able to merge?


Making a draft PR here. There are multiple ways to solve the problem, and I added a first approach. I'm thinking that the second would be a more statistical/simulation approach. Basically, based on the values of k, m, and n, we can make a vector containing every organism in the population (e.g., [HH, Hh, hh, HH, etc.]). Then we can calculate the percentage of dominant individuals out of the total.
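A runnable sketch of the population-vector idea described here, using only the `Random` standard library; the name `population_sim` and the hard-coded per-pair probability matrix are assumptions for illustration. It stores one entry per organism, so memory grows with k + m + n.

```julia
using Random

# Population-vector simulation (1 = HH, 2 = Hh, 3 = hh).
# Assumes at least two organisms in total.
function population_sim(k, m, n; iterations=100_000, rng=Random.default_rng())
    population = [fill(1, k); fill(2, m); fill(3, n)]  # one entry per organism
    # Probability of a dominant-phenotype offspring for each parent pair.
    dominant = [1.0 1.0  1.0;
                1.0 0.75 0.5;
                1.0 0.5  0.0]
    N = length(population)
    hits = 0
    for _ in 1:iterations
        a = rand(rng, 1:N)
        b = rand(rng, 1:N-1)
        b += b >= a                      # pick two distinct organisms
        # Simulate one offspring: dominant with the pair's probability.
        hits += rand(rng) < dominant[population[a], population[b]]
    end
    return hits / iterations             # fraction of dominant offspring
end
```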
Wanted to run this by you first and see if you had any suggestions on packages to use.