[Add Concept] Randomness draft #490

Open
colinleach wants to merge 13 commits into exercism:main from colinleach:randomness
Conversation

@colinleach
Contributor

Incomplete draft at this stage, but it's a big one: in document length, and in centrality to what R is all about.

The Introduction is still blank. Let's get the About to something we're happy with, then we can do a cut-down Introduction for captains-log.

The main question in my mind is what to do about the graphs: normal, binomial, Poisson distributions in action.

  • I've provided simple code samples based on hist(), and links to a couple of online R playgrounds that can give graphical output. I quite like having students run code interactively: it sorts the serious ones from the sort who whine about having to read instructions before they can complete Lasagna (Bethany's recent bugbear!)
  • Should we include plots within the about.md? I usually avoid this, but now may be the moment. At least the plots are monochrome, so invertible for dark backgrounds. I have no idea what alt-text would look like.
  • If only for my own interest, I could produce a fancy supplementary PDF using ggplot2 and Quarto. Interested students could follow a link and download it from the GH repo.

None of this is great for our visually-impaired students (and mentors).
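
For reference, a minimal sketch of the kind of `hist()`-based sample mentioned above. The seed, sample size, and distribution parameters are illustrative assumptions, not taken from the draft about.md.

```r
# Draw samples from three distributions and plot a histogram of each.
# All parameter values below are arbitrary choices for illustration.
set.seed(42)  # fixed seed so the plots are reproducible

n <- 1000
normal_samples   <- rnorm(n, mean = 0, sd = 1)
binomial_samples <- rbinom(n, size = 10, prob = 0.5)
poisson_samples  <- rpois(n, lambda = 3)

hist(normal_samples, breaks = 30, col = "grey",
     main = "Normal(0, 1)", xlab = "value")

# Integer-valued samples: centre one bin on each integer.
hist(binomial_samples, breaks = seq(-0.5, 10.5, by = 1), col = "grey",
     main = "Binomial(10, 0.5)", xlab = "successes")
hist(poisson_samples,
     breaks = seq(-0.5, max(poisson_samples) + 0.5, by = 1), col = "grey",
     main = "Poisson(3)", xlab = "count")
```

Grey bars on the default background keep the plots monochrome, in line with the point about dark backgrounds.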

Contributor

@depial depial left a comment

This is looking good so far! I do think it might be nice to be able to include the graphs in the about.md, but I understand the hesitation. Also, if you want to create some supplementary material to link to, I see no harm in that.

@colinleach
Contributor Author

colinleach commented Mar 15, 2026

I'll push the small changes fairly soon.

At some point (it doesn't need to be today!), I'd like @BethanyG's view on the graphics. We went through some of the accessibility issues during a Python approaches PR, and she knows orders of magnitude more than me about this.

My latest rabbit hole: learning Quarto to try and create some external content on distributions. It was already on my TODO list for 2026, so this seems a good moment.

It can natively mix Python, R, Julia, and some sort of JS slides package within the same document. Output is via pandoc, so pretty much anything: HTML, PDF (via $\LaTeX$), PowerPoint, Word...

Quarto is from Posit. Of course it's from Posit!

@depial
Contributor

depial commented Mar 15, 2026

My latest rabbit hole: learning Quarto to try and create some external content on distributions. It was already on my TODO list for 2026, so this seems a good moment.

Thanks for this! I'll look into it myself. I was just about to go back to Overleaf again, but this might be the right tool for the job!

@colinleach
Contributor Author

colinleach commented Mar 15, 2026

If you need LaTeX, you need Overleaf. If pandoc markdown is enough (with a slightly richer syntax than GH Markdown), with bits of embedded LaTeX, plus the ability to execute code blocks during rendering, then Quarto seems worth learning.

Not to mention Shiny (R or Python, sadly not Julia) if you need interactive websites to present your data. Also from ...cough... Posit.

@colinleach
Contributor Author

As well as the web links to documentation, Posit have their own very active YouTube channel for software tutorials.

@depial
Contributor

depial commented Mar 16, 2026

If you'd like, we could add just a little extra detail to the description of the prefixes. Something like:


The names of R's distribution functions have a prefix letter followed by an abbreviated distribution name.

The prefix letters are:

  • d : Density - the value of the PDF/PMF at a point
  • p : Distribution function - the CDF at a point, i.e. P(X ≤ x)
  • q : Quantile function - the inverse CDF
  • r : Random generation - random samples from the distribution

We will focus mainly on the r*() functions, and leave the others to students with a statistical background.
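
As a quick check of the four prefixes, here is a small sketch using the standard normal distribution (suffix `norm`). The choice of distribution, the seed, and the input values are assumptions made for illustration only.

```r
# Standard normal distribution: one call per prefix letter.
dnorm(0)        # density at x = 0, i.e. 1 / sqrt(2 * pi), about 0.3989
pnorm(1.96)     # P(X <= 1.96), about 0.975
qnorm(0.975)    # inverse of pnorm: the x with P(X <= x) = 0.975, about 1.96

set.seed(1)     # arbitrary seed, only for reproducibility
rnorm(3)        # three random draws (values depend on the seed)

# q* undoes p*: going through the CDF and back recovers the input.
round_trip <- qnorm(pnorm(1.3))
all.equal(round_trip, 1.3)  # TRUE
```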


If you think that's a good idea, you might want to see if my explanations are sufficiently accurate. I may be fudging things a bit.

We will focus mainly on the r*() functions, and leave the others to students with a statistical background.

Also, I thought I'd point out that immediately after this interlude, dbinom is used in some examples.
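
To make the `dbinom` point concrete, a short sketch of what the `d` prefix returns for a discrete distribution. The coin-flip numbers are an assumed example, not the ones used in the draft.

```r
# Probability of exactly 3 heads in 10 fair coin flips.
p3 <- dbinom(3, size = 10, prob = 0.5)
p3                       # 0.1171875
choose(10, 3) * 0.5^10   # same value from the binomial formula

# The PMF over all possible outcomes sums to 1.
total <- sum(dbinom(0:10, size = 10, prob = 0.5))
total
```

For a discrete distribution the "density" is a genuine probability, which is why the values can be summed directly.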

@colinleach
Contributor Author

Thanks for that. I'll have a play with the various functions and try to understand them better. I can see most of that ending up in a supplementary document, not the Concepts docs.

dbinom is used in some examples

Yes, I'd started (tentatively!) wondering about the best way to mention that.

Also, all the code examples still need formatting. It's easiest to do that in one step at the end (with a couple of regexes in VSCode).

I'm not in a rush to get this merged. Drafting a few more concepts may be my next priority - I'll create an issue with updated plans.

@colinleach colinleach mentioned this pull request Mar 16, 2026
7 tasks
@colinleach
Contributor Author

I expanded the box on distribution functions, and added various references.

@colinleach
Contributor Author

colinleach commented Mar 19, 2026

I've cleaned up the code formatting, so this might now be ready for review?

Except that we still need a policy on graphics, versus code samples for students to generate their own, versus an external document (e.g. something Quarto-generated).

@colinleach colinleach marked this pull request as ready for review March 19, 2026 17:31
@colinleach colinleach requested a review from BethanyG March 19, 2026 17:31
@depial
Contributor

depial commented Mar 19, 2026

My initial vote would be for graphics in the about.md, and code snippets in the introduction.md. Also, I'm perfectly fine with an external document, but should it be hosted in this repo? I feel like it should be tied to either the concept or the exercise in some way.

@colinleach
Contributor Author

I've added an Introduction. Pretty much a placeholder, as it will probably be replaced by whatever we use for the captains-log exercise.

@colinleach
Contributor Author

colinleach commented Mar 19, 2026

My initial vote would be for graphics in the about.md, and code snippets in the introduction.md

@BethanyG ? This is something for a Guardian to comment on, not just in your role as a track maintainer. I can't think of another concept document on any track that includes graphics, but I don't mind being the first.

We're the first authors to discuss linear algebra, the first to discuss probability density functions, and the first to cite Knuth. Exercism hasn't collapsed into chaos as a result!

@colinleach
Contributor Author

It may not be long-term useful, but I've been playing with making graphs of various distributions. It's incomplete, ugly and generally a mess at this stage, but at least the various functions are doing what I expected.

  • PDF document.
  • HTML, though I'm having problems deploying to Quarto Pub. It will open in a browser if you have it cloned locally.

@depial
Contributor

depial commented Mar 20, 2026

Just a thought... Since the status of the syllabus is still wip, we could merge this with the graphs in about.md to get a feel for how it will look on the site. It'll be simple enough to take them out if it's a no-go by re-introducing the code samples from the introduction.md.

As for vision impaired users, I believe there should be a way to include a description which is accessible to them but doesn't explicitly appear in the document. At least I've recently noticed this on Overleaf when adding figures, where there is a caption, which appears below the figure, and an optional description that can be significantly longer, which I assume will appear when the mouse is hovered over the image or some other accessibility functionality applied. An example from Overleaf:

\begin{figure}[h]
    \centering
    \includegraphics[width=0.5\linewidth]{event-higgs.png}
    \caption{Feynman diagram of the Higgs event: $gg \rightarrow H^o$}
    \label{fig:placeholder}
    \Description{Feynman diagram of the Higgs event of interest in this project, showing decay of a Higgs particle into two W bosons and a bottom quark/antiquark pair.}
\end{figure}

This Description could contain either a visual description of the graph, the code snippets to produce it, or both.

@colinleach
Contributor Author

colinleach commented Mar 20, 2026

I'll make a few graphs with bigger text, and monochrome so that it is invertible for dark mode. That just requires learning more about ggplot2, which I haven't used much (until yesterday).

Meanwhile, Quarto Pub is still being obstructive. I'll give up on that and try deploying to GitHub Pages.

Also, I'm not sure about including this in introduction.md, unless we significantly revise the exercise. I don't think Captain's Log covers anything except uniform distributions?
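
For comparison, uniform sampling in R needs only `sample()` and `runif()`. The sketch below is a guess at the sort of thing such an exercise might ask for; the variable names, ranges, and labels are hypothetical and not taken from the exercise itself.

```r
set.seed(7)  # arbitrary seed for reproducibility

# Discrete uniform: each integer from 1000 to 9999 is equally likely.
registry_number <- sample(1000:9999, 1)

# Continuous uniform on the interval [41000, 42000].
stardate <- runif(1, min = 41000, max = 42000)

# Uniform choice from a set of labels (hypothetical values).
planet_class <- sample(c("D", "H", "J", "K", "L", "M", "N", "R", "T", "Y"), 1)

registry_number
stardate
planet_class
```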

@depial
Contributor

depial commented Mar 20, 2026

Also, I'm not sure about including this in introduction.md, unless we significantly revise the exercise.

We don't have to include it if it doesn't fit. I just thought it would be a good way to keep the back up code for about.md if we need to revert, but I'm sure that'll be accessible somewhere.

@BethanyG
Member

@BethanyG ? This is something for a Guardian to comment on, not just in your role as a track maintainer. I can't think of another concept document on any track that includes graphics, but I don't mind being the first.

Python's simple linked list has graphics in the append, but same difference.

You need to be careful here - the graphics have to be uploaded to a different repo, and that means that until they are merged there, you can't actually use them in the doc and have them show up. So - it's a process. Check the discussion around the graphics Glenn used for prism for the details...I've forgotten the precise steps.

I'd also be careful with the invertible file. The lines need to be big enough to track on both themes, and sometimes the inversion is wonky. You also want a transparent background, or the white default will be really ugly on the dark theme. But I will let you experiment with those things.

As for vision impaired users, I believe there should be a way to include a description which is accessible to them but doesn't explicitly appear in the document.

Yup. It's called alt-text when it's HTML, but if you do it in markdown, it has a weirder syntax. Here is an example from the simple-linked-list in Python. Note the brackets after the !, which is where the alt-text goes:

![Diagram representing a stack implemented with a linked list. A circle with a dashed border named New_Node is to the far left-hand side, with two dotted arrow lines pointing right-ward.  New_Node reads "(becomes head) - New_Node - next = node_6". The top dotted arrow line is labeled "push" and points to Node_6, above and to the right.  Node_6 reads "(current) head - Node_6 - next = node_5". The bottom dotted arrow line is labeled "pop" and points to a box that reads "gets removed on pop()". Node_6 has a solid arrow that points rightward to Node_5, which reads "Node_5 - next = node_4". Node_5 has a solid arrow pointing rightward to Node_4, which reads "Node_4 - next = node_3". This pattern continues until Node_1, which reads "(current) tail - Node_1 - next = None". Node_1 has a dotted arrow pointing rightward to a node that says "None".](https://assets.exercism.org/images/tracks/python/simple-linked-list/linked-list.svg)

@BethanyG
Member

We don't have to include it if it doesn't fit. I just thought it would be a good way to keep the back up code for about.md if we need to revert, but I'm sure that'll be accessible somewhere.

I like the idea of including code. Although we need to be careful to not make things too verbose for those using screen readers. So we may want to check what the "rules" might be for accessibility when it comes to graphs and code.

WAI Guidelines for Complex Images

@BethanyG
Member

Quarto is from Posit. Of course it's from Posit!

Discussion for another place and day. Posit is currently on my sh*! list. 🙂

@BethanyG
Member

BethanyG commented Mar 20, 2026

At some point (it doesn't need to be today!), I'd like @BethanyG's view on the graphics. We went through some of the accessibility issues during a Python approaches PR, and she knows orders of magnitude more than me about this.

Apologies. I missed this in the long discussion. I can take a look at these maybe later today. Right now, I want to finish the template I am working on, and I also need to reply to some discussions and do some reviews on the Python repo. But I will try to get to them later this afternoon.

@colinleach
Contributor Author

I'd like @BethanyG's view on the graphics

You kind of already answered this in your posts over the last hour.

@BethanyG
Member

BethanyG commented Mar 20, 2026

You kind of already answered this in your posts over the last hour.

OK. But I can take another look. Also - if you just need generic distro info, I did some graphics in SVG for a WomenWhoCode thing a few years back. May or may not be helpful. I'll dig em out.

Huh. Couldn't find the SVG, but here is a PNG for Gauss, one for Uniform, and one for Poisson. Yes, the purple and green are quite unfortunate. Blame WWC.

I have a plugin to my graphics program that renders LaTeX as vector graphics, hence the nice equations.

@BethanyG
Member

BethanyG commented Mar 20, 2026

One last comment on graphics. I didn't want to pick too much at Glenn (so didn't go on about it in the forum thread), but I'd also keep an eye to what the site uses for colors, and see if you can use any of those (or hue variants of them) in your graphics. With a site as strongly branded as Exercism, using "standard" colors can clash.....

@colinleach
Contributor Author

colinleach commented Mar 20, 2026

I'll make a few graphs with bigger text

This isn't a 10-minute job, even before we get to all the Exercism/CDN complications. No need for you two to drop everything and try to deal with this urgently!

using "standard" colors can clash.

I'm hoping to avoid colors in any plots for the Exercism website.

Also, I notice that ggplot2 was written by a Kiwi, so accepts "color" or "colour" interchangeably. I approve!

@colinleach
Contributor Author

Slightly off the current priorities, but I now have stuff live on GH Pages.

HTML displays by default, but under "Other Formats" at top-right the PDF link should work.

It went live less than 10 minutes ago, so there are still plenty of errors and inelegancies to fix.

All computations are run locally on my PC and stored in a _freeze/ directory, so we should be able to handle any R, Python and Julia stuff without worrying about what GH can compute. Sometime (not today!) I'll play with NumPy, Matplotlib and Plots.jl to see what they look like.

The source repo is here. It's currently public, but only I can write to it. I assume it's pretty easy to give write access to @BethanyG and @depial, if desired.

@colinleach
Contributor Author

I've generated a couple of sample plots (light/dark versions) to get a feel for it.

There are also PowerPoint and PDF files, to show how they display on different backgrounds.

Maybe not perfect, but is this moving in the right direction?

P.S. There was a lot of blundering around to get to this point.

@BethanyG
Member

So at first glance, the dark SVG is not great. The grey background with the black lines is not going to show up on a dark background very well at all. Here is a screengrab I made quickly. The containing background color is sampled directly from the website's dark theme:

[Image: screengrab of the dark SVG placed on the website's dark-theme background]

Looks like the x/y info isn't showing up, and the gridlines are not really readable.

Recommend that the grey in the background be transparent (as in removed/no color), the grid lines in the background be white, and the graphed lines be either the background grey or white. I think the text should also be bigger, since it is getting lost between the title bar and the other elements. And the title bar should either be grey or white (with black text).

The light one should be (sort of) the inverse: no grey background, grid lines in black (or grey), but I think you can keep the graphed lines black there.

Give me a few, and I can get you examples. But basically, the square grey backgrounds are not reading well, and don't produce good contrast for either light or dark.

Another thought here is that what is being discussed is the shape of the curve, given the SD. So specific gridlines and values aren't technically needed for the understanding: lower SD == steeper curve, and higher SD == flatter or "fatter" curve. So axis values could be omitted.
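
The SD-versus-shape relationship can be checked numerically as well as visually. A small base-R sketch, with the three SD values chosen arbitrarily for illustration:

```r
# Peak height of a normal density is 1 / (sd * sqrt(2 * pi)),
# so doubling the SD halves the peak.
sds <- c(0.5, 1, 2)
peaks <- dnorm(0, mean = 0, sd = sds)
round(peaks, 4)   # 0.7979 0.3989 0.1995

# Overlay the three curves; line type distinguishes them without color.
curve(dnorm(x, sd = 0.5), from = -6, to = 6, lty = 1,
      xlab = "x", ylab = "density")
curve(dnorm(x, sd = 1), add = TRUE, lty = 2)
curve(dnorm(x, sd = 2), add = TRUE, lty = 3)
legend("topright", legend = paste("sd =", sds), lty = 1:3)
```

Because the area under each curve is always 1, a lower peak necessarily means a wider spread.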

@colinleach
Contributor Author

Latest variants remove the gray background and make it transparent. Bigger fonts, bolded facet titles. I'm not sure white-on-gray is ideal for the light facet title, so that's something else to play with.
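
For anyone following along, the changes described here might look roughly like the ggplot2 sketch below. This is not the code in the PR: `theme_mono()`, `make_plot()`, the data, and the font size are hypothetical, and it assumes ggplot2 3.4 or later (for the `linewidth` argument).

```r
library(ggplot2)  # assumed to be installed, version 3.4 or later

# Hypothetical helper: a monochrome theme with no background fill or grid.
# `fg` is the foreground color: "black" for light pages, "white" for dark.
theme_mono <- function(fg = "black", base_size = 18) {
  strip_text <- if (fg == "black") "white" else "black"
  theme_minimal(base_size = base_size) +
    theme(
      plot.background  = element_rect(fill = "transparent", colour = NA),
      panel.background = element_rect(fill = "transparent", colour = NA),
      panel.grid       = element_blank(),
      text             = element_text(colour = fg),
      axis.text        = element_text(colour = fg),
      axis.line        = element_line(colour = fg),
      axis.ticks       = element_line(colour = fg),
      strip.background = element_rect(fill = fg, colour = NA),
      strip.text       = element_text(colour = strip_text, face = "bold")
    )
}

# Two normal densities, one facet per standard deviation.
x <- seq(-6, 6, length.out = 200)
curves <- rbind(
  data.frame(x = x, density = dnorm(x, sd = 1), panel = "sd = 1"),
  data.frame(x = x, density = dnorm(x, sd = 2), panel = "sd = 2")
)

make_plot <- function(fg) {
  ggplot(curves, aes(x = x, y = density)) +
    geom_line(colour = fg, linewidth = 1) +
    facet_wrap(~panel) +
    theme_mono(fg)
}

light_plot <- make_plot("black")
dark_plot  <- make_plot("white")

# Saving with a transparent canvas (SVG output needs the svglite package):
# ggsave("normal-light.svg", light_plot, bg = "transparent")
```

Driving everything from a single foreground color keeps the light and dark variants in step, so a fix to one theme element applies to both.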

@BethanyG
Member

Here is my rough edit. Haven't seen the changes yet, but thought I'd add this here for an example:

[Image: sample_random_normal]

@BethanyG
Member

BethanyG commented Mar 21, 2026

Latest variants remove the gray background and make it transparent. Bigger fonts, bolded facet titles. I'm not sure white-on-gray is ideal for the light facet title, so that's something else to play with.

Agree that the white with grey is not ... good. So white against black or black against either grey or white is probably better. Also noticed that the grid lines are showing as black....I think those might be better as white.

Edited to add: We probably don't even need the grid lines if we keep the X/Y values and just have a border tick for those.

@colinleach
Contributor Author

colinleach commented Mar 21, 2026

noticed that the grid lines are showing as black

Where? I see them as light gray in both schemes.

We probably don't even need the grid lines

Might be simplest. I can't say I'm having fun with this...

@BethanyG
Member

Might be simplest. I can't say I'm having fun with this...

Sorry to hear that. I know these things can be fussy. But I do think they are important, and that they read better now than they did initially. And the more I think about it, the more the background grid could read as visual noise.

But if it is a struggle to remove the grid, I'd make it white for dark, and black for light. High contrast is best, I think.

@colinleach
Contributor Author

colinleach commented Mar 21, 2026

I do think they are important

Not disagreeing, which is why I'm doing this. It just reminds me why I chose a career in science, not graphic design.

I've changed the facet strip for contrast. I'll experiment with removing gridlines.

What I learned along the way: ggplot2 is at least as configurable as Matplotlib, but with no overlap in how they do things.

@BethanyG
Member

BethanyG commented Mar 21, 2026

What I learned along the way: ggplot2 is at least as configurable as Matplotlib, but with no overlap in how they do things.

This likely will not make you feel better - I think in the beginning Matplotlib either wrapped ggplot2 or replicated its functionality, but then disagreed with the order of how things were done. Hence the configuration differences and ... weirdness. And it was also programmed by a mathematician, so things that were (ahem!) perfectly comprehensible to him were not exactly comprehensible to the rest of us.

NOPE - I was wrong. And it is way worse: The author was a power user of MATLAB graphics. And he was from an era when OOP and Java reigned supreme. That explains ... some things. 😂

@colinleach
Contributor Author

Removing gridlines worked OK for the light theme. For some reason, the dark theme also lost axes and tick marks.

I don't think this will get fixed in the next few minutes. As I'm out of milk and a few other things, it might be more useful to go to the supermarket at this point.

Thanks for your help. Once we nail this first one, I'm sure it will get easier in future.

@colinleach
Contributor Author

No more on this for tonight, except to note that I must try plotnine sometime. It claims to be a Python version of ggplot2 and (after some earlier wobbles) it seems to be actively maintained.

@depial
Contributor

depial commented Mar 23, 2026

Sorry for not chiming in earlier! I'm a bit distracted otherwise (and probably will be for the better part of the next four weeks).

I'm not big on data visualization, but I've had to take a few courses in it in the past. The data science community seems to take it as pretty much a discipline in itself (and I tend to agree). I'm not sure how far you want to dive in, but, when learning R, I remember The Grammar of Graphics was heavily emphasized (thorough summary).

I would say the main take away for graphs of the type you're working on here would be to be somewhat conservative on the amount of information displayed. For example:

  • You could probably lose the graph lines in most of the plot since they busy things up without adding a ton of relevant information.
  • Axis numbering could be kept or dropped based on what you're trying to portray (e.g. you might want to mark at least y=1 and x=0 when displaying the distributions).
  • Your titles are nice, and axis labels are almost always a good idea.
  • Also, be careful with axes that have been "cropped". For example, one probability distribution plotted on an axis that goes up to 15% next to another on an axis that goes up to 80% can be misleading.
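
One way to avoid the cropped-axis problem in the last point is to give side-by-side plots the same y-limits. A base-R sketch, with the two binomial distributions picked as assumed examples:

```r
# Two binomial PMFs whose peaks differ a lot in height.
k <- 0:20
wide   <- dbinom(k, size = 20, prob = 0.5)    # peak about 0.18
narrow <- dbinom(k, size = 20, prob = 0.02)   # peak about 0.67

# One y-limit for both panels, so bar heights are directly comparable.
y_max <- max(wide, narrow)

old_par <- par(mfrow = c(1, 2))
barplot(wide, names.arg = k, ylim = c(0, y_max), main = "p = 0.5",
        xlab = "successes", ylab = "probability")
barplot(narrow, names.arg = k, ylim = c(0, y_max), main = "p = 0.02",
        xlab = "successes", ylab = "probability")
par(old_par)  # restore the previous layout
```

With independent y-axes the two panels would look equally tall, hiding the fact that one peak is several times higher than the other.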

If you want, have a look at the grammar of graphics ideas, and maybe some of the major pitfalls to avoid.
