Hi Elisa — Yes, I mentioned XPath, and I am not surprised it whetted your appetite even at 01:00 in the freakin’ morning! 🙂

However, it’s not the code[0] that’s the problem, it’s the data. If you ask GitHub for all the PRs,[1] convert to XML,[2] grab all the PR numbers,[3] and ask for the reviewers for each of those,[4] and convert them to XML,[2] you now have a large pile of ugly XML data.[5] I have not figured out a consistent way to extract reviewer user names (which are encoded as <string key="login"> elements) from that pile. (Note that reviewers can appear both in the main set of PR info and the separate little array of review info.)
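For anyone who would rather poke at this from Python, here is a minimal sketch of one way to pull reviewer logins out of that kind of XML. It assumes plain json-to-xml() output (not my post-processed version), and it assumes the logins of interest sit in an array keyed "requested_reviewers" in the main PR info and in an array keyed "users" in the separate review info. The sample data, the wrapper array around it, and the function name reviewer_logins are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the elements that XPath 3.1 json-to-xml() produces.
NS = {"fn": "http://www.w3.org/2005/xpath-functions"}

# Hypothetical, hand-made sample: one PR map plus one review-info map.
SAMPLE = """
<fn:array xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <fn:map>
    <fn:number key="number">1</fn:number>
    <fn:map key="user"><fn:string key="login">author1</fn:string></fn:map>
    <fn:array key="requested_reviewers">
      <fn:map><fn:string key="login">reviewerA</fn:string></fn:map>
    </fn:array>
  </fn:map>
  <fn:map>
    <fn:array key="users">
      <fn:map><fn:string key="login">reviewerA</fn:string></fn:map>
      <fn:map><fn:string key="login">reviewerB</fn:string></fn:map>
    </fn:array>
    <fn:array key="teams"/>
  </fn:map>
</fn:array>
"""

def reviewer_logins(root):
    """Collect logins found only inside the two assumed reviewer arrays."""
    found = set()
    for key in ("requested_reviewers", "users"):
        path = f".//fn:array[@key='{key}']/fn:map/fn:string[@key='login']"
        for element in root.findall(path, NS):
            found.add(element.text)
    return sorted(found)

root = ET.fromstring(SAMPLE)
logins = reviewer_logins(root)
print(logins)
```

Restricting the search to those two arrays is what keeps the PR author's login (and any other login buried in the data) out of the result. Whether those two keys cover every case in the real pile is exactly the part I am not sure about.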

Just to save anyone interested the trouble, I have put the pile of data I got yesterday in [6].

P.S. Elisa: what are you teaching this semester?

Notes

[0] Which is XSLT, not Python. I have not written Python in over a decade. And I was bad at it.

[1] https://api.github.com/repos/TEIC/TEI/pulls plus https://api.github.com/repos/TEIC/Stylesheets/pulls

[2] I am presuming just json-to-xml() here, but in fact I take an extra step after that to get slightly more XML-like data. (And then another step to get real XML data, but (a) that is not part of this discussion, and (b) it is not finished.)

[3] …/fn:array/fn:map/fn:number[@key eq 'number']; and be careful, because you do NOT want the …/fn:array/fn:map/fn:map/fn:number[@key eq 'number'] values.
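The same caution can be sketched in Python with ElementTree, assuming plain json-to-xml() output. The sample below is made up, including the nested map keyed "milestone" that stands in for whatever nested map carries its own "number"; pr_numbers is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace of the elements that XPath 3.1 json-to-xml() produces.
NS = {"fn": "http://www.w3.org/2005/xpath-functions"}

# Hypothetical sample: two PRs, the first with a nested map that has its own "number".
SAMPLE = """
<fn:array xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <fn:map>
    <fn:number key="number">101</fn:number>
    <fn:map key="milestone"><fn:number key="number">7</fn:number></fn:map>
  </fn:map>
  <fn:map>
    <fn:number key="number">102</fn:number>
  </fn:map>
</fn:array>
"""

def pr_numbers(root):
    """Return only the numbers that are direct children of each top-level map."""
    path = "./fn:map/fn:number[@key='number']"
    return [int(element.text) for element in root.findall(path, NS)]

root = ET.fromstring(SAMPLE)
numbers = pr_numbers(root)

# A descendant search is the trap: it also picks up the nested value.
too_many = [int(e.text) for e in root.findall(".//fn:number[@key='number']", NS)]

print(numbers)
print(too_many)
```

The child-axis path yields just the PR numbers, while the descendant search also returns the nested 7, which is not a PR number at all.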

[4] https://api.github.com/repos/TEIC/(TEI|Stylesheets)/pulls/$N/requested_reviewers.

[5] You want to save this data on disk, rather than ask for it again every time you want to play, because GitHub limits you to 60 requests per hour unless you authenticate.
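A minimal sketch of that save-to-disk idea in Python, for what it is worth. The names cached_fetch and fetch_from_network, the cache layout (one file per URL, named by a hash of the URL), and the fake fetcher used in the demo are all assumptions of this sketch, not anything I actually run.

```python
import hashlib
import tempfile
import urllib.request
from pathlib import Path

def fetch_from_network(url):
    """Fetch the raw response body for a URL (one real request per call)."""
    with urllib.request.urlopen(url) as response:
        return response.read()

def cached_fetch(url, cache_dir, fetch=fetch_from_network):
    """Return the body for url, hitting the network only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: SHA-256 of the URL, so any URL is a safe filename.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json"
    path = cache_dir / name
    if path.exists():
        return path.read_bytes()
    body = fetch(url)
    path.write_bytes(body)
    return body

# Demo with a fake fetcher, so nothing here counts against the rate limit.
calls = []

def fake_fetch(url):
    calls.append(url)
    return b'{"users": [], "teams": []}'

url = "https://api.github.com/repos/TEIC/TEI/pulls"
with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch(url, tmp, fake_fetch)
    second = cached_fetch(url, tmp, fake_fetch)

print(len(calls))
```

The second call is served from disk, so the fetcher runs only once. Passing the fetcher in as a parameter is just a convenience for trying the cache out without touching GitHub.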

[6] http://bauman.zapto.org/~syd/temp/4TEICouncil/GitHub_PR_data_2021-01-14.tgz

You mentioned XPath? I'd be curious to see the rest of the code. I've just been writing some Python myself to address the Box.com API to rescue some old project metadata from a big set of nested file directories and I'm half shocked that I got my code to work. I do wonder whether Python might make it easier to address the GitHub API. Fair warning: classes start up again for me on 1/19, but I'd be happy to take a look anyway! ;-)