Inspired by Nick’s tei_current_assignments.py (https://github.com/npcole/teiutils), a program that leverages the GitHub API to summarize GitHub issue assignments, I have just spent the entire day trying to write an XSLT program to summarize pull requests. At this point, I have to (at least temporarily) give up. I simply cannot figure out how to get the list of reviewers (those who have been asked to be reviewers, those who have accepted the request to be a reviewer, and those who have submitted reviews) without, at least sometimes, including the OP in that list. (That is, I have yet to figure out an XPath that will select the reviewers without the OP from the JSON-converted-to-XML returned by the “reviews” API call.)
If anybody feels like helping me out, I think we would find this a useful utility. To prove the point, I generated one possibly useful output (an HTML table) using my current mildly broken code, and went through and fixed the output by hand. (Sigh.) See https://bauman.zapto.org/~syd/temp/4TEICouncil/GHPRs.html.
Talk to y’all tomorrow.
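What follows is a minimal sketch of one way to pull the reviewer names out while leaving the OP behind. It is written in Python rather than XSLT and works on the fn:json-to-xml() form of the data. It assumes three things: the PR object comes from the /pulls list, so its own "user" map holds the OP and its "requested_reviewers" array holds pending reviewers; the reviews document is the json-to-xml() output of the .../pulls/$N/reviews call, which is an array of maps that each carry a "user" map; and the sample data is invented. The OP can get in two plausible ways. First, a descendant search for string[@key='login'] also matches the "user" maps inside "head" and "base". Second, the OP's own replies to review comments can show up as reviews. For these reasons the sketch takes the OP only from the PR's direct "user" child and removes that name from the union. In XSLT the last step might look something like $reviews/fn:array/fn:map/fn:map[@key eq 'user']/fn:string[@key eq 'login'][. ne $op], where $reviews and $op are hypothetical variables.

```python
import xml.etree.ElementTree as ET

# Namespace of XPath 3.1 fn:json-to-xml() output.
NS = {"fn": "http://www.w3.org/2005/xpath-functions"}


def logins(elem, path):
    """Text of every element matching a relative ElementTree path."""
    return [e.text for e in elem.findall(path, NS) if e.text]


def reviewers_excluding_op(pr_xml, reviews_xml):
    """Reviewer logins for one PR, minus the PR author (the OP).

    pr_xml:      json-to-xml() form of one PR object (root fn:map)
    reviews_xml: json-to-xml() form of the .../reviews array (root fn:array)
    """
    pr = ET.fromstring(pr_xml)
    reviews = ET.fromstring(reviews_xml)

    # The OP is the "user" map that is a *direct* child of the PR map;
    # a descendant search would also hit head/user and base/user.
    op_elem = pr.find("fn:map[@key='user']/fn:string[@key='login']", NS)
    op = op_elem.text if op_elem is not None else None

    names = set()
    # People asked to review who have not (yet) done so.
    names.update(logins(pr, "fn:array[@key='requested_reviewers']/fn:map/fn:string[@key='login']"))
    # People who have submitted a review (this can include the OP).
    names.update(logins(reviews, "fn:map/fn:map[@key='user']/fn:string[@key='login']"))
    names.discard(op)
    return sorted(names)


# Invented sample data in json-to-xml() form.
PR_XML = """<map xmlns="http://www.w3.org/2005/xpath-functions">
  <number key="number">42</number>
  <map key="user"><string key="login">op-user</string></map>
  <array key="requested_reviewers">
    <map><string key="login">alice</string></map>
  </array>
  <map key="head"><map key="user"><string key="login">op-user</string></map></map>
</map>"""

REVIEWS_XML = """<array xmlns="http://www.w3.org/2005/xpath-functions">
  <map><map key="user"><string key="login">bob</string></map><string key="state">APPROVED</string></map>
  <map><map key="user"><string key="login">op-user</string></map><string key="state">COMMENTED</string></map>
</array>"""

print(reviewers_excluding_op(PR_XML, REVIEWS_XML))  # ['alice', 'bob']
```

The design choice that matters is the anchored path. The OP is found by walking down exactly one level from the PR map and is then subtracted. Excluding the OP with a predicate inside a single descendant search would be more fragile.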
Syd,
You mentioned XPath? I'd be curious to see the rest of the code. I've just
been writing some Python myself to address the Box.com API to rescue some
old project metadata from a big set of nested file directories and I'm half
shocked that I got my code to work. I do wonder whether it might be easier
to address the GitHub API with Python. Fair warning: classes start up again for
me on 1/19, but I'd be happy to take a look anyway! ;-)
Elisa
On Wed, Jan 13, 2021 at 10:13 PM Bauman, Syd
_______________________________________________ Tei-council mailing list Tei-council@lists.tei-c.org http://lists.lists.tei-c.org/mailman/listinfo/tei-council
--
Elisa Beshero-Bondar, PhD
Program Chair of Digital Media, Arts, and Technology | Professor of Digital Humanities | Director of the Digital Humanities Lab at Penn State Erie, The Behrend College
Development site: https://newtfire.org
Hi Elisa,
Yes, I mentioned XPath, and I am not surprised it whetted your appetite even at 01:00 in the freakin’ morning! 🙂
However, it’s not the code[0] that’s the problem, it’s the data. If you ask GitHub for all the PRs,[1] convert them to XML,[2] grab all the PR numbers,[3] ask for the reviewers for each of those,[4] and convert those to XML,[2] you now have a large pile of ugly XML data.[5] I have not figured out a consistent way to extract reviewer user names (which are encoded as <string key="login"> elements) from that pile. (Note that reviewers can appear both in the main set of PR info and in the separate little array of review info.)
Just to save anyone interested the trouble, I have put the pile of data I got yesterday in [6].
P.S. Elisa: what are you teaching this semester?
Notes
[0] Which is XSLT, not Python. I have not written Python in over a decade. And I was bad at it.
[1] https://api.github.com/repos/TEIC/TEI/pulls plus https://api.github.com/repos/TEIC/Stylesheets/pulls
[2] Presuming here just json-to-xml(), but actually I have taken an extra step after that to give me slightly more XML-like data. (And then another step to give me real XML data, but a) that is not part of this discussion, and b) it is not finished.)
[3] …/fn:array/fn:map/fn:number[@key eq 'number']; and be careful, because you do NOT want the …/fn:array/fn:map/fn:map/fn:number[@key eq 'number'] values.
[4] https://api.github.com/repos/TEIC/(TEI|Stylesheets)/pulls/$N/requested_reviewers
[5] You want to save this data on disk, rather than ask for it again every time you want to play, because GitHub limits unauthenticated clients to 60 requests per hour (authenticating with an access token raises that limit considerably).
[6] http://bauman.zapto.org/~syd/temp/4TEICouncil/GitHub_PR_data_2021-01-14.tgz
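For comparison with the XSLT route, steps [1], [3], [4], and [5] might look roughly like this in Python. This is a sketch that rests on several assumptions. It makes unauthenticated requests. It reads only the first page of results, because by default GitHub's /pulls call returns open PRs 30 at a time and pagination is not handled here. The cache file naming and the helper names (fetch_json, cached_json, pr_numbers, gather) are made up for illustration. Taking each top-level object's own "number" is the JSON counterpart of the warning in note [3], since nested objects such as a milestone carry a "number" key too.

```python
import json
import urllib.request
from pathlib import Path

# Endpoint pattern from notes [1] and [4].
PULLS_URL = "https://api.github.com/repos/TEIC/{repo}/pulls"


def fetch_json(url):
    """Fetch one GitHub API URL (unauthenticated) and decode its JSON body."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def cached_json(url, cache_dir, fetch=fetch_json):
    """JSON for url: read from cache_dir if saved earlier, else fetch and save it."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: URL without its scheme, slashes flattened.
    path = cache / (url.split("://", 1)[-1].replace("/", "_") + ".json")
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch(url)
    path.write_text(json.dumps(data), encoding="utf-8")
    return data


def pr_numbers(pulls):
    """Each top-level PR object's own "number"; nested ones (e.g. milestone) are ignored."""
    return [pr["number"] for pr in pulls]


def gather(cache_dir, repos=("TEI", "Stylesheets"), fetch=fetch_json):
    """Map (repo, PR number) to that PR's requested_reviewers response."""
    result = {}
    for repo in repos:
        base = PULLS_URL.format(repo=repo)
        for n in pr_numbers(cached_json(base, cache_dir, fetch)):
            result[(repo, n)] = cached_json(f"{base}/{n}/requested_reviewers", cache_dir, fetch)
    return result
```

A call such as gather("gh_cache") would go to the network only for URLs not already saved under gh_cache/, which is the point of note [5]. The fetch parameter also makes it easy to substitute a stub while experimenting offline.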
Hi Syd: I suddenly noticed that I didn't notice you'd replied 7 days ago!
That is pretty ridiculous of me, but has everything to do with the launch
of spring semester.
In answer to your question, I am teaching a class called "Text Analysis"
which is something like my old class called "Coding and Data
Visualization", in that I think of it as my XQuery class. I'm also cramming
a few weeks of Python into it, which is new teaching ground for me, though
not new coding ground. Unlike those who have taught this course in the past
as an entirely Python course, my course will involve an intersection of
structured markup, regex conversion of so-called "plain-text" to XML at
scale, and pull-processing of lightly marked XML into stuff like TSVs,
maybe JSON (ugh, but maybe yeah), and visualizing the data in various ways
such as networks and SVG plots. And I want to explore a little NLTK and
topic modeling, and just play around with Pythonic things on whatever
documents we've heaped up around us by March / April. It's my first time
with this class, so we'll see where it goes!
Anyway, I wouldn't mind taking a peek at your XSLT to see if I have any
suggestions for making the pile of XML output on the other side a little
less hideous! But I am swamped--this is the sort of thing I'd enjoy looking
at during a real old-fashioned face-to-face TEI session before we all
wander over to the restaurant for dinner!
Cheers,
Elisa
On Thu, Jan 14, 2021 at 7:48 AM Bauman, Syd
Dear Elisa,
Happy to look at this problem!
N
participants (3)
- Bauman, Syd
- Elisa Beshero-Bondar
- Nicholas Cole