Completed test migration of SourceForge FRs and bugs
Dear all,
First of all, apologies if you have just been spammed by GitHub notifications :(
I've tested migrating the Feature Requests and bugs, as exported by Hugh on July 28, to https://github.com/TEIC/Guidelines-TEST/issues. That's a total of 564 FRs and 764 bugs. I'm happy to report that all have been imported successfully.
This was done using the Perl script gosf2gh, which I've adjusted a bit to work with our requirements. The code of my fork is here: https://github.com/raffazizzi/gosf2github Feel free to contribute!
A few comments:
- All issues and comments were created by raffazizzi because I used my API token. When we do this for real, we should use a TEITechnicalCouncil token.
- Even though all issues and comments will appear to have been created by the same user, the script carries over the original creator / commenter in the body of the issue/comment. If we have a GitHub user for them, the GitHub username will appear; otherwise the old SourceForge username will.
- Referencing users: I've avoided adding explicit references to GitHub users (e.g. @raffazizzi) because the script I used warns that too much spam could cause my token to be banned. The advantage of referencing users directly is that issues can be searched by mentioned users, but we may have to give this one up.
- Issue assignment: if the ticket was not assigned in SF, it won't be assigned in GH. If it was assigned to a user who is currently not a collaborator, it will be assigned to TEITechnicalCouncil. A label is also added with the old SourceForge assignee in the form sf_assignee-USER. I'm now realizing that it may be more useful to have these labels for the original *creators*, not the assignees. Thoughts?
- Formatting of the text looks pretty good, but code blocks are a bit broken. The script guesses by matching ~~~~ blocks, but sometimes they're marked with more tildes. I can try to fix this, but it's probably going to be a bit unreliable. Not a big deal, though, IMO. If you notice any other formatting problems, let me know!
Raff
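The issue-assignment rule described in the message above can be sketched in a few lines. This is only an illustration in Python, not the code of gosf2github (which is Perl). The function name, the sf_to_github mapping and the collaborators set are hypothetical, and the sketch assumes the sf_assignee-USER label is added whenever the SourceForge ticket had an assignee.

```python
FALLBACK_ASSIGNEE = "TEITechnicalCouncil"


def plan_assignment(sf_assignee, sf_to_github, collaborators):
    """Return (github_assignee, labels) for one migrated ticket.

    sf_assignee   -- SourceForge username, or None if the ticket was unassigned
    sf_to_github  -- hypothetical dict mapping SF usernames to GitHub usernames
    collaborators -- set of GitHub usernames that are current collaborators
    """
    # Unassigned in SF: stays unassigned in GH, and no label is added.
    if not sf_assignee:
        return None, []

    # Assumption: the legacy assignee is always recorded as a label.
    labels = [f"sf_assignee-{sf_assignee}"]

    gh_user = sf_to_github.get(sf_assignee)
    if gh_user in collaborators:
        return gh_user, labels

    # Assigned in SF, but not to a current collaborator: use the fallback.
    return FALLBACK_ASSIGNEE, labels
```

Keeping the decision in one small function would make it easy to switch the label from assignee to creator later, which is the open question raised in the message.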
This is awesome Raff! Very encouraging.
On Aug 20, 2015, at 17:40, Raffaele Viglianti wrote:
Dear all,
First of all apologies if you have just been spammed by GitHub notifications :(
I didn’t get any :-). Presumably I would if I’d been referenced.
I've tested migrating to https://github.com/TEIC/Guidelines-TEST/issues Feature Requests and bugs as exported by Hugh on July 28. That's a total of 564 FRs and 764 bugs. I'm happy to report that all have been imported successfully.
This was done using the perl script gosf2gh, which I've adjusted a bit to work with our requirements. The code of my fork is here: https://github.com/raffazizzi/gosf2github Feel free to contribute!
A few comments:
- All issues and comments were created by raffazizzi because I used my API token. When we do this for real, we should use a TEITechnicalCouncil token.
Yep.
- Even though all issues and comments will appear created by the same user, the script carries over the original creator / commentator in the body of the issue/comment. If we have a GitHub user for it, the GitHub user will appear, otherwise the old username will.
Any chance we could hoist this to the top of the comment? It’s a little confusing to have to go look for the original commenter at the bottom. Wonder if we could improve on that by matching the SF username to a real name where we have one...
- Referencing users: I've avoided adding explicit references to GitHub users (eg @raffazizzi) because the script I used warns that too much spam could cause my token to be banned. The advantage of referencing users directly is that issues can be searched by mentioned users, but we may have to give this one up.
I’d like to find out more about this if we can. Maybe we could ask GitHub support how much is too much. Possibly if we throttle the migration?
- Issue assignment: if the ticket was not assigned in SF, it won't be assigned in GH. If it was assigned to a user that is currently not a collaborator, it will be assigned to TEITechnicalCouncil. A label is also added with the old sourceforge assignee in the form of sf_assignee-USER. I'm now realizing that it may be more useful to have these labels for the original *creators* not the assignees. Thoughts?
I dunno…I suppose it might be more useful to be able to find tickets by creator. I worry a bit about the profusion of labels
- Formatting of the text looks pretty good, but code blocks are a bit broken. The script guesses by matching ~~~~ blocks, but sometimes they're marked wit more tildes. I can try to fix this, but it's probably going to be a bit unreliable. Not a big deal, though, IMO. If you notice any other formatting problems, let me know!
Maybe change the regex at https://github.com/raffazizzi/gosf2github/blob/master/gosf2github.pl#L99 to something like \~{4,}?
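The idea of matching four or more tildes can be sketched as follows. This is a Python illustration rather than a patch to the Perl script, and it assumes (this is not stated in the thread) that a code fence is a line containing nothing but tildes and optional whitespace. The function name is hypothetical.

```python
import re

# A fence line: optional indentation, four or more tildes, optional trailing spaces.
TILDE_FENCE = re.compile(r"^[ \t]*~{4,}[ \t]*$", re.MULTILINE)


def convert_tilde_fences(text):
    """Rewrite every tilde fence line as a triple-backtick fence line."""
    return TILDE_FENCE.sub("```", text)
```

Because opening and closing fences are both normalised to the same marker, a block opened with six tildes and closed with four would still come out balanced. Fences followed by other text on the same line are left untouched by this sketch.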
Raff -- tei-council mailing list tei-council@lists.tei-c.org http://lists.lists.tei-c.org/mailman/listinfo/tei-council
PLEASE NOTE: postings to this list are publicly archived
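Throttling the migration, as suggested in the reply above, could look something like this minimal Python sketch. The thread does not say what rate GitHub tolerates, so min_interval is a placeholder to be tuned, and the helper name is hypothetical. The sleep and clock parameters exist only so the pacing can be tested without waiting.

```python
import time


def throttled(items, min_interval, sleep=time.sleep, clock=time.monotonic):
    """Yield items, leaving at least min_interval seconds between them."""
    last = None
    for item in items:
        if last is not None:
            # Wait only for the part of the interval that has not elapsed yet.
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item
```

A migration loop would then iterate over throttled(tickets, 1.0) instead of over tickets directly, so that each API call is spaced out regardless of how quickly the previous one returned.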
On 8/20/15 8:43 PM, Hugh Cayless wrote:
This is awesome Raff! Very encouraging.
On Aug 20, 2015, at 17:40, Raffaele Viglianti wrote:
Dear all,
First of all apologies if you have just been spammed by GitHub notifications :(
I didn’t get any :-). Presumably I would if I’d been referenced.
I also didn't receive these, and I think the regex I was using was too restrictive. I've loosened it, so I hope it will work going forward.
Kevin
Hello,
I've been making some improvements to the migration based on your feedback.
Differences from before:
- attributes referred to as @att are now escaped (GH uses @ to refer to user
names)
- code blocks should work properly now
- removed the unnecessary escape \ that littered most older tickets and messed
up URLs
- labels are based on the ticket's creator, not the assignee
Two more questions for you:
1. do we really want labels for original creators? PRO: they make it easy
to look for tickets based on the creator's legacy user name (not all
creators will be on GitHub)
2. the SF username of the original assignee is added as an extra comment to
the issue. This is particularly useful if the assignee is not a current
contributor on GitHub: this would become the only place where this is
recorded. This comment ends up at the bottom because its creation date is
the date when it's actually created. Hugh suggested moving this extra
comment *to the top* of all comments. To do this I need to fudge the
creation date to be before the first comment. Are we comfortable with this?
I've been testing on my own fork, but I'll do another full test on
TEIC/Guidelines-TEST once I've made the last few changes. After that, I
think we'll be ready to go.
Thanks!
Raff
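Escaping TEI attribute references such as @att so that GitHub does not read them as user mentions might be done roughly as below. The message does not say how the script escapes them, so wrapping the reference in backticks is an assumption here, as are the function name and the keep_mentions parameter (a set of GitHub usernames that should remain real mentions).

```python
import re

# "@name" not preceded by a word character (so e-mail addresses are skipped)
# or by a backtick (so already-escaped references are skipped). The name may
# contain ":", "." and "-" (e.g. xml:id) but must end in a word character.
ATT_REF = re.compile(r"(?<![\w`])@([A-Za-z_](?:[\w:.-]*\w)?)")


def escape_attribute_refs(text, keep_mentions=frozenset()):
    """Wrap @att references in backticks, leaving listed usernames alone."""
    def replace(match):
        name = match.group(1)
        if name in keep_mentions:
            return match.group(0)  # a real GitHub user: keep the mention
        return f"`@{name}`"

    return ATT_REF.sub(replace, text)
```

For example, escape_attribute_refs("Add @type to <div>.") gives "Add `@type` to <div>.", while an address like a@b.org is left unchanged.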
Hi Raff,
This sounds great. Thanks for all this work.
I'm OK with adding a spurious comment at the top of the comment list (i.e. before the rest of the comments) containing the assignee. As long as it's in a specific form which makes clear that it's not really a comment, but a note about assignment in SourceForge, it will be helpful, not confusing.
Cheers, Martin
On 15-09-10 02:26 PM, Raffaele Viglianti wrote:
Hello,
I've been making some improvements to the migration based on your feedback.
Differences from before:
- attributes referred to as @att are now escaped (GH uses @ to refer to user names)
- code blocks should work properly now
- removed the unnecessary escape \ that littered most older tickets and messed up URLs
- labels are based on the ticket's creator, not the assignee
Two more questions for you:
1. do we really want labels for original creators? PRO: they make it easy to look for tickets based on the creator's legacy user name (not all creators will be on GitHub)
2. the SF username of the original assignee is added as an extra comment to the issue. This is particularly useful if the assignee is not a current contributor on GitHub: this would become the only place where this is recorded. This comment ends up at the bottom because its creation date is the date when it's actually created. Hugh suggested moving this extra comment *to the top* of all comments. To do this I need to fudge the creation date to be before the first comment. Are we comfortable with this?
I've been testing on my own fork, but I'll do another full test on TEIC/Guidelines-TEST once I've made the last few changes. After that, I think we'll be ready to go.
Thanks! Raff
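Martin's condition, that the assignee note be in a specific form which makes clear it is not a real comment, could be met by giving every such note a fixed, recognisable prefix. The sketch below is in Python for illustration only. The prefix text, the wording of the note and both function names are hypothetical and were not agreed in the thread.

```python
NOTE_PREFIX = "[SourceForge migration note]"


def format_assignee_note(sf_assignee):
    """Build the body of the extra comment recording the legacy assignee."""
    return (
        f"{NOTE_PREFIX} This ticket was assigned to {sf_assignee} in SourceForge. "
        "This note was added by the migration script and is not a comment "
        "from a person."
    )


def is_migration_note(body):
    """Tell migration notes apart from comments written by people."""
    return body.startswith(NOTE_PREFIX)
```

A fixed prefix also means the notes can be found later with a plain text search, independently of where they end up in the comment list.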
Hello,
I made another pass. This time you might have gotten some spam because I've
introduced direct user mentions with "@".
Two things:
1. It's not possible to add a comment with original assignee information at
the top: even with a made-up early date (say sometime before the first
SourceForge ticket in 2004), the comment always shows second to the comment
that opens the issue. I could attempt to add extra text to the earliest
comment, but I'd rather just have a dedicated comment at the end, with a
real date of when it was created.
2. GitHub didn't always use Markdown, and posts dated before a certain date
(April 20 2009 at 19:00:00) are rendered as Textile. This means that user
references with @ won't actually work there. I don't think this is a major
issue, but I wanted to let you know.
Otherwise, I think my testing phase is now concluded, and I'm ready to
proceed with copying the data over once and for all. We could discuss a
possible freeze at our next call.
The migrated tickets (FRs and bugs up to roughly July this year) are here
for your review: https://github.com/TEIC/TEI-TEST/issues
As well as the migration code: https://github.com/raffazizzi/gosf2github
Raff
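The second point above suggests a simple date check when deciding whether an @ reference is worth writing at all. The following Python sketch assumes the cutoff quoted in the message; the thread gives no time zone, so the comparison uses naive datetimes. The function name and the "Original comment by:" wording are hypothetical.

```python
from datetime import datetime

# Cutoff quoted in the thread; the time zone is not stated there, so this
# value and the dates compared against it are naive datetimes.
MARKDOWN_CUTOFF = datetime(2009, 4, 20, 19, 0, 0)


def credit_line(username, created_at):
    """Credit the original author, using an @ mention only where it would work."""
    if created_at >= MARKDOWN_CUTOFF:
        return f"Original comment by: @{username}"
    # Older posts are rendered as Textile, so write the plain username instead.
    return f"Original comment by: {username}"
```

Whether plain usernames on older posts are preferable to inert @ references is a matter of taste; the message itself judges the problem to be minor.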
On Thu, Sep 10, 2015 at 6:44 PM, Martin Holmes wrote:
Hi Raff,
This sounds great. Thanks for all this work.
I'm OK with adding a spurious comment at the top of the comment list (i.e. before the rest of the comments) containing the assignee. As long as it's in a specific form which makes clear that it's not really a comment, but a note about assignment in SourceForge, it will be helpful, not confusing.
Cheers, Martin
This looks great Raff. Thanks! Let’s indeed talk about it at the next call. Doodle (http://doodle.com/poll/4fbdufgwsyc8ruc3) says the best candidate dates are Thursday the 24th and Monday the 28th. A few of you haven’t participated yet, so can you let us know if one of those won’t work? I’m slightly inclined towards the earlier date, given that we have a freeze coming up and need to decide what’s in or out for the next release.
On Sep 16, 2015, at 16:54, Raffaele Viglianti wrote:
Hello,
I made another pass. This time you might have gotten some spam because I've introduced direct user mentions with "@".
Two things:
1. It's not possible to add a comment with original assignee information at the top: even with a made up early date (say sometime before the first SourceForge ticket in 2004), the comment always shows second to the comment that opens the issue. I could attempt to add extra text to the earliest comment, but I'd rather just have a dedicated comment at the end, with a real date of when it was created.
2. GitHub didn't always use markdown and posts dated before a certain date (April 20 2009 at 19:00:00) are marked in Textile. This means that user references with @ actually won't work there. I don't think this is a major issue, but I wanted to let you know.
Otherwise, I think my testing phase is now concluded, and I'm ready to proceed with copying the data over once and for all. We could discuss a possible freeze at our next call.
The migrated tickets (FRs and bugs up to roughly July this year) are here for your review: https://github.com/TEIC/TEI-TEST/issues As well as the migration code: https://github.com/raffazizzi/gosf2github
Raff
participants (4)
- Hugh Cayless
- Kevin Hawkins
- Martin Holmes
- Raffaele Viglianti