Hi all,

At the last Stylesheets meeting, we looked at how the command "xmllint --format -", which is run when testing the HTML transformations, adds the XML declaration to the output file (which was a bit counterproductive, because the XSLT code explicitly requests that the declaration be omitted).

Now I’ve run into another problem that is also caused by the xmllint command: it outputs XML entities where a literal string is expected. I ran the script without xmllint and the output was as expected, so this seems to be another of the modifications made by this program. This issue primarily affects the quotation marks that are added to the HTML output using CSS. See diff:

17c16
< content:"‘";
---
> content:"‘";
20c19
< content:"’";
---
> content:"’";
23c22
< content:"“";
---
> content:"“";
26c25
< content:"“";

My guess is that this wasn’t an issue before because xmllint received as input a file with the XML declaration. Now it finds a file whose first line is the HTML doctype declaration and therefore makes these modifications.

Considering that in the HTML transformation Saxon is already instructed to indent the results, my proposal is to modify Test/Makefile and delete the "xmllint" command ONLY for the transformations to HTML. Would this be a viable solution?

Best,
H.

Helena Bermúdez Sabel
Chercheuse FNS senior
Institut des sciences du langage
Université de Neuchâtel
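As a small illustration of the symptom described above (not a reproduction of xmllint, and not part of the Stylesheets test suite), the following Python sketch shows how the same character can be serialized either literally or as a numeric character reference, depending only on the output encoding the serializer targets. The CSS rule and the variable names are made up for the example; whether xmllint makes its choice for a similar reason is an assumption in line with the guess above, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical CSS rule containing a literal left single quotation mark (U+2018).
style = ET.fromstring('<style>q:before { content:"\u2018"; }</style>')

# ElementTree's default output encoding is US-ASCII, so any non-ASCII
# character is written as a numeric character reference.
ascii_bytes = ET.tostring(style)

# Asking for a Unicode string instead keeps the literal character.
literal_text = ET.tostring(style, encoding="unicode")

print(ascii_bytes.decode("ascii"))
print(literal_text)
```

To an XML parser the two serializations are equivalent, but a textual diff reports them as different. In HTML, the content of a style element is raw text, so a character reference there is not expanded, which is why the literal form matters for CSS.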
Hi Helena and Council,

Now I know what I missed by arriving late to the Stylesheets meeting on Thursday! I am glad you've figured out that xmllint is causing these specific issues with the HTML transformation. I think it makes sense to do as you propose and remove xmllint for only the transformations to HTML.

I have a question about the conversion of XML entities generally in the xmllint program: Do we need them for the other transformations, or is this unnecessary now? A casual search of the Stylesheets issues for xmllint yields just three issues, in each of which it seems people invoke xmllint --format just for indenting when running Stylesheets via Oxgarage... is that typically the use case for our community using the Stylesheets? I wonder if the entity transformation is potentially problematic elsewhere, or what the need for it is.

Elisa

On Sun, Apr 25, 2021 at 4:18 AM BERMUDEZ SABEL Helena <helena.bermudez@unine.ch> wrote:
_______________________________________________
Tei-council mailing list
Tei-council@lists.tei-c.org
http://lists.lists.tei-c.org/mailman/listinfo/tei-council
--
Elisa Beshero-Bondar, PhD
Program Chair of Digital Media, Arts, and Technology | Professor of Digital Humanities | Director of the Digital Humanities Lab at Penn State Erie, The Behrend College
Development site: https://newtfire.org
Also, I am wondering whether setting the --html or --htmlout flag on xmllint is any help.
From what I'm reading, it seems that --format doesn't really work for HTML, but I wonder if --htmlout is a solution. I am guessing it doesn't really matter as this xmllint step seems unnecessary for the HTML transformations? (Or what benefit would it have?)
Elisa
On Sun, Apr 25, 2021 at 1:28 PM Elisa Beshero-Bondar wrote:
Hi Helena and all,

I think the benefit of linting the files (and thus the original intention) is to produce a "normalized" (predictable) version for better comparison, suppressing formatting differences.

So, I just want to add two more options:
a) If the linting behaves differently now due to the changed doctype, we might just adjust the expected results once and keep the current linter. This assumes that we really don’t care about the format of the files but rather want them to be linted in a predictable manner.
b) We might look into other linters that work better(?) with HTML.

Best
Peter
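The idea of comparing normalized versions rather than raw files can be sketched in a few lines of Python. This is only an illustration of the principle under simple assumptions, not what Test/Makefile does: the helper name normalize and the sample strings are hypothetical, and a real linter does far more than this.

```python
import html


def normalize(markup: str) -> str:
    """Return a comparison-friendly form of markup (hypothetical helper).

    The result is meant for comparison only: html.unescape also expands
    references such as &lt; and &amp;, so it may no longer be valid markup.
    """
    # Turn character references such as &#x2018; into the literal character.
    text = html.unescape(markup)
    # Strip trailing whitespace and drop empty lines so that these purely
    # cosmetic differences do not register when the results are compared.
    lines = [line.rstrip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


expected = 'content:"\u2018";\n'
actual = 'content:"&#x2018;";  \n\n'

# The raw strings differ, but their normalized forms can be compared directly.
print(normalize(expected) == normalize(actual))
```

With this approach the choice of formatter matters less than applying the same normalization to both the expected and the actual output.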
On 25.04.2021 at 22:05, Elisa Beshero-Bondar wrote:
Hello all,
Thank you for the feedback, Peter. I am partial to your option b.

I wonder if we could add this to the agenda of the next Stylesheets meeting to decide on the best option. I’ve been doing some tests using Tidy (https://www.html-tidy.org/#homepage19700201what_is_tidy) for the HTML outputs (and it works well), so that will be my proposal.

Best,
H.
Helena Bermúdez Sabel
Chercheuse FNS senior
Institut des sciences du langage
Université de Neuchâtel
participants (3)
- BERMUDEZ SABEL Helena
- Elisa Beshero-Bondar
- Peter Stadler