Message from Python discussions
November 2018
— Fixed the if == statements
Hi. I am using xml.etree.ElementTree to parse DocBook XML. It turns out that there are HTML special character entities inside my file, for instance &euml;
Now I see the following exception raised during parse():
xml.etree.ElementTree.ParseError: undefined entity &euml;: line 7191, column 64
What is the proper way to avoid such errors?
— What do you want to do exactly? Verify that the response is text?
— I'm a little lost here
— Try and except
— Yeah.. the if self.ignore_markdown: will trigger the bool and return the bool for content_markup (checks if the text has markup format)
— If it is True, return False
— Looks like it's finding a foreign-language character, maybe check the encoding?
— The same with the html one
— It finds &euml; in the input XML stream. It is a known beast (http://www.thesauruslex.com/typo/eng/enghtml.htm), so nothing is wrong with it. I would like it to be converted to the appropriate Unicode code point.
— Encoding then
— The encoding is specified in the input file, as in every valid XML document:
<?xml version="1.0" encoding="utf-8" ?>
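— One possible workaround, as a rough sketch rather than the definitive answer: the encoding is not the problem here. XML itself only predefines five entities (amp, lt, gt, quot, apos), so an HTML name like &euml; is undefined unless a DTD declares it. You could rewrite the HTML named entities as numeric character references before handing the text to ElementTree. The helper names below (replace_html_entities, parse_with_html_entities) are made up for illustration, and the regex is naive: it would also touch entity-like text inside CDATA sections or comments.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities every XML parser already understands; leave them alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named references such as &euml; (not numeric ones like &#235;).
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_html_entities(text):
    """Rewrite HTML named entities as numeric character references.

    Hypothetical helper: unknown names and XML's predefined entities
    are returned unchanged.
    """
    def _sub(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#{};".format(name2codepoint[name])

    return _ENTITY_RE.sub(_sub, text)


def parse_with_html_entities(xml_text):
    """Parse XML text after replacing HTML named entities (hypothetical helper)."""
    return ET.fromstring(replace_html_entities(xml_text))


sample = "<para>No&euml;l &amp; Zo&euml;</para>"
root = parse_with_html_entities(sample)
print(root.text)
```

For a file on disk you would read it as text first, using the encoding from the XML declaration, and pass the resulting string to parse_with_html_entities. If the DocBook source can be changed instead, declaring the entities in a DTD is the cleaner fix.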