November 2018

Hi. I am using xml.etree.ElementTree to parse DocBook XML. It turns out there are HTML character entities inside my file, for instance &euml;. Now I see the following exception raised during parse():

xml.etree.ElementTree.ParseError: undefined entity &euml;: line 7191, column 64

What is the proper way to avoid such errors?

— Try and except

— Looks like it's finding a foreign-language character, maybe check the encoding?

— It finds &euml; in the input XML stream. That is a known beast, so there is nothing wrong with it. I would like it to be converted to the corresponding Unicode character.

— Encoding then

— The encoding is specified in the input file, as in any valid XML:

<?xml version="1.0" encoding="utf-8" ?>
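
— One thing that might work is to teach the parser the HTML entity names before parsing. Below is a minimal sketch, assuming Python 3 and that the file carries a DOCTYPE declaration (DocBook files usually do). As far as I know, without a DOCTYPE expat rejects the entity before the table is ever consulted. The helper name parse_with_html_entities and the sample document, including its DocBook 4.5 DOCTYPE, are made up for illustration. XMLParser.entity and html.entities.name2codepoint are from the standard library.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint


def parse_with_html_entities(data):
    """Parse XML bytes, resolving HTML named entities such as &euml;.

    Hypothetical helper, not part of ElementTree.
    """
    parser = ET.XMLParser()
    # parser.entity maps entity names to replacement text. It is looked up
    # when the document uses an entity that has no declaration.
    for name, codepoint in name2codepoint.items():
        parser.entity[name] = chr(codepoint)
    parser.feed(data)
    return parser.close()  # returns the root element


# Illustrative sample; the DOCTYPE is an assumption about the real file.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<book><title>No&euml;l</title></book>
"""

root = parse_with_html_entities(SAMPLE)
print(root.findtext("title"))
```

For a file on disk, the same parser object can probably be passed as ET.parse(path, parser=parser) instead of calling feed() by hand.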

