python 3.x - Stanford dependency parser with NLTK: UnicodeDecodeError
I am trying to run the following lines of code:
    import os

    # NLTK locates Java and the Stanford jars through these environment variables
    os.environ['JAVAHOME'] = 'path/to/java.exe'
    os.environ['STANFORD_PARSER'] = 'path/to/stanford-parser.jar'
    os.environ['STANFORD_MODELS'] = 'path/to/stanford-parser-3.8.0-models.jar'

    from nltk.parse.stanford import StanfordDependencyParser

    dep_parser = StanfordDependencyParser(model_path="path/to/englishPCFG.ser.gz")
    sentence = "sample sentence ..."

    # dependency parsing:
    print("dependency parsing:")
    print([parse.tree() for parse in dep_parser.raw_parse(sentence)])
At this line:

    print([parse.tree() for parse in dep_parser.raw_parse(sentence)])

I get the following error:
    Traceback (most recent call last):
      File "c:/users/norbert/pycharmprojects/untitled/stanforddependencyparser.py", line 21, in <module>
        print([parse.tree() for parse in dep_parser.raw_parse(sentence)])
      File "c:\users\norbert\appdata\local\programs\python\python36\lib\site-packages\nltk\parse\stanford.py", line 134, in raw_parse
        return next(self.raw_parse_sents([sentence], verbose))
      File "c:\users\norbert\appdata\local\programs\python\python36\lib\site-packages\nltk\parse\stanford.py", line 152, in raw_parse_sents
        return self._parse_trees_output(self._execute(cmd, '\n'.join(sentences), verbose))
      File "c:\users\norbert\appdata\local\programs\python\python36\lib\site-packages\nltk\parse\stanford.py", line 218, in _execute
        stdout=PIPE, stderr=PIPE)
      File "c:\users\norbert\appdata\local\programs\python\python36\lib\site-packages\nltk\internals.py", line 135, in java
        print(_decode_stdoutdata(stderr))
      File "c:\users\norbert\appdata\local\programs\python\python36\lib\site-packages\nltk\internals.py", line 737, in _decode_stdoutdata
        return stdoutdata.decode(encoding)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xac in position 3097: invalid start byte
Any idea what is wrong? I am not dealing with non-UTF-8 text.
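
One thing the traceback does show: the failing call is print(_decode_stdoutdata(stderr)) inside NLTK's java() helper, so the bytes that cannot be decoded are the Java process's error output, not the input sentence. The snippet below is only a sketch of that failure mode. The byte string is a made-up stand-in for the real stderr, which the traceback does not show; what it may contain, and why it is not UTF-8, are assumptions here.

```python
# Hypothetical stand-in for the Java process's stderr (the real content
# is not shown in the traceback). 0xac is a UTF-8 continuation byte, so
# it cannot begin a character.
stderr_bytes = b"Error: \xac"

try:
    stderr_bytes.decode("utf-8")  # strict decoding of the raw bytes
    strict_ok = True
except UnicodeDecodeError as exc:
    strict_ok = False
    failing_byte = stderr_bytes[exc.start]  # the byte the codec rejected

# Decoding leniently substitutes U+FFFD for undecodable bytes, which is
# one way to read the underlying message instead of crashing on it.
readable = stderr_bytes.decode("utf-8", errors="replace")
print(ascii(readable))
```

If that reading is right, the UnicodeDecodeError is probably masking an earlier Java-side error, and seeing that message (for example by running the same Java command in a terminal) would be the next step.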