I am working on an HTML parser that uses a Python multiprocessing Pool because it runs through a huge number of pages. The output from every page is saved to a separate CSV file. The problem is that sometimes I get an unexpected error and the whole program crashes, even though I have error handling almost everywhere: reading pages, parsing pages, even writing files. Moreover, it looks like the script crashes after it finishes writing a batch of files, so there shouldn't be anything left to crash on. After a whole day of debugging I am left clueless.
Error:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "D:\ppp\Python\parser\run.py", line 244, in media_process
    save_media_product(DIRECTORY, category, media_data)
  File "D:\ppp\Python\parser\manage_output.py", line 180, in save_media_product
    _file_manager(target_file, temp, temp2)
  File "D:\ppp\Python\store_parser\manage_output.py", line 214, in _file_manager
    file_to_write.close()
UnboundLocalError: local variable 'file_to_write' referenced before assignment
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\ppp\Python\store_parser\run.py", line 356, in <module>
    main()
  File "D:\Rzeczy Mariusza\Python\store_parser\run.py", line 318, in main
    process.map(media_process, batch)
  File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 644, in get
    raise self._value
UnboundLocalError: local variable 'file_to_write' referenced before assignment
It looks like there is an error with variable assignment, but there isn't one:
    try:
        file_to_write = open(target_file, 'w')
    except OSError:
        message = 'OSError while writing file name - {}'.format(target_file)
        log_error(message)
    except UnboundLocalError:
        message = 'UnboundLocalError while writing file name - {}'.format(target_file)
        log_error(message)
    except Exception as e:
        message = 'Total failure "{}" while writing file name - {}'.format(e, target_file)
        log_error(message)
    else:
        file_to_write.write(temp)
        file_to_write.write(temp2)
    finally:
        file_to_write.close()
The line except Exception as e: does not help with anything; the whole thing still crashes. So far I have excluded only the out-of-memory scenario: this script is designed to run on a low-spec VPS, but in the testing stage I run it in an environment with 8 GB of RAM. If you have any theories, please share.
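For reference, here is a minimal sketch that seems to reproduce the same exception outside the pool. It assumes that open() is what fails (here forced by pointing at a directory that does not exist). The names write_file and reproduce are hypothetical stand-ins, and print replaces my log_error; the try/except/else/finally shape is the same as in _file_manager.

```python
import os
import tempfile


def write_file(target_file, text):
    """Simplified stand-in for _file_manager with the same try/finally shape."""
    try:
        file_to_write = open(target_file, 'w')
    except OSError:
        # The failure from open() is handled (logged) here...
        print('OSError while writing file name - {}'.format(target_file))
    else:
        file_to_write.write(text)
    finally:
        # ...but finally still runs, and file_to_write was never bound
        # because open() raised before the assignment happened.
        file_to_write.close()


def reproduce():
    """Return the name of the exception that escapes write_file, if any."""
    with tempfile.TemporaryDirectory() as base:
        # Hypothetical path: the 'missing' subdirectory is never created,
        # so open() raises FileNotFoundError (a subclass of OSError).
        bad_path = os.path.join(base, 'missing', 'out.csv')
        try:
            write_file(bad_path, 'a;b;c\n')
        except UnboundLocalError as error:
            return type(error).__name__
    return None


if __name__ == '__main__':
    print(reproduce())
```

In this sketch the except OSError branch does run and logs the message, but the exception raised afterwards in the finally block is not covered by any of the except clauses, since those only guard the body of the try.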