I have a notebook that runs machine learning jobs in Databricks, and I'm using dbutils widgets to accept variables and pass them to the notebook.
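In the child notebook I read the incoming value through a widget, roughly like this (the widget name "id" is just the one I use here):

# child notebook: read the "id" argument passed in by the parent
dbutils.widgets.text("id", "")
id = dbutils.widgets.get("id")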
I have created another notebook as a parent; it passes the variables and runs multiple child notebooks in parallel with ThreadPoolExecutor / ProcessPoolExecutor:
def processAnIntegerNumber(id):
    # each call triggers one run of the child notebook, passing "id" as an argument
    dbutils.notebook.run(path = "/Users/child_notebook",
                         timeout_seconds = 3600,
                         arguments = {"id": id})
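The parent notebook then fans out the calls with a thread pool, along these lines (the worker count and the list of ids are just placeholders for the 10-30 values I actually pass):

from concurrent.futures import ThreadPoolExecutor

ids = list(range(30))  # placeholder for the 10-30 values I actually pass

# each thread triggers one child-notebook run via dbutils.notebook.run
with ThreadPoolExecutor(max_workers=8) as executor:
    list(executor.map(processAnIntegerNumber, ids))  # block until all runs finish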
This creates multiple jobs. The problem I have is that all of these 10-30 jobs that I pass as variables run on the driver node and do not use the worker nodes, so as a result it is extremely slow.
Is there any way to run Python notebooks in parallel, without using Scala?
Regards