Jobs running on driver node in Databricks

Ask Time:2022-03-11T11:28:41         Author:hi4ppl

I have a notebook that runs machine learning jobs in Databricks. I'm using dbutils to accept variables and pass them to the notebook.

I have created another notebook as a parent. It passes the variables to the child notebook and runs multiple copies of it in parallel with:

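from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def processAnIntegerNumber(id):
    # Run the child notebook once, passing the given id as its argument
    dbutils.notebook.run(path = "/Users/child_notebook",
                         timeout_seconds = 3600,
                         arguments = {"id": id})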

This creates multiple jobs. The problem is that all 10-30 jobs I launch this way run on the driver node and do not use the worker nodes, and as a result everything is extremely slow.
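For readers trying to reproduce the setup, the fan-out described above might look roughly like the sketch below. It is not the exact parent notebook: `run_children`, `fake_child`, and `max_workers=4` are hypothetical names and values chosen for illustration, and `fake_child` stands in for the `dbutils.notebook.run` call so the snippet runs outside Databricks.

```python
from concurrent.futures import ThreadPoolExecutor


def run_children(ids, run_child, max_workers=4):
    """Call run_child(id) for every id on a thread pool.

    Results are returned in the same order as the input ids.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_child, ids))


def fake_child(id):
    # Hypothetical stand-in for the child notebook call, so this
    # sketch can run without a Databricks cluster.
    return f"done-{id}"


results = run_children(range(3), fake_child)
print(results)
```

In a Databricks notebook, `processAnIntegerNumber` would presumably be passed in place of `fake_child`. Note that the thread pool itself lives in the parent notebook's Python process, so it only controls how many child runs are started at once.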

Is there any way to run Python notebooks in parallel without using Scala?

regards

Author: hi4ppl. Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/71433316/jobs-running-on-driver-node-databrick