
Jobs running on driver node in Databricks

Ask Time:2022-03-11T11:28:41         Author:hi4ppl


I have a notebook that runs a machine learning job in Databricks. I'm using dbutils to accept the variables that are passed to the notebook.
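For context, a child notebook usually reads such a variable through a text widget. The sketch below assumes the argument is named `id` (as in the parent code); `get_arg` and `LOCAL_ARGS` are hypothetical helpers added only so the snippet also runs outside Databricks. Values passed this way always arrive as strings, so the id is converted explicitly.

```python
# Hypothetical fallback values, used only when not running inside Databricks.
LOCAL_ARGS = {"id": "7"}


def get_arg(name):
    """Read an argument passed to this notebook (always a string)."""
    if "dbutils" in globals():
        # Inside Databricks: declare the widget (empty default), then read
        # the value supplied by dbutils.notebook.run(..., arguments=...).
        dbutils.widgets.text(name, "")  # noqa: F821
        return dbutils.widgets.get(name)  # noqa: F821
    return LOCAL_ARGS[name]


job_id = int(get_arg("id"))
print(job_id)
```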

I have also created a parent notebook that passes the variables and runs multiple copies of the child notebook in parallel, using:

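from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def processAnIntegerNumber(id):
    # Run the child notebook once; argument values must be strings
    return dbutils.notebook.run(path = "/Users/child_notebook",
                                timeout_seconds = 3600,
                                arguments = {"id": str(id)})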
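The function above handles a single id. A minimal sketch of fanning it out over a thread pool might look like the following; `run_child`, `run_all`, the pool size of 10 and the 30 ids are illustrative assumptions, and the non-Databricks branch is a hypothetical stand-in so the sketch runs locally. `executor.map` returns results in the same order as the input ids.

```python
from concurrent.futures import ThreadPoolExecutor

CHILD_PATH = "/Users/child_notebook"


def run_child(job_id):
    """Run the child notebook once for one id and return its exit value."""
    if "dbutils" in globals():
        # Inside Databricks: path, timeout in seconds, string arguments.
        return dbutils.notebook.run(CHILD_PATH, 3600, {"id": str(job_id)})  # noqa: F821
    # Hypothetical local stand-in so the sketch can run outside Databricks.
    return f"done {job_id}"


def run_all(ids, max_workers=10):
    """Run one child per id on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(run_child, ids))


results = run_all(range(30))
```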

Running the child notebooks this way creates multiple jobs. The problem is that all of these 10-30 jobs (one per variable I pass) run on the driver node and do not use the worker nodes, so the whole run is extremely slow.
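One likely reason, offered as a hedged explanation: `ThreadPoolExecutor` only creates threads inside the current Python process, which on Databricks is the notebook's process on the driver. As far as I understand, only Spark operations inside the child notebooks are distributed to the workers; plain Python code (for example a single-machine model fit) stays on the driver. The stdlib-only sketch below, with a hypothetical `where_am_i` helper, records which process each job ran in.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def where_am_i(job_id):
    """Report the process id and thread name that handled a job."""
    return job_id, os.getpid(), threading.current_thread().name


with ThreadPoolExecutor(max_workers=4) as executor:
    placements = list(executor.map(where_am_i, range(10)))

pids = {pid for _, pid, _ in placements}
# Every job ran inside this one process, just on different threads.
print(pids == {os.getpid()})
```

The thread names differ, but the process id is always the same, so all the work shares one machine.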

Is there any way to run Python notebooks in parallel without using Scala?


Author: hi4ppl. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/71433316/jobs-running-on-driver-node-databrick