When using the FileToGoogleCloudStorageOperator from Airflow, I keep getting this error when running my DAG:
"FileNotFoundError: [Errno 2] No such file or directory: '/Users/ramonsotogarcia/Desktop/Data/pokemon.csv'"
I do not understand why Airflow is not finding my local file. This is my DAG:
from datetime import timedelta

from airflow import DAG
from airflow.contrib.operators.file_to_gcs import FileToGoogleCloudStorageOperator
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator
from airflow.utils.dates import days_ago
#define variables
file = "pokemon.csv"
bucket = "modulo_spark_bucket"
destination_path = f"gs://{bucket}/data/{file}"
bucket = f"gs://{bucket}"
local_file = f"/Users/ramonsotogarcia/Desktop/Data/{file}"
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(2),
    'email': ['[email protected]'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
}

my_dag = DAG(
    'fileSystem_toGCS_toBQ',
    default_args=default_args,
    description='Loads data from local file into GCS and then transfer to BQ',
    schedule_interval=None,
)
# upload the local CSV to the GCS bucket
t1 = FileToGoogleCloudStorageOperator(
    task_id="local_to_gcs",
    src=local_file,
    dst=destination_path,
    bucket=bucket,
    dag=my_dag,
)

# load the uploaded object from GCS into BigQuery
t2 = GoogleCloudStorageToBigQueryOperator(
    task_id="GCS_to_BQ",
    bucket=bucket,
    source_objects=[destination_path],
    autodetect=True,
    skip_leading_rows=1,
    create_disposition="CREATE_IF_NEEDED",
    destination_project_dataset_table="neural-theory-277009.pokemon_data.pokemons",
    dag=my_dag,
)
#dependencies
t1 >> t2
Any ideas? I cannot seem to figure out what is wrong.
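For reference, one quick way to test whether that path is even visible from wherever the task actually executes would be something like the sketch below (assuming an Airflow 1.10-era PythonOperator import to match the contrib operators above; the check_local_file callable and the t0 task id are purely illustrative and not part of the original DAG):

import os
from airflow.operators.python_operator import PythonOperator

def check_local_file():
    # Log whether the machine/worker that executes this task can actually see the file.
    print(f"{local_file} exists: {os.path.exists(local_file)}")

t0 = PythonOperator(
    task_id="check_local_file",  # illustrative task id
    python_callable=check_local_file,
    dag=my_dag,
)

t0 >> t1

If that task logged False, it would suggest the task is executing somewhere (a container, VM, or another machine) that does not have /Users/ramonsotogarcia/Desktop/Data available, rather than a problem with the operator itself.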
unie:
Just a guess, but have you tried specifying the drive in the filepath, i.e.
f"C:/Users/ramonsotogarcia/Desktop/Data/{file}"
In case your airflow and the csv are on different disks.
2020-11-10T15:17:45