I have a MongoDB collection with an updateTimestamp field. I pass from and to dates via an external YAML file, but the update-timestamp field is stored as a UNIX timestamp.
Any suggestions? Is there a way to fetch the latest records based on updateTimestamp when it is captured as a UNIX timestamp?
Data format:

id: 1223344444444
createTimestamp: 1640845784796
updateTimestamp: 1640845784796
deleteTimestamp: null
uuid: "abcdefgh"
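The 13-digit values above indicate the timestamps are milliseconds since the Unix epoch, not seconds. A quick sanity check, using the sample updateTimestamp from the data above:

```python
from datetime import datetime, timezone

ts = 1640845784796  # sample updateTimestamp from the data above

# Interpreting the value as milliseconds since the epoch yields a
# plausible recent date; interpreted as seconds it would be tens of
# thousands of years in the future.
dt = datetime.fromtimestamp(ts / 1000, tz=timezone.utc)
print(dt)  # 2021-12-30 06:29:44.796000+00:00
```

Any interval bounds compared against this field therefore need to be in milliseconds as well.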
YAML file:

collection_name:
  incremental: true
  date_field: updateTimestamp
  start_date: 2022-01-01
  end_date: 2022-01-02
  hours_interval: 24
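One way to bridge the gap is to convert the YAML start_date/end_date into epoch milliseconds before querying. A minimal sketch, assuming the dates arrive as "YYYY-MM-DD" strings and the collection stores UTC millisecond timestamps (to_epoch_ms is a hypothetical helper, not part of the existing script):

```python
from datetime import datetime, timezone

def to_epoch_ms(date_str: str) -> int:
    """Turn a date string like '2022-01-01' into UTC epoch milliseconds."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

start_ms = to_epoch_ms("2022-01-01")
end_ms = to_epoch_ms("2022-01-02")
print(start_ms, end_ms)  # 1640995200000 1641081600000
```

Note that PyYAML loads unquoted ISO dates as datetime.date objects rather than strings; in that case the string parsing step can be dropped and the date combined with midnight directly.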
Incremental query
from datetime import datetime, timedelta
from time import mktime

import pandas as pd
from pymongo import MongoClient

# Build interval dates
initial_dt_t = datetime.combine(day, datetime.min.time()) + timedelta(
    minutes=lower_limit
)
final_dt_t = datetime.combine(day, datetime.min.time()) + timedelta(
    minutes=upper_limit
)
# The stored timestamps are epoch milliseconds, so convert the interval
# bounds from seconds (what mktime returns) to integer milliseconds
initial_dt = int(mktime(initial_dt_t.timetuple()) * 1000)
final_dt = int(mktime(final_dt_t.timetuple()) * 1000)
print("initial-dt")
print(initial_dt_t)
print(final_dt_t)
print("unix")
print(initial_dt)
print(final_dt)
# Connect to MongoDB
mongo_client = MongoClient(MONGODB_URI)
mongo_db = mongo_client[MONGODB_NAME]
collection = mongo_db[COLLECTION_NAME]
# Build aggregation pipeline for MongoDB
or_list = []
print("coming to query")
# Check if there is a single date field
if "date_field" in config.keys():
    or_list.append(
        {
            config["date_field"]: {
                "$gte": initial_dt,
                "$lt": final_dt,
            }
        }
    )
# Check if there are multiple date fields
if "date_fields" in config.keys():
    for date_field in config["date_fields"]:
        or_list.append(
            {
                # Use the loop variable here, not config["date_field"]
                date_field: {
                    "$gte": initial_dt,
                    "$lt": final_dt,
                }
            }
        )
# Get data from MongoDB
query = {"$or": or_list}
# Keep track of batch number
batch_number = 0
print("chunk")
data = pd.DataFrame(list(collection.find(query)))
print("dataframe")
print(data)
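For reference, the filter construction above can be condensed into a small helper that handles both the single- and multi-field cases and takes millisecond bounds to match the stored timestamps. A sketch (build_query is a hypothetical name; the config keys mirror the YAML above):

```python
def build_query(config: dict, initial_ms: int, final_ms: int) -> dict:
    """Build the $or filter, assuming epoch-millisecond timestamps
    (as the sample data suggests)."""
    bounds = {"$gte": initial_ms, "$lt": final_ms}
    or_list = []
    if "date_field" in config:
        or_list.append({config["date_field"]: bounds})
    # Use the loop variable here, not config["date_field"]
    for date_field in config.get("date_fields", []):
        or_list.append({date_field: bounds})
    return {"$or": or_list}

query = build_query({"date_field": "updateTimestamp"}, 1640995200000, 1641081600000)
print(query)
# {'$or': [{'updateTimestamp': {'$gte': 1640995200000, '$lt': 1641081600000}}]}
```

To bring back the latest records first, the cursor can then be sorted, e.g. collection.find(query).sort("updateTimestamp", -1), optionally combined with limit().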