
Python - write x rows of csv file to json file

Asked: 2016-08-17T00:05:23    Author: dan martin


I have a csv file, which I need to write to json files in rows of 1000. The csv file has around 9,000 rows, so ideally I'd like to end up with 9 separate json files of consecutive data.

I know how to write a csv file to json - what I've been doing:

csvfile = open("C:\\Users\\Me\\Desktop\\data\\data.csv", 'r', encoding="utf8")

reader = csv.DictReader(csvfile, delimiter=",")
out = json.dumps([row for row in reader])

with open("C:\\Users\\Me\\Desktop\\data\\data.json", 'w') as f:
    f.write(out)

which works great. But I need the output split into 9 separate JSON files. I'm assuming I would either:

1) attempt to count rows and start a new file each time the count reaches 1,000 (sketched below)

2) write the csv file to a single json file, then open the json and attempt to split it somehow.

I'm pretty lost on how to accomplish this - any help appreciated!
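For reference, option (1) can be done in a single pass without holding the whole file in memory: buffer rows in a list and flush the buffer to a numbered file each time it fills. A minimal sketch; the paths and the data{}.json naming mirror the question and are illustrative, not prescribed:

import csv
import json

batch_size = 1000
batch, file_idx = [], 0

with open("C:\\Users\\Me\\Desktop\\data\\data.csv", 'r', encoding="utf8") as csvfile:
    reader = csv.DictReader(csvfile, delimiter=",")
    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            # Flush a full batch to its own JSON file, then reset.
            with open("C:\\Users\\Me\\Desktop\\data\\data{}.json".format(file_idx), 'w') as f:
                json.dump(batch, f)
            batch = []
            file_idx += 1

# Any leftover rows (fewer than batch_size) still get their own file.
if batch:
    with open("C:\\Users\\Me\\Desktop\\data\\data{}.json".format(file_idx), 'w') as f:
        json.dump(batch, f)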

Author: dan martin. Reproduced under the CC 4.0 BY-SA license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/38979603/python-write-x-rows-of-csv-file-to-json-file
Alberto Garcia-Raboso :

Read the whole CSV file into a list of rows, then write slices of length 1000 to JSON files.

import csv
import json

input_file = 'C:\\Users\\Me\\Desktop\\data\\data.csv'
output_file_template = 'C:\\Users\\Me\\Desktop\\data\\data_{}.json'

with open(input_file, 'r', encoding='utf8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    rows = list(reader)

# Ceiling division, so a final partial slice still gets its own file.
for i in range((len(rows) + 999) // 1000):
    out = json.dumps(rows[1000*i:1000*(i+1)])
    with open(output_file_template.format(i), 'w') as f:
        f.write(out)
2016-08-16T16:28:23
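As a quick sanity check on the approach above, each output file can be read back with json.load; the data_0.json name below follows the answer's template and is an assumption:

import json

with open('C:\\Users\\Me\\Desktop\\data\\data_0.json') as f:
    chunk = json.load(f)

print(len(chunk))  # 1000 for full slices; the last file may hold fewer
print(chunk[0])    # each element is a dict keyed by the CSV header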
Laurent LAPORTE :

Instead of reading the whole CSV file into memory, you can iterate over it (less memory usage).

For instance, here is a simple iteration over the rows:

with open(input_file, 'r', encoding='utf8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    for row in reader:
        print(row)

During iteration, you can enumerate the rows and use that index to number the groups of 1000 rows:

group_size = 1000

with open(input_file, 'r', encoding='utf8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    for index, row in enumerate(reader):
        group_idx = index // group_size
        print(group_idx, row)

You should see something like this:

0 [row 0...]
0 [row 1...]
0 [row 2...]
...
0 [row 999...]
1 [row 1000...]
1 [row 1001...]
etc.

You can then use itertools.groupby to group your rows by 1000. Building on Alberto Garcia-Raboso's solution:

from __future__ import division

import csv
import json
import itertools

input_file = 'C:\\Users\\Me\\Desktop\\data\\data.csv'
output_file_template = 'C:\\Users\\Me\\Desktop\\data\\data_{}.json'

group_size = 1000

with open(input_file, 'r', encoding='utf8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    # groupby works here because enumerate yields consecutive indices,
    # so each index // group_size key appears in one contiguous run.
    for key, group in itertools.groupby(enumerate(reader),
                                        key=lambda item: item[0] // group_size):
        grp_rows = [item[1] for item in group]
        content = json.dumps(grp_rows)
        with open(output_file_template.format(key), 'w') as jsonfile:
            jsonfile.write(content)

Example with some fake data:

from __future__ import division
import itertools

rows = [[1, 2], [3, 4], [5, 6], [7, 8],
        [1, 2], [3, 4], [5, 6], [7, 8],
        [1, 2], [3, 4], [5, 6], [7, 8],
        [1, 2], [3, 4], [5, 6], [7, 8],
        [1, 2], [3, 4], [5, 6], [7, 8]]

group_size = 4
for key, group in itertools.groupby(enumerate(rows),
                                    key=lambda item: item[0] // group_size):
    g_rows = [item[1] for item in group]
    print(key, g_rows)

You'll get:

0 [[1, 2], [3, 4], [5, 6], [7, 8]]
1 [[1, 2], [3, 4], [5, 6], [7, 8]]
2 [[1, 2], [3, 4], [5, 6], [7, 8]]
3 [[1, 2], [3, 4], [5, 6], [7, 8]]
4 [[1, 2], [3, 4], [5, 6], [7, 8]]
2016-08-16T16:58:42
Padraic Cunningham :

There is no reason to use a DictReader here; the regular csv.reader will do fine. You can also just use itertools.islice on the reader object to slice off 1000 rows at a time and dump each batch to a new file:

from itertools import islice, count
import csv
import json

with open("C:\\Users\\Me\\Desktop\\data\\data.csv") as f:
    reader, cnt = csv.reader(f), count(1)
    for rows in iter(lambda: list(islice(reader, 1000)), []):
        # 'w' mode: each batch gets written to its own numbered file.
        with open("C:\\Users\\Me\\Desktop\\data\\data{}.json".format(next(cnt)), 'w') as out:
            json.dump(rows, out)
2016-08-16T22:39:12
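The two-argument form of iter() used in that answer is the key trick: iter(callable, sentinel) calls the callable repeatedly until it returns the sentinel, here the empty list that islice produces once the reader is exhausted. A self-contained illustration of the pattern:

from itertools import islice

numbers = iter(range(10))

# Repeatedly pull chunks of 4 until islice comes back empty.
for chunk in iter(lambda: list(islice(numbers, 4)), []):
    print(chunk)

# Prints:
# [0, 1, 2, 3]
# [4, 5, 6, 7]
# [8, 9]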