Extracting data from pandas data frame to a new data frame

Ask Time:2022-09-21T05:03:53         Author:Harsh780

I have a data structure in my code which is a dictionary of dictionaries. The values of the nested dictionaries are pandas data frames. Basically, I had multiple Excel files with multiple tabs and columns, so I created this data structure because I wanted to do some further modeling on the data. Now I want to extract two columns from one specific tab of each Excel file (if they are present in that file) and collect them in a new master data frame. I tried some routines but was not able to get the expected result. Please find below the code that I tried to resolve this issue.

def text_extraction_to_dataframe(dict1, process_key):
    '''This routine is used to extract any required column from the data into a new dataframe with the file name as new
    column attached to it'''        
    
    #Initializing new data frame
    df = pd.DataFrame()
    df['ExcelFile'] = ''
    
    #Running nested for-loops to get into our data structure(dictionary of dictionaries)
    for key, value in dict1.items():
                    
        for key1, value1 in value.items():

            #Checking if the required tab matches to the key
            if key1 == process_key:
                    
                df = pd.DataFrame(value1)   #Extracting all the data from the tab to the new dataframe

                df['ExcelFile'] = key.split('.')[0]  #Appending the data frame with new column as the filename
        
    #Removing unnecessary columns from the data frame and only keeping column3 and column4
    df = df.drop(columns = ['colum_1', 'column2']) 
    return df

text_extraction_to_dataframe(dictionary, 'tab_name')

This routine is not extracting the data from every Excel file.

Also, I want the last column of the master data frame to be the Excel file name.

Basically, the structure of master df will be [column3, column4, excelfilename]

Let me know if you need anything else other than this. Any help would be appreciated.
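
One likely cause, judging from the routine above, is that `df` is reassigned on every matching tab, so only the last file's data survives the loop. A minimal sketch of one possible fix is to collect one piece per file and concatenate them at the end. The function name `extract_columns`, the sample file names, and the column names `column1` to `column4` are assumptions made for illustration; they are not taken from the real data.

```python
import pandas as pd


def extract_columns(workbooks, tab, columns):
    """Collect `columns` from `tab` of every workbook into one data frame.

    `workbooks` is assumed to map file name -> {tab name -> DataFrame}.
    """
    pieces = []
    for file_name, tabs in workbooks.items():
        frame = tabs.get(tab)
        # Skip files that lack the tab or any of the requested columns.
        if frame is None or not set(columns).issubset(frame.columns):
            continue
        piece = frame.loc[:, list(columns)].copy()
        # File name without extension goes in as the last column.
        piece["ExcelFile"] = file_name.split(".")[0]
        pieces.append(piece)

    if not pieces:
        # Nothing matched: return an empty frame with the expected columns.
        return pd.DataFrame(columns=[*columns, "ExcelFile"])
    return pd.concat(pieces, ignore_index=True)


# Hypothetical sample data standing in for the real dictionary of dictionaries.
workbooks = {
    "file_a.xlsx": {
        "tab_name": pd.DataFrame(
            {"column1": [1, 2], "column2": [3, 4], "column3": [5, 6], "column4": [7, 8]}
        )
    },
    "file_b.xlsx": {
        "tab_name": pd.DataFrame(
            {"column1": [0], "column2": [0], "column3": [9], "column4": [10]}
        )
    },
    # This file has no 'tab_name' tab, so it is skipped.
    "file_c.xlsx": {"other_tab": pd.DataFrame({"column3": [0]})},
}

master = extract_columns(workbooks, "tab_name", ["column3", "column4"])
print(master)
```

Selecting the wanted columns directly, rather than dropping the unwanted ones, also avoids a `KeyError` when a column name passed to `drop` does not exist in a given tab.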

Author: Harsh780. Reproduced under the CC BY-SA 4.0 copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/73792612/extracting-data-from-pandas-data-frame-to-a-new-data-frame