How to read all the files inside a folder into a single dataframe

Have a bunch of CSV files and don’t want to import them all individually? Well, I can’t blame you, and you shouldn’t have to anyway.

The function below reads all the files within a single folder and returns them as one pandas dataframe. Of course, for it to work, the files need to share the same data structure. In other words, they all need to have the same column names in the same order.

# we import these first
# put them in the beginning of your file
import os
import pandas as pd

def read_data(input_folder):
    """
    Read each of the raw data files as a dataframe and
    combine them into a single dataframe.
    """
    frames = []
    for file_name in os.listdir(input_folder):
        try:
            # df = pd.read_excel(os.path.join(input_folder, file_name)) # excel
            # df = pd.read_csv(os.path.join(input_folder, file_name), sep='\t') # tsv file
            df = pd.read_csv(os.path.join(input_folder, file_name)) # vanilla csv
            # keep track of which file each row came from
            df['file_name'] = file_name
            frames.append(df)
        except Exception as e:
            print(f"Cannot read file: {file_name} ({e})")
    if not frames:
        # nothing could be read, so return an empty dataframe
        return pd.DataFrame()
    # DataFrame.append was removed in pandas 2.0, so combine with pd.concat;
    # ignore_index=True gives the combined rows a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)

# provide the folder path where your files reside
folder = 'G:/path/to/data/parent_folder_name'

# call the function
df = read_data(folder)
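
If you are not sure that every file really has the same columns in the same order, you can check the headers before combining anything. The snippet below is only a rough sketch: `check_same_columns` is a hypothetical helper name (it is not part of the function above), and it assumes plain comma-separated files ending in `.csv`. It reads just the header row of each file and reports the ones that differ from the first file it sees.

```python
import os
import tempfile

import pandas as pd


def check_same_columns(input_folder, extension=".csv"):
    """Return {file_name: columns} for files whose header differs from the first file's.

    Hypothetical helper for illustration; assumes comma-separated files.
    """
    reference = None
    mismatches = {}
    for file_name in sorted(os.listdir(input_folder)):
        if not file_name.lower().endswith(extension):
            continue  # skip anything that is not a csv file
        path = os.path.join(input_folder, file_name)
        # nrows=0 reads only the header row, so this stays quick on large files
        columns = list(pd.read_csv(path, nrows=0).columns)
        if reference is None:
            reference = columns  # the first csv file sets the expected header
        elif columns != reference:
            mismatches[file_name] = columns
    return mismatches


# small demonstration with throwaway files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"a": [1], "b": [2]}).to_csv(os.path.join(tmp, "one.csv"), index=False)
    pd.DataFrame({"a": [3], "b": [4]}).to_csv(os.path.join(tmp, "two.csv"), index=False)
    pd.DataFrame({"b": [5], "a": [6]}).to_csv(os.path.join(tmp, "zzz.csv"), index=False)
    demo_mismatches = check_same_columns(tmp)

print(demo_mismatches)
```

An empty result means the headers all match. Note that when column names differ, pandas lines columns up by name while combining and fills the gaps with NaN, so a mismatch may not raise an error on its own.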

Published by

Ednalyn C. De Dios

I’ve always been enamored with code and I love data science because of its inherent power to solve real problems. Having grown up in the Philippines, served in the United States Navy, and worked in the nonprofit sector, I am driven to make the world a better place. I have started and participated in numerous campaigns that aim to reduce domestic violence and child abuse in the community.
