Traverse the directory structure using Python

Overview

Hi all, I remember how scared I was when I started coding for my MS project. Slowly, though, I learned to solve a lot of coding problems. I wish I had found some good resources to help me with my project (maybe there were resources, but I was not good at googling then! 😛).

In most of my projects, the difficult part was writing a custom dataloader to fetch data stored in a different directory structure each time. Sometimes I needed to get the directory structure and store it in a text file for further processing. All such tasks require traversing the data directory. Here is a snippet for that, and it will make me happy if it helps someone with their code.

import os

# Insert the path to the data directory
root = "<dataset-root-path>"

# os.walk is a generator that traverses the directory tree: it visits
# root and then every sub-directory below it.
# Note: os.scandir is said to be faster, but I'm not sure by how much, as
# I never tried it. (Since Python 3.5, os.walk uses os.scandir internally.)

for (dirpath, dirnames, filenames) in os.walk(root):
    # dirpath: the path of the (sub-)directory that os.walk is currently visiting
    # dirnames: the list of sub-directories in dirpath
    # filenames: the list of files present in dirpath

    # Check whether dirpath contains any files before performing
    # any operation on files
    if filenames:
        for name in filenames:
            ## <your-code-here> ##

            # To get the full path of a file, use the following.
            # Remember dirpath is the current sub-directory and filenames is
            # a list of files in the current sub-directory.
            # (The result is absolute only if root is an absolute path.)
            absolute_path = os.path.join(dirpath, name)
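
Since I mentioned storing the directory structure in a text file, here is a minimal sketch of how that could look with the same os.walk loop. The function name write_directory_listing and the one-path-per-line output format are my own assumptions for illustration, so adapt them to whatever your further processing expects.

```python
import os


def write_directory_listing(root, output_file):
    """Write the path of every file under root, relative to root, one per line.

    Hypothetical helper for illustration; returns the number of files written.
    """
    count = 0
    with open(output_file, "w", encoding="utf-8") as handle:
        for dirpath, dirnames, filenames in os.walk(root):
            # Sorting dirnames in place makes os.walk visit the
            # sub-directories in a reproducible order.
            dirnames.sort()
            for name in sorted(filenames):
                full_path = os.path.join(dirpath, name)
                # Store paths relative to root so the listing still makes
                # sense if the dataset is moved somewhere else.
                handle.write(os.path.relpath(full_path, root) + "\n")
                count += 1
    return count
```

You could call it as write_directory_listing(root, "listing.txt"). Keep the output file outside root, otherwise the listing file itself may show up in the listing.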
            
    
    
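For anyone curious about the os.scandir note in the snippet, here is a rough sketch of the same traversal written with os.scandir directly. The generator name iter_files is hypothetical, and I have not benchmarked it, so please treat any speed difference as something to measure on your own data rather than a promise.

```python
import os


def iter_files(root):
    """Yield the path of every regular file under root, recursively.

    Hypothetical helper for illustration, not a drop-in replacement
    for os.walk.
    """
    with os.scandir(root) as entries:
        for entry in entries:
            # Do not follow symlinked directories, to avoid endless loops.
            if entry.is_dir(follow_symlinks=False):
                yield from iter_files(entry.path)
            elif entry.is_file():
                yield entry.path
```

Usage is a plain loop such as "for path in iter_files(root): ...". Unlike os.walk, this yields one file path at a time instead of a (dirpath, dirnames, filenames) tuple per directory.
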
Akash Gupta
Senior Machine Learning Scientist

My research interests include computer vision and machine learning applications in object detection and video enhancement.