Updating Online Spreadsheets#
This script will demonstrate how to upload the contents of a DataFrame into a Google Sheets file using Python’s gspread and gspread-dataframe libraries. This is a convenient option for sharing your output with others, especially if you need to update that output on a regular basis.
The Google Sheets worksheet that this script will update can be found at https://docs.google.com/spreadsheets/d/17aDJ3mg49-n0IEnDgN7ZB85pO87fiUpkZPULYDB8dmo/edit?usp=sharing .
Since the National Weather Service (NWS) weather data accessed by this script generally gets updated on an hourly basis, it makes sense to have this script run automatically every hour. That way, your Google Sheets data, along with your local .csv copies of historical weather data, will always be relatively up to date.
If you’d like to try automating this script, I recommend saving it as a .py file (which is easy to do within JupyterLab Desktop); setting up a script that will run this file; and then having your computer run this script on an hourly basis. (More information on accomplishing these steps within Linux can be found at the end of this notebook.)
Prerequisites#
Python’s gspread library provides a very helpful overview of connecting to Google Sheets workbooks via a service account at https://docs.gspread.org/en/latest/oauth2.html#for-bots-using-service-account . (There are multiple ways to connect to workbooks, but I find the service account approach to be pretty straightforward.) Go ahead and complete the steps described in this link if you haven’t already.
(Note: I chose to save my service account key to a custom path rather than the default one specified in the documentation; that way, I could more easily work with multiple account keys on my computer.) Also, before enabling the Google Sheets and Google Drive APIs, make sure that the Google Cloud project you want to use for this script has been selected.
If you also specified a custom path, you’ll need to create a file called ‘service_key_path.txt’ within this folder that points to it. That way, the following cell will be able to read in its location for use within gspread functions. (Alternatively, you could simply replace the following cell’s code with service_key_path = (path_to_your_key). Or, if you’re using the default key location, you can comment out this cell altogether.)

As noted in the gspread service account documentation, you’ll need to give the email associated with your service account editor access to the Google Sheets workbook that you’d like your script to update. (This email is not the same as your regular Google email; it will likely look something like accountname@cloudprojectname.iam.gserviceaccount.com. You can find it within your service account file.)
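For instance, here’s a minimal way to create that file from within Python. (The key path shown is a hypothetical placeholder; substitute the actual location of your own service account key.)

# One-time setup: record the location of your service account key
# so that this notebook can read the path back in later.
# (The path below is a hypothetical placeholder.)
with open('service_key_path.txt', 'w') as file:
    file.write('/home/yourname/keys/my_service_account_key.json')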
Ensuring that this script (when running on my own laptop) will have access to a comprehensive set of weather data#
The following cell checks whether this script is running on my own laptop. (If yours happens to have the name ‘kjb3-lm’, which seems very unlikely, it will run on that machine also.) If so, it copies a new set of weather data from a Google Drive folder; if not, this task will be skipped.
For an explanation of why I added in this cell, reference the ‘Updating my laptop’s copy of the files in weather_data with my server’s copy’ section of this notebook’s appendix.
import platform

if platform.node() == 'kjb3-lm':
    # The use of platform.node() here to check whether
    # the script is running on my computer came from
    # Eric Palakovich Carr at
    # https://stackoverflow.com/a/799799/13097194 .
    # If you have a program running on multiple
    # computers, but need to have different computers
    # perform different steps at times, consider
    # having your code check the output of
    # platform.node(), then respond to this output
    # as needed.
    print("Copying the latest weather_data contents "
          "from the server.")
    import shutil
    shutil.copytree('/home/kjb3/kjb3server_drive/weather_data',
                    'weather_data', dirs_exist_ok=True)
Copying the latest weather_data contents from the server.
Importing relevant libraries#
import sys
sys.path.insert(1, '../Appendix')
from helper_funcs import config_notebook, wadi
display_type = config_notebook(display_max_columns = 5)
# Specifying which columns to render within the output:
display_cols = ['Station', 'Date/Time', 'Temp',
                '1-Hour Precip', 'Rolling 24-Hour Precip']
with open('service_key_path.txt') as file:
    service_key_path = file.read().strip()
    # .strip() guards against a stray trailing newline
    # within service_key_path.txt.
import gspread
from gspread_dataframe import set_with_dataframe, get_as_dataframe
# From https://pypi.org/project/gspread-dataframe/
# gspread_dataframe isn't available within conda-forge as of 2025-02-20;
# therefore, you'll need to install it via the following pip command:
# pip install gspread-dataframe
import pandas as pd
# The following weather_import() function will be used to retrieve
# recent weather data from the National Weather Service website.
from weather_import import weather_import
Initializing gspread#
(If you’re using gspread’s default path, you can comment out the first line in the following cell, then uncomment the following one.)
gc = gspread.service_account(filename=service_key_path)
# gc = gspread.service_account()
# (This code comes from
# https://docs.gspread.org/en/latest/index.html)
Importing weather data#
The following cells will import weather data for three Virginia airport weather stations: KCHO (Charlottesville-Albemarle airport); KIAD (Dulles International Airport); and KOKV (Winchester Regional Airport). Once these imports are complete, the script will combine each dataset with pre-existing data, then save updated copies of these tables to .csv files. The code that manages this import process can be found within ‘weather_import.py’ (located within the same folder as this notebook).
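(In case you’re curious about how such a function might work internally, here’s a minimal, hypothetical sketch of its merge-and-save step. The actual implementation within weather_import.py may differ; the merge_new_observations() name and the duplicate-detection columns are my own assumptions.)

import pandas as pd

def merge_new_observations(df_new, history_path):
    # Hypothetical sketch (not the actual weather_import.py code):
    # append newly retrieved observations to a historical .csv file,
    # then drop any rows that were already present within that file.
    df_history = pd.read_csv(history_path)
    print("Original length of historical data file:", len(df_history))
    df_combined = (pd.concat([df_history, df_new])
                   .drop_duplicates(subset=['Station', 'Date/Time'])
                   .sort_values('Date/Time'))
    df_combined.to_csv(history_path, index=False)
    print("New length of historical data file:", len(df_combined))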
data_folder = 'weather_data'
weather_import(
    station_code = 'KCHO',
    data_folder = data_folder)
Original length of historical data file: 4644
New length of historical data file: 4712
weather_import(
    station_code = 'KIAD',
    data_folder = data_folder)
Original length of historical data file: 4243
New length of historical data file: 4314
# Note: KOKV's data appears to be recorded at
# 20-minute intervals rather than hourly ones.
weather_import(
    station_code = 'KOKV',
    data_folder = data_folder)
Original length of historical data file: 4207
New length of historical data file: 4280
Reading these datasets into DataFrames#
The weather_import() function called in the preceding cells saves its output to local .csv files. The following cells import those .csv files into DataFrames so that their data can get uploaded into a public Google Sheets workbook.
df_weather_kcho = pd.read_csv(
    data_folder+'/'+'KCHO'+'_'
    +'historical_hourly_data_updated.csv')
df_weather_kcho[display_cols].tail()
|      | Station | Date/Time        | Temp | 1-Hour Precip | Rolling 24-Hour Precip |
|------|---------|------------------|------|---------------|------------------------|
| 4707 | KCHO    | 2025-06-20 17:53 | 82.0 | 0.0           | 0.04                   |
| 4708 | KCHO    | 2025-06-20 18:53 | 82.0 | 0.0           | 0.04                   |
| 4709 | KCHO    | 2025-06-20 19:53 | 79.0 | 0.0           | 0.04                   |
| 4710 | KCHO    | 2025-06-20 20:53 | 75.0 | 0.0           | 0.04                   |
| 4711 | KCHO    | 2025-06-20 21:53 | 72.0 | 0.0           | 0.00                   |
df_weather_kiad = pd.read_csv(
    data_folder+'/'+'KIAD'+'_'
    +'historical_hourly_data_updated.csv')
df_weather_kiad[display_cols].tail()
|      | Station | Date/Time        | Temp | 1-Hour Precip | Rolling 24-Hour Precip |
|------|---------|------------------|------|---------------|------------------------|
| 4309 | KIAD    | 2025-06-20 17:52 | 84.9 | 0.0           | 0.0                    |
| 4310 | KIAD    | 2025-06-20 18:52 | 82.9 | 0.0           | 0.0                    |
| 4311 | KIAD    | 2025-06-20 19:52 | 79.0 | 0.0           | 0.0                    |
| 4312 | KIAD    | 2025-06-20 20:52 | 77.0 | 0.0           | 0.0                    |
| 4313 | KIAD    | 2025-06-20 21:52 | 75.0 | 0.0           | 0.0                    |
df_weather_kokv = pd.read_csv(
    data_folder+'/'+'KOKV'+'_'
    +'historical_hourly_data_updated.csv')
df_weather_kokv[display_cols].tail()
|      | Station | Date/Time        | Temp | 1-Hour Precip | Rolling 24-Hour Precip |
|------|---------|------------------|------|---------------|------------------------|
| 4275 | KOKV    | 2025-06-20 18:55 | 78.8 | 0.0           | 0.0                    |
| 4276 | KOKV    | 2025-06-20 19:55 | 78.8 | 0.0           | 0.0                    |
| 4277 | KOKV    | 2025-06-20 20:55 | 71.6 | 0.0           | 0.0                    |
| 4278 | KOKV    | 2025-06-20 21:55 | 69.8 | 0.0           | 0.0                    |
| 4279 | KOKV    | 2025-06-20 22:35 | 69.8 | 0.0           | 0.0                    |
Importing these DataFrames into a Google Sheets workbook#
In order to export these datasets to a Google Sheets workbook, we’ll first need to open that workbook with gspread. There are a few ways to do this (see https://docs.gspread.org/en/latest/user-guide.html#opening-a-spreadsheet for reference), but I like the open_by_key() option, which, as the function’s name suggests, allows you to open a workbook using its key.

These keys are located within the center of each workbook URL. For instance, the full URL of the workbook I’ll be updating is https://docs.google.com/spreadsheets/d/17aDJ3mg49-n0IEnDgN7ZB85pO87fiUpkZPULYDB8dmo/edit?usp=sharing, so the key, located between the /d/ component of that URL and the following /, is 17aDJ3mg49-n0IEnDgN7ZB85pO87fiUpkZPULYDB8dmo.
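(If you’d rather not pick the key out of the URL by hand, a simple string split can retrieve it. This is just an illustrative snippet, not part of the gspread API:)

# Extracting the key from a full workbook URL by splitting on '/d/'
# and keeping everything up to the next '/':
url = ('https://docs.google.com/spreadsheets/d/'
       '17aDJ3mg49-n0IEnDgN7ZB85pO87fiUpkZPULYDB8dmo/edit?usp=sharing')
key = url.split('/d/')[1].split('/')[0]
print(key) # 17aDJ3mg49-n0IEnDgN7ZB85pO87fiUpkZPULYDB8dmo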
wb = gc.open_by_key('17aDJ3mg49-n0IEnDgN7ZB85pO87fiUpkZPULYDB8dmo')
# Based on
# https://docs.gspread.org/en/latest/user-guide.html#opening-a-spreadsheet
wb
<Spreadsheet 'Hourly VA Weather Data' id:17aDJ3mg49-n0IEnDgN7ZB85pO87fiUpkZPULYDB8dmo>
Next, I’ll select the ‘KCHO’ worksheet within this workbook, as that’s the first one I’d like to update. I’ll also clear out the current contents using ws.clear(); that way, only the latest DataFrame contents will appear within the spreadsheet after I call set_with_dataframe below. (If the most recent DataFrame is smaller than the previous copy, parts of the earlier version might still appear unless it’s first removed using ws.clear().)
ws = wb.worksheet('KCHO')
# https://docs.gspread.org/en/latest/user-guide.html#opening-a-spreadsheet
ws.clear()
{'spreadsheetId': '17aDJ3mg49-n0IEnDgN7ZB85pO87fiUpkZPULYDB8dmo',
'clearedRange': 'KCHO!A1:AC1672'}
Finally, I’ll call set_with_dataframe to export df_weather_kcho to the KCHO Google Sheets worksheet.

Only the most recent 960 rows (representing 40 days’ worth of data if no entries were missing) will get exported to Google Sheets. This will limit the time (and potentially money) needed to import this data into a Dash app (kburchfiel/pfn) that utilizes it.

More extensive sets of weather data, however, can be found within the weather_data/ subfolder of the Updating_Online_Spreadsheets folder that contains this notebook. You can also access this data online at https://drive.google.com/drive/folders/1s9sUHa9HoYwpFrb_BsUN42CWXusp-7bD?usp=sharing .
set_with_dataframe(ws, df_weather_kcho.iloc[-960:])
# Source: https://pypi.org/project/gspread-dataframe/
In order to confirm that this upload was successful, we can call get_as_dataframe to import the contents of the worksheet into a new DataFrame:
df_weather_from_ws = get_as_dataframe(ws)
df_weather_from_ws[display_cols].tail()
|     | Station | Date/Time        | Temp | 1-Hour Precip | Rolling 24-Hour Precip |
|-----|---------|------------------|------|---------------|------------------------|
| 955 | KCHO    | 2025-06-20 17:53 | 82.0 | 0.0           | 0.04                   |
| 956 | KCHO    | 2025-06-20 18:53 | 82.0 | 0.0           | 0.04                   |
| 957 | KCHO    | 2025-06-20 19:53 | 79.0 | 0.0           | 0.04                   |
| 958 | KCHO    | 2025-06-20 20:53 | 75.0 | 0.0           | 0.04                   |
| 959 | KCHO    | 2025-06-20 21:53 | 72.0 | 0.0           | 0.00                   |
Performing the same data export steps for KIAD and KOKV data#
ws = wb.worksheet('KIAD')
ws.clear()
set_with_dataframe(ws, df_weather_kiad.iloc[-960:])
ws = wb.worksheet('KOKV')
ws.clear()
set_with_dataframe(ws, df_weather_kokv.iloc[-960:])
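(As a side note, these three near-identical export cells could also be condensed into a single loop. Here’s a sketch that uses the workbook and DataFrames defined earlier:)

# Clearing and updating each station's worksheet in turn:
for station_code, df in [('KCHO', df_weather_kcho),
                         ('KIAD', df_weather_kiad),
                         ('KOKV', df_weather_kokv)]:
    ws = wb.worksheet(station_code)
    ws.clear()
    set_with_dataframe(ws, df.iloc[-960:])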
Conclusion#
The gspread and gspread-dataframe libraries make it easy to regularly update Google Sheets files with new figures, thus allowing coworkers and/or the general public to access the latest versions of your data. (For guidance on automating this update process, make sure to consult the first part of this notebook’s Appendix.) In addition, Google Sheets workbooks can serve as data sources for interactive Dash apps, which the next section of Python for Nonprofits will introduce.
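(As a preview, here’s one way that a Dash app, or any other script, could read a link-shared worksheet like ours without any credentials: Google Sheets offers a CSV export endpoint for each worksheet. The URL pattern below reflects that endpoint rather than anything within gspread, and it assumes that the workbook has been shared publicly:)

import pandas as pd

key = '17aDJ3mg49-n0IEnDgN7ZB85pO87fiUpkZPULYDB8dmo'
sheet_name = 'KCHO'
# Requesting a CSV export of this worksheet, then reading it into
# a DataFrame:
csv_url = (f'https://docs.google.com/spreadsheets/d/{key}'
           f'/gviz/tq?tqx=out:csv&sheet={sheet_name}')
df_public = pd.read_csv(csv_url)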
Appendix#
A shell script and crontab entry for running this notebook automatically#
(These steps were written for Linux environments, but similar steps can be performed on Windows; you’d just need to write a batch script rather than a shell script and use Windows Task Scheduler instead of your cron editor. I’m quite sure that it’s possible to automate this script on Macs as well, but I’ll leave that as an exercise for the reader.)
Your computer is more than happy to run a .py equivalent of this script, rain or shine, every hour of the day (as long as it’s kept powered on, of course). This section will teach you how to do so!
Step 1: Creating your shell script#
First, you’ll need to create a shell script that activates your Python environment; navigates to the folder containing the .py version of this notebook*; and then runs that file.
The following cell shows what this shell script looks like on my ‘server,’ which is really just an old laptop that I keep powered on 24/7 in order to run these hourly weather imports. (I gave the script the imaginative name online_spreadsheet_update.sh.)
This shell script also copies the weather_data files that it creates to a Google Drive folder; this makes it easier to transfer them to my main laptop and also keeps the files backed up.
Since this server, like my main laptop, runs Linux, I needed to mount the drive via RClone so that it could be accessed via my local file system; otherwise, the cp command would not have worked.

Steps for configuring RClone to connect to a Google Drive account are available at https://rclone.org/drive/ . Also, in case it helps, my shell script for mounting my server’s Google Drive account to my main laptop is as follows:
rclone mount kjb3server: ~/kjb3server_drive/ --vfs-cache-mode full
(This code was based on https://ostechnix.com/install-rclone-in-linux/ and Andre’s comment on it. --vfs-cache-mode full allows RClone to access files more quickly.)
*(To create a .py version of this notebook, open it within JupyterLab Desktop; navigate to File –> Save and Export Notebook As –> Executable Script; and then save it (preferably within the same folder as updating_online_spreadsheets.ipynb) as updating_online_spreadsheets.py. You could also just run the .ipynb file directly if you’d prefer; see the comments within the following script for more details.)
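(Alternatively, you can generate the same .py file from the command line via nbconvert, which ships with Jupyter:)

jupyter nbconvert --to script updating_online_spreadsheets.ipynb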
#!/bin/bash
# It appears that the line above needs to be the first entry within this script.
# For a discussion of the above line,
# see: https://stackoverflow.com/questions/8967902/why-do-you-need-to-put-bin-bash-at-the-beginning-of-a-script-file
echo "Activating Python environment:"
# Activating my custom Python for Nonprofits environment:
# (You may be able to delete the following two lines if you're planning
# to execute the notebook within your base environment.)
# These lines are based on Lamma's post at:
# https://stackoverflow.com/a/60523131/13097194
source ~/miniforge3/etc/profile.d/conda.sh
conda activate kjb3server
# Navigating to the folder that hosts this script:
cd /home/kjb3lxms/kjb3python/Updating_Online_Spreadsheets
# Executing the Python script:
python updating_online_spreadsheets.py
# Copying weather_data folder over to the KJB3Server Google Drive folder for backup and transfer purposes:
cp -r weather_data /home/kjb3lxms/kjb3server_drive/
echo "Finished running script."
sleep 2
Step 2: Adding a corresponding crontab entry#
In order to instruct your computer to run this script 10 minutes after each hour, you can then run crontab -e within your terminal and paste the following line at the bottom of the page. (You’ll of course need to replace my path with your own path to this file.)
10 * * * * /home/kjb3lxms/kjb3python/Updating_Online_Spreadsheets/online_spreadsheet_update.sh
(If you’re new to crontabs, make sure to review the documentation that appears at the top of your crontab file. Also, if you’re working within the Nano editor, make sure to hit Ctrl + X to exit out of the editor after your updates have been saved; if you instead close the window, your changes will be lost.)
Updating my laptop’s copy of the files in the weather_data/ folder with my server’s copy#
As noted earlier in the appendix, my ‘server’ runs a Python version of this script each hour, thus keeping my Google Sheets copy of recent weather data up to date. However, when I run this notebook on my main laptop, this comprehensive Google Sheets copy will get overwritten by a spottier copy of weather data on my local computer. (Since the script can only receive the most recent 3 days of weather data from the NWS, and I only run this script on my main laptop occasionally, my local weather_data/ folder will likely be missing quite a few hours of data.)
In order to prevent this issue, I added code at the start of this notebook that copies the latest contents of the weather_data/ folder within my server to the weather_data/ folder within my main laptop. That way, this script will have access to the same comprehensive weather data file that the server does–and, as a result, will load a more comprehensive set of data to Google Sheets.
In order for this code to work successfully, I needed to use RClone to mount my server’s Google Drive folder to my main laptop. (A simple shell script that accomplishes this mount can be found earlier within this appendix. I’ve added this script to my Startup Applications list within Linux Mint on both my main laptop and my server, thus ensuring that both computers will always be connected to it.) Having this Google Drive folder available on both computers makes the process of copying weather data from my server to my main laptop much easier.