Last Update: 2024-02-29 21:50 UTC+0

Bash Basics - Timestamp HTML Pages

I like timestamping all of my website's pages to indicate when they were last updated, but don't have the time to type in dates manually.

Luckily this can easily be achieved with a simple Bash script making use of some default commands & the sed utility.

The script takes one input, the path of the HTML file to be timestamped, for example ./timestamp.sh ./index.html. The timestamp is stored in a standalone div tag with a unique id, which the script overwrites on each run.

To start off, we can initialize a variable indicating the div's id. This helps to make the script more portable.

#!/bin/bash

# div id of the "last update" element
divname="lastupdate"

A check is run to ensure that exactly one parameter has been provided; otherwise, a reminder of how to run the script is echoed and the script exits.

## Get specified page from user input
# Check if a filename was provided
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <filename>"
    exit 1
fi

The filename variable is set, and the script checks whether the input is an existing regular file. If it is not, the script prints an error and exits.

# filename user input
filename=$1
# Ensure the file exists
if [ ! -f "$filename" ]; then
    echo "File not found: $filename"
    exit 1
fi

Next, we generate the timestamp itself. I like to format dates in the ISO 8601 style (YYYY-MM-DD) and include the current timezone, so the output of the standard Linux date command is formatted to include the UTC offset. The script also records the timezone abbreviation and a daylight saving time check using grep, although only the date and the UTC offset end up in the final timestamp:

## Generate a timestamp
# Get the current date and time
currentDate=$(date '+%Y-%m-%d %H:%M')
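# Retrieve the local timezone abbreviation (not used in the final timestamp)
localZone=$(date '+%Z')
# Count whether the abbreviation contains the literal text "DST"; most
# abbreviations (e.g. CEST, PDT) do not (not used in the final timestamp)
isDaylightSaving=$(date '+%Z' | grep -c DST)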
# Get the current UTC offset in hours and minutes
utcOffset=$(date '+%:z')
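
To see what these format strings produce before wiring them into the script, they can be previewed under different time zones. This is only a sketch: it assumes GNU date (for %:z), and both the show_stamp_parts helper and the POSIX-style TZ values are made up for illustration.

```bash
#!/bin/bash
# show_stamp_parts is a hypothetical helper, not part of the original script.
# It prints the date/time and the UTC offset as seen from a given time zone.
show_stamp_parts() {
    # $1: a TZ value to run date under
    local tz="$1"
    printf '%s | %s\n' \
        "$(TZ="$tz" date '+%Y-%m-%d %H:%M')" \
        "$(TZ="$tz" date '+%:z')"
}

# POSIX TZ strings invert the sign: "IST-5:30" means 5h30m ahead of UTC.
show_stamp_parts "UTC0"        # offset column shows +00:00
show_stamp_parts "IST-5:30"    # offset column shows +05:30
```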

The UTC offset is then formatted for display. A zero offset (+00:00) becomes UTC+0; any other offset is appended to UTC as-is, for example UTC+05:30.

# Initialize the variable to hold the formatted offset
utcOffsetFormatted=""
# Check if the offset is zero (UTC)
if [ "$utcOffset" == "+00:00" ]; then
    utcOffsetFormatted="UTC+0"
else
    # Format the offset with "UTC +/- Offset"
    utcOffsetFormatted="UTC$utcOffset"
fi
# Store the full timestamp in a variable
timestamp="$currentDate $utcOffsetFormatted"
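
The same branching can be tried in isolation by wrapping it in a function and feeding it fixed offsets. The format_offset name is my own for this sketch and does not appear in the script itself.

```bash
#!/bin/bash
# format_offset is a hypothetical helper, not part of the original script.
# It maps an offset in date's '+%:z' form to the label used in the timestamp.
format_offset() {
    local offset="$1"
    if [ "$offset" = "+00:00" ]; then
        echo "UTC+0"
    else
        echo "UTC$offset"
    fi
}

format_offset "+00:00"   # prints UTC+0
format_offset "+05:30"   # prints UTC+05:30
format_offset "-08:00"   # prints UTC-08:00
```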

We can use one more variable to create the text to be written to the HTML page.

# Define the variable with the content to be inserted
newcontent="Last Update: $timestamp"

Lastly, sed is used to find the div tag and overwrite it, timestamp included. The sed invocation has two parts after the -i flag (which edits the file in place): the sed script, which contains the regex, and the file to operate on. In our case, the file is the $filename variable from the beginning of the script.

## Apply timestamp to specified page
# Use sed to replace the content of div
sed -i "/<div id=\"$divname\">/,/<\/div>/c\<div id=\"$divname\">\n$newcontent\n</div>" "$filename"

Let's break this command down a bit, as it's quite a lot to take in. Note that backslashes are used to escape special characters: \" keeps a double quote from ending the shell string, and \/ keeps a slash from ending the sed pattern. Both are then matched literally in the text rather than being applied to the overall command.

/<div id=\"$divname\">/,/<\/div>/ selects a range of lines, starting at the first line that contains <div id="$divname"> and ending at the next line that contains </div>, the closing tag.
c\ is the command to change text. It deletes the selected range of lines and prints the text that follows in its place.
<div id=\"$divname\">\n$newcontent\n</div> is our new string of text that will be inserted, replacing the deleted lines. In this case, it's a new <div id="$divname"> tag with the $newcontent variable containing the timestamp.
Note that \n inserts a line break, so the opening tag, the timestamp, and the closing tag each end up on their own line.
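
To watch the replacement happen without touching a real page, the same sed command can be run against a throwaway file. This is a sketch that assumes GNU sed and a div whose opening and closing tags sit on separate lines; the sample HTML and the fixed timestamp are made up.

```bash
#!/bin/bash
# Values the real script would have computed earlier (fixed here for the demo).
divname="lastupdate"
newcontent="Last Update: 2024-02-29 21:50 UTC+0"

# Build a small sample page in a temporary file.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<body>
<div id="lastupdate">
Last Update: never
</div>
<p>Hello</p>
</body>
EOF

# Same command as in the script, pointed at the sample file.
sed -i "/<div id=\"$divname\">/,/<\/div>/c\<div id=\"$divname\">\n$newcontent\n</div>" "$demo"

# Show the result and clean up.
result=$(cat "$demo")
printf '%s\n' "$result"
rm -f "$demo"
```

The three lines of the old div are replaced by three new ones, and the surrounding body and p lines are left alone.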

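You may want your end result to be indented. \t can be used to insert a tab, and it can be repeated to indent further. Because c\ is now immediately followed by \t, the command is written as c\\\t: inside double quotes the shell turns \\ into a single backslash, so sed receives c\ followed by \t.
For example, to indent the div in line with my website's formatting, I use the following: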

sed -i "/<div id=\"$divname\">/,/<\/div>/c\\\t\t<div id=\"$divname\">\n\t\t\t$newcontent\n\t\t</div>" "$filename"
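
One thing to be aware of is that sed stays silent when no line matches the opening tag: the file is simply left unchanged. A minimal sketch of a guard that could run before the sed step follows; has_update_div is a hypothetical helper, and the sample files are made up.

```bash
#!/bin/bash
# has_update_div is a hypothetical helper, not part of the original script.
# It succeeds only if the file contains the opening tag that sed looks for.
has_update_div() {
    local divname="$1" file="$2"
    grep -qF "<div id=\"$divname\">" "$file"
}

# Demo on two throwaway files.
with_div=$(mktemp)
without_div=$(mktemp)
printf '<div id="lastupdate">\nold\n</div>\n' > "$with_div"
printf '<p>no timestamp here</p>\n' > "$without_div"

for f in "$with_div" "$without_div"; do
    if has_update_div "lastupdate" "$f"; then
        echo "found"
    else
        echo "missing: no <div id=\"lastupdate\"> in file" >&2
    fi
done
rm -f "$with_div" "$without_div"
```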

Once you've saved the script, make it executable with chmod +x ./scriptname.sh
and run it with ./scriptname.sh /path/to/file.html

The full script:

#!/bin/bash

### Script for timestamping HTML pages.
### kinisis.xyz

# div id of the "last update" element
divname="lastupdate"

## Get specified page from user input
# Check if a filename was provided
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <filename>"
    exit 1
fi

# filename user input
filename=$1

# Ensure the file exists
if [ ! -f "$filename" ]; then
    echo "File not found: $filename"
    exit 1
fi

## Generate a timestamp
# Get the current date and time
currentDate=$(date '+%Y-%m-%d %H:%M')
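# Retrieve the local timezone abbreviation (not used in the final timestamp)
localZone=$(date '+%Z')
# Count whether the abbreviation contains the literal text "DST"; most
# abbreviations (e.g. CEST, PDT) do not (not used in the final timestamp)
isDaylightSaving=$(date '+%Z' | grep -c DST)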
# Get the current UTC offset in hours and minutes
utcOffset=$(date '+%:z')

# Initialize the variable to hold the formatted offset
utcOffsetFormatted=""
# Check if the offset is zero (UTC)
if [ "$utcOffset" == "+00:00" ]; then
    utcOffsetFormatted="UTC+0"
else
    # Format the offset with "UTC +/- Offset"
    utcOffsetFormatted="UTC$utcOffset"
fi

# Store the full timestamp in a variable
timestamp="$currentDate $utcOffsetFormatted"

# Define the variable with the content to be inserted
newcontent="Last Update: $timestamp"

## Apply timestamp to specified page
# Use sed to replace the content of div
sed -i "/<div id=\"$divname\">/,/<\/div>/c\<div id=\"$divname\">\n$newcontent\n</div>" "$filename"