I have written in a journal every day since the summer of 2020. I write the journal in Notion (which is also the foundation of this website), and I decided to export the entries and process the data to see what insights I could get from my habitual documentation of my daily life. This is what the journal database in Notion looks like.

Correlational analysis plot - see below

For each entry I record the entry number, the date, the hours of sleep I got the night before, and a rating of the day on a five-star scale, and I check off whether I exercised, played one of my musical instruments, and did some reading. Then, on each page, I write several paragraphs about my day and my dreams from the night before, and list some things I am thankful for.

First, I exported the database from Notion, which gave me a CSV file containing all of the properties for all of the ~~500+~~ 1000+ entries.

I also wanted a word count property to get an idea of how much I wrote each night, so I wrote a Python program to compute it. It iterates through each of the entries, counts the words, and then appends the word count as another column in the existing CSV file. I made some graphs in Excel with this - shown below.
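The word-count step can be sketched roughly as follows. This is a minimal illustration, not the original program: it assumes the entry text lives in a CSV column, and the column names `Text` and `Word Count` are hypothetical. It works on CSV text in memory so it can be tried without touching any files.

```python
import csv
import io

def add_word_counts(csv_text, text_column="Text", count_column="Word Count"):
    """Return CSV text with an extra column holding each row's word count.

    The column names are assumptions made for this sketch.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + [count_column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # str.split() with no arguments splits on any run of whitespace
        row[count_column] = len(row[text_column].split())
        writer.writerow(row)
    return out.getvalue()
```

To use it on a real export, read the file into a string, pass it through `add_word_counts`, and write the result back out.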

Next, I wanted to see how the frequency of certain words I use changes over time. I extended the Python program so that it takes a list of words, counts how often each of those words appears in every entry, and graphs the results using the `matplotlib` library. I got some really cool results with this. ⤵️
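A rough sketch of the counting and plotting, assuming the entries are already loaded as a list of strings in chronological order. The function names are hypothetical, and the tokenizing rule (lowercase runs of letters and apostrophes) is just one reasonable choice.

```python
import re

def count_word_per_entry(entries, words):
    """Map each word to a list of its counts, one count per entry."""
    counts = {word: [] for word in words}
    for text in entries:
        # Lowercase the entry and keep runs of letters/apostrophes as tokens
        tokens = re.findall(r"[a-z']+", text.lower())
        for word in words:
            counts[word].append(tokens.count(word.lower()))
    return counts

def plot_word_usage(counts):
    """Draw one line per word; the x-axis is the entry number."""
    # Imported here so the counting function works without matplotlib
    import matplotlib.pyplot as plt
    for word, series in counts.items():
        plt.plot(range(1, len(series) + 1), series, label=word)
    plt.xlabel("Entry number")
    plt.ylabel("Mentions per entry")
    plt.legend()
    plt.show()
```

With a thousand entries the raw per-entry counts are spiky, so smoothing them (for example with a rolling average) before plotting may make the trends easier to read.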

This is a graph of "Ethan" and "Cardi" over time. One of my friends at Tufts is named Ethan Cardi but I had another friend named Ethan in my dorm so partway through the year we started calling him by his last name, Cardi. You can see the approximate point where this happened from my journal entries.

Also, I tore my meniscus rock climbing near the end of my freshman year and needed knee surgery to repair it. The graph really tells a story. You can see how I got into rock climbing, when the traumatic knee injury happened, when I found out I would need surgery and then got it, and then how I had a long conversation with my dentist about rock climbing the week after.

My next goal was to see how different words in my journal correlated with one another. For instance, if I say "x" a certain number of times in an entry, how likely am I to say "y"? To do this I first used the lists I got from the graphs of word usage over time. For each expression, I had a list where each value corresponded to the number of times I used that word in a given journal entry. To find the correlation values I played around with both of the formulas below and preferred the output of the first one (which I came up with intuitively), even though it does not give the exact correlation coefficient of the two variables.

$$ \text{correlation}=\dfrac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum x_i+\sum y_i} $$

$$ \text{correlation}=\dfrac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum(x_i-\bar{x})^2\,\sum(y_i-\bar{y})^2}} $$

$\bar{x}$ and $\bar{y}$ are the average values of $x$ and $y$ respectively. $x_i$ and $y_i$ are the number of mentions in a specific entry.
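For comparison, here is a small sketch of two such scores computed from a pair of per-entry count lists. The function names are hypothetical. `mention_weighted_score` follows the first formula (the sum of co-deviations divided by the total number of mentions), and `pearson` is the standard Pearson correlation coefficient, whose denominator is the square root of the product of the two sums of squared deviations.

```python
import math

def _deviations(values):
    """Return each value minus the mean of the list."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def mention_weighted_score(xs, ys):
    """Sum of co-deviations divided by total mentions of both words."""
    total = sum(xs) + sum(ys)
    if total == 0:
        return 0.0  # neither word is ever mentioned
    dx, dy = _deviations(xs), _deviations(ys)
    return sum(a * b for a, b in zip(dx, dy)) / total

def pearson(xs, ys):
    """Standard Pearson correlation coefficient, between -1 and 1."""
    dx, dy = _deviations(xs), _deviations(ys)
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0  # one of the lists is constant
    return sum(a * b for a, b in zip(dx, dy)) / denom
```

The Pearson value is bounded between -1 and 1, while the mention-weighted score is not normalized the same way, so the two are not directly comparable in magnitude.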

Below is a scatter plot that shows the correlation between the words I entered into the program for analysis. The larger the dot at a point, the higher the correlation between the words on the x and y axes at that point. Normally you would see the axis labels, but I have covered them here.

Here is an Excel graph I created showing how many words I write in a journal entry and the average hours of sleep I get, grouped by the star rating I give the day. The graph shows that the better the day is, the more I write, except that when I've had a very bad day I vent a lot about it.

This graph shows how certain variables differ based on the day of the week. The sleep I get and the rating I give are surprisingly uniform across the week, but I tend to write the most on Fridays and the least on Mondays.
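The grouping behind a chart like this was done in Excel, but the same idea can be sketched in Python. This assumes each row is a dict with an ISO-formatted `date` string and a numeric field. Both the key names and the date format are assumptions for illustration, so dates from a real export may need parsing first.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def average_by_weekday(rows, value_key):
    """Average rows[value_key] for each weekday that appears in the data."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        # date.weekday() returns 0 for Monday through 6 for Sunday
        day = WEEKDAYS[date.fromisoformat(row["date"]).weekday()]
        sums[day] += float(row[value_key])
        counts[day] += 1
    return {day: sums[day] / counts[day] for day in sums}
```

Swapping the weekday lookup for the star rating would give the per-rating averages in the same way.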

Collecting and then graphing the data about my journal entries over the past year has been extremely fun and I think it is so interesting to see the patterns that emerge from documenting each day.