Tag Archives: data visualization

Is Reddit the new Erowid?

Erowid.org is described as “a member-supported organization providing access to reliable, non-judgmental information about psychoactive plants, chemicals, and related issues.” The site was launched in 1995 and by 2005 had 450,000 page hits and 45,000 visitors per day. Website usage statistics haven’t been reported since then, though it’s likely that there has been some attrition due to the growing diversity of online resources for drug information. Out of curiosity, I ran a quick Google Trends search to compare where Internet users are searching for information about two well-known synthetic drugs, LSD (acid) and MDMA (ecstasy). The following chart depicts searches for these two keywords that were accompanied by either “erowid” or “reddit”, as well as a search for “erowid reddit”.

Monthly search data were obtained from January 2004 through the present day and trendlines were smoothed using a logarithmic ^4 transformation. What we see is that sometime in 2015, Reddit-related searches for information about these drugs appeared to overtake Erowid-related searches. The implications of this trend go beyond simple navel-gazing about Internet search preferences. For example, if individuals are now more likely to get information about psychoactive drugs from Reddit, it raises questions about the quality of the information they are getting (particularly information about risks and interactions, which is prominent and well-curated on the Erowid site).


FitBit sleep tracking

I’ve been on the fence about replacing my broken FitBit for a couple of years. What finally pushed me off that fence was the recent unlocking of intraday data for developers’ personal projects. This makes it possible for me to access my minute-by-minute activity and sleep data. What follows is a quick rundown of my initial foray into FitBit sleep tracking. Continue reading

Recently stopped drinking? Check in to Reddit.

Sat check-in

Figure 1: Network graph

This figure represents the 100 most recent threads on an alcohol support subreddit, captured on the morning of Saturday 1/16/2016. The data account for 1,021 posts or comments made by 337 valid users (55 were excluded because they had no stop-drinking date). Each node in the graph represents a post or comment. The red nodes correspond to Saturday check-in responses, where individuals pledge not to drink that day. Node size corresponds to number of days since last drink (μ=382, SD=1340, Med.=30, IQR: 12-214). Continue reading
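The gap between the mean and the median reported above is what you would expect from a long right tail: a few users with years of sobriety pull the mean far above the typical value. Here is a minimal sketch of how such a summary could be computed from stop-drinking dates. The dates below are made up for illustration and are not the subreddit data, and the function names are hypothetical.

```python
from datetime import date
from statistics import mean, median, quantiles, stdev

def days_since(stop_dates, as_of):
    """Days elapsed between each stop-drinking date and the capture date."""
    return [(as_of - d).days for d in stop_dates]

def summarize(days):
    """Mean, sample SD, median, and interquartile range of a list of day counts."""
    q1, _, q3 = quantiles(days, n=4)  # default 'exclusive' method
    return {"mean": mean(days), "sd": stdev(days),
            "median": median(days), "iqr": (q1, q3)}

# Hypothetical stop dates: four recent ones and one from ten years earlier.
capture_day = date(2016, 1, 16)
stop_dates = [date(2016, 1, 13), date(2016, 1, 1), date(2015, 12, 17),
              date(2015, 7, 20), date(2006, 1, 16)]

days = days_since(stop_dates, capture_day)
summary = summarize(days)
print(summary)
```

With data this skewed, the median and IQR describe the typical member better than the mean and SD, which is probably why both are worth reporting.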

Social Network Analysis Graphs

Over winter break, I’ve been learning to mine Reddit for data that might be used to conduct social science research.  Specifically, I’m interested in the content and structure of “subreddit” communities that focus on mental and behavioral health conditions. For example, what might we be able to learn from examining the social structure and level of engagement in different types of emotional support threads?

So far, I’ve gotten the hang of retrieving data with the Python Reddit API Wrapper (PRAW). I’m now testing a few different visualization tools for social network analysis. Below are quick overviews of the three options that I’ve tested so far (NodeXL, NetworkX, and Gephi). These data come from a subreddit that focuses on a particular mental health condition. The central nodes (hubs) are the subreddit moderators, and the peripheral nodes (spokes) are the last ~50 people they responded to. What we can see from these visualizations is that the moderators sometimes communicate with each other publicly, but their communications with non-moderators don’t tend to overlap. Continue reading
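The hub-and-spoke observation can also be checked without a plot. The sketch below uses only the standard library and assumes the replies have already been retrieved and reduced to (author, recipient) pairs. The usernames and the helper names `build_reply_graph` and `spoke_overlap` are hypothetical, not taken from the actual project.

```python
from collections import defaultdict
from itertools import combinations

def build_reply_graph(replies):
    """Map each author to the set of users they replied to."""
    graph = defaultdict(set)
    for author, recipient in replies:
        graph[author].add(recipient)
    return graph

def spoke_overlap(graph, moderators):
    """For each pair of moderators, the non-moderators both replied to."""
    mods = set(moderators)
    overlap = {}
    for a, b in combinations(sorted(mods), 2):
        shared = graph.get(a, set()) & graph.get(b, set())
        overlap[(a, b)] = shared - mods  # ignore moderator-to-moderator replies
    return overlap

# Hypothetical reply pairs, for illustration only.
replies = [
    ("mod_a", "user_1"), ("mod_a", "user_2"), ("mod_a", "mod_b"),
    ("mod_b", "user_3"), ("mod_b", "user_2"), ("mod_b", "mod_a"),
]

graph = build_reply_graph(replies)
overlap = spoke_overlap(graph, ["mod_a", "mod_b"])
print(overlap)
```

Mostly empty sets in `overlap` would match the pattern described above, where each moderator has their own set of spokes. The same edge list could be handed to a graph library for drawing.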

Relative per-capita endorsement of the 2012 Open Access petition in the U.S.

This is an interactive chart – hover over states to display their scores! If the chart doesn’t display, the static image is here.

Data include individuals among the first 25,000 endorsements on the official White House petition site who self-reported a location in one of the 50 U.S. states (n = 15,695). Per-capita figures were calculated using 2011 Population Estimates, standardized, and scaled to a mean of 50 and standard deviation of 10 for comparison purposes. The chart was created in Google Charts and the raw data are available here. Continue reading
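The scaling described above amounts to converting each state's per-capita rate to a z-score and then mapping it to 50 + 10z. Here is a minimal sketch with made-up counts for four placeholder states. It is not the petition data, and it assumes the population standard deviation was used, which the post does not specify.

```python
from statistics import mean, pstdev

def scaled_per_capita(endorsements, population, target_mean=50.0, target_sd=10.0):
    """Per-capita rates rescaled to the given mean and standard deviation."""
    rates = {s: endorsements[s] / population[s] for s in endorsements}
    values = list(rates.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD, not sample SD
    return {s: target_mean + target_sd * (r - mu) / sigma
            for s, r in rates.items()}

# Hypothetical endorsement counts and populations, for illustration only.
endorsements = {"A": 120, "B": 300, "C": 45, "D": 80}
population = {"A": 1_000_000, "B": 5_000_000, "C": 600_000, "D": 400_000}

scores = scaled_per_capita(endorsements, population)
for state, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{state}: {score:.1f}")
```

On this scale a score of 50 means a state endorsed at the average per-capita rate, and each 10 points is one standard deviation above or below it. State "B" has the most endorsements but the lowest rate, which is why the per-capita step matters.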