Recently stopped drinking? Check in to Reddit.

Sat check-in

Figure 1: Network graph

This figure represents the 100 most recent threads on an alcohol support subreddit, captured on the morning of Saturday 1/16/2016. The data account for 1,021 posts or comments made by 337 valid users (55 were excluded because they had no stop-drinking date). Each node in the graph represents a post or comment. The red nodes correspond to Saturday check-in responses, where individuals pledge not to drink today. Node size corresponds to the number of days since the last drink (μ=382, SD=1340, Med.=30, IQR: 12-214). Of the 337 users, 155 (46%) responded to the check-in thread. Roughly 60% of these individuals were within their first 30 days of not drinking (Figure 2). Of those within their first 30 days, nearly 20% had exactly 15 days of not drinking (having stopped on January 1st; Figure 3). Note that no one had only one day, which may reflect the delay in manually assigning new badges (where non-drinking days are counted).


Figure 2: Distribution of days for check-in response


Figure 3: Distribution within 30 days for check-in response (almost 20% from New Year’s Day)

This raises the question: what about the 182 users who posted but didn’t respond to the check-in thread? Do they have significantly more days of not drinking? To test my hypothesis (that users who don’t check in have, on average, more non-drinking days), I used a Mann-Whitney test. As predicted, the two groups differed significantly with respect to non-drinking days, Mann-Whitney z=3.95, p<.001. Specifically, the group that did not respond to the thread had a median of 82.5 non-drinking days and ranked significantly higher than those who did respond (Med.=17). In short, those who checked in on this morning’s “not drink today” thread had significantly fewer days of not drinking than recent posters who didn’t check in.
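For readers who want to see what the Mann-Whitney comparison involves, here is a minimal sketch in plain Python. It is not the script used for this analysis, and the raw data were not retained, so any numbers passed to it would have to be your own. The function names are hypothetical. It assumes the z statistic comes from the normal approximation with a tie correction and no continuity correction, which may differ from what a given statistics package reports.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_z(group_a, group_b):
    """Return (z, two-sided p) from the tie-corrected normal approximation.

    Assumes both groups are non-empty and not every value is identical.
    A positive z means group_a tends to rank higher than group_b.
    """
    n_a, n_b = len(group_a), len(group_b)
    n = n_a + n_b
    combined = list(group_a) + list(group_b)
    ranks = average_ranks(combined)
    # U statistic for group_a: its rank sum minus the smallest possible rank sum.
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    # Tie correction shrinks the variance when values repeat.
    tie_term = sum(t**3 - t for t in Counter(combined).values())
    sd = math.sqrt(n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u_a - n_a * n_b / 2) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p
```

Calling `mann_whitney_z(no_check_in_days, check_in_days)` with two lists of day counts returns the z statistic and p-value. If SciPy is available, `scipy.stats.mannwhitneyu` is the more usual route.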

Future research may build on these preliminary findings by expanding the sampling frame, including additional characteristics of Reddit users (e.g., activity, karma), and extending the research to Reddit’s other health support networks (e.g., /r/stopsmoking).


Why did you do this?

In solidarity with people who are suffering from the effects of alcohol, and for a healthy change of pace after the holidays, I began the new year with Dry January. I’ve been following along with /r/stopdrinking on Reddit. It is inspiring to see the amount of determination and support emanating from this online community (see also: Washington Post).

I did this largely out of curiosity, since I have a great interest in both social network analysis and alcohol use. However, I also hope that this work is interesting and perhaps inspiring for others. If you are interested in seeking personal insight or support for your own alcohol use, a brief check-in at /r/stopdrinking may be something to consider. Additional research will be useful to determine the clinical efficacy of online support groups such as this.


What about privacy?

All data were publicly available via the Reddit API, which is bound by the Reddit user agreement. Out of respect for /r/stopdrinking community members, I did not retain any of the raw data used for this analysis and made no effort to identify individual users.


How were data retrieved?

Raw data were accessed via the Reddit API, using the Python PRAW library. I accessed the 100 most-recent posts on /r/stopdrinking and all associated comments (up to 200 per thread). I wrote a script to parse the number of non-drinking days from user badges. Data were output to CSV files for analysis.
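The retrieval steps above can be sketched roughly as follows. This is an illustration, not the original script. The badge format (`"15 days"`), the column names, and the function names are all assumptions. The PRAW calls follow current PRAW releases and may differ from the version available in 2016.

```python
import csv
import io
import re

# Assumption: a badge reads like "15 days"; the real badge format may differ.
BADGE_PATTERN = re.compile(r"(\d[\d,]*)\s*days?", re.IGNORECASE)
FIELDS = ["id", "parent_id", "author", "days"]  # hypothetical column names


def parse_days(badge_text):
    """Return the day count from a badge string, or None if there is none."""
    if not badge_text:
        return None
    match = BADGE_PATTERN.search(badge_text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def fetch_rows(reddit, thread_limit=100, comment_limit=200):
    """Yield one row per post or comment.

    `reddit` is expected to be an authenticated praw.Reddit instance.
    """
    for submission in reddit.subreddit("stopdrinking").new(limit=thread_limit):
        yield {"id": submission.id, "parent_id": "",
               "author": str(submission.author),
               "days": parse_days(submission.author_flair_text)}
        submission.comments.replace_more(limit=0)  # skip "load more" stubs
        for comment in submission.comments.list()[:comment_limit]:
            yield {"id": comment.id, "parent_id": comment.parent_id,
                   "author": str(comment.author),
                   "days": parse_days(comment.author_flair_text)}


def write_rows(handle, rows):
    """Write rows to an open text handle as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Usage with a made-up row, written to memory instead of a file.
buffer = io.StringIO()
write_rows(buffer, [{"id": "abc", "parent_id": "", "author": "someone",
                     "days": parse_days("15 days")}])
```

With real credentials, the client would come from `praw.Reddit(client_id=..., client_secret=..., user_agent=...)`, and `write_rows(open_file, fetch_rows(reddit))` would produce the CSV.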


How was the network graph constructed?

I imported two CSV files (one for nodes and one for edges) into Gephi. Edges retained the tree structure of individual threads. I collapsed the graph and then ran weighted transformations to map the nodes. This gave me a better visual sense of the network and guided my hypothesis about the check-in group.
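As a rough sketch of how the two CSV files could be built, the code below turns a list of posts and comments into a node table and an edge table. Gephi's spreadsheet import looks for an `Id` column in the node table and `Source` and `Target` columns in the edge table. Everything else here is an assumption made for illustration: the row layout, the `days` attribute, the parent-to-reply edge direction, and the function names.

```python
import csv
import io


def strip_kind(fullname):
    """Drop a Reddit type prefix such as 't1_' or 't3_' from an ID."""
    _prefix, sep, rest = fullname.partition("_")
    return rest if sep else fullname


def build_tables(rows):
    """Return (nodes, edges) as lists of dicts ready for CSV export."""
    nodes, edges = [], []
    for row in rows:
        # One node per post or comment; `days` can drive node size in Gephi.
        nodes.append({"Id": row["id"], "days": row["days"]})
        # One edge from each parent to its reply keeps the thread tree intact.
        if row["parent_id"]:
            edges.append({"Source": strip_kind(row["parent_id"]),
                          "Target": row["id"]})
    return nodes, edges


def to_csv(table):
    """Serialize a non-empty list of dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(table[0].keys()))
    writer.writeheader()
    writer.writerows(table)
    return buffer.getvalue()


# Made-up example: one post with a reply, and a reply to that reply.
example_rows = [
    {"id": "p1", "parent_id": "", "days": 30},
    {"id": "c1", "parent_id": "t3_p1", "days": 15},
    {"id": "c2", "parent_id": "t1_c1", "days": None},
]
example_nodes, example_edges = build_tables(example_rows)
```

Writing `to_csv(example_nodes)` and `to_csv(example_edges)` to two files gives a pair that Gephi can import as a node table and an edge table.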


Why not use a t-test to check the hypothesis?

I did not use a t-test because the overall distribution of non-drinking days was positively skewed and departed significantly from normality, Shapiro-Wilk W=0.27, p<.01. Also, the variances of the two groups (check-in vs. no check-in) were unequal, Brown-Forsythe F(1,335)=9.18, p<.01. In this case, the nonparametric Mann-Whitney test was more appropriate for testing group differences.
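To make the variance check concrete, here is a small sketch of the Brown-Forsythe statistic: a one-way ANOVA run on each value's absolute deviation from its own group median. It is an illustration rather than the code behind the numbers above, the function name is hypothetical, and it returns only F and its degrees of freedom (no p-value).

```python
import statistics


def brown_forsythe(*groups):
    """Return (F, df_between, df_within) for the Brown-Forsythe test.

    Assumes at least two groups and some spread within the groups;
    otherwise the within-group sum of squares is zero and F is undefined.
    """
    # Step 1: absolute deviation of each value from its own group median.
    deviations = []
    for group in groups:
        median = statistics.median(group)
        deviations.append([abs(x - median) for x in group])

    k = len(deviations)
    n_total = sum(len(d) for d in deviations)
    grand_mean = sum(sum(d) for d in deviations) / n_total
    group_means = [statistics.mean(d) for d in deviations]

    # Step 2: one-way ANOVA on those deviations.
    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(deviations, group_means))
    within = sum((z - m) ** 2
                 for d, m in zip(deviations, group_means) for z in d)

    df_between, df_within = k - 1, n_total - k
    f_stat = (between / df_between) / (within / df_within)
    return f_stat, df_between, df_within
```

With two groups of 155 and 182 users, the degrees of freedom come out as (1, 335), matching the F(1,335) reported above. If SciPy is available, `scipy.stats.levene(a, b, center="median")` computes the same test with a p-value, and `scipy.stats.shapiro` covers the normality check.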


Other questions? Please ask!

One thought on “Recently stopped drinking? Check in to Reddit.”

  1. Post author

    Update: I also assessed the length of posts in the check-in thread, as compared to those in the rest of the data. Length was measured in characters, including spaces. The check-in posts were significantly shorter (Med. = 47, IQR: 24-87, n = 171) than their counterparts (Med. = 119, IQR: 42-274, n = 947), Mann-Whitney z = 8.36, p < .001. That's not a surprising finding, since check-ins aren't intended to be lengthy. This result is nevertheless worth mentioning, as it objectively demonstrates the relative brevity of this thread. The /r/stopdrinking check-in system may serve as a valuable approach to lowering the barriers to community participation for people who have recently stopped drinking.

