Published on March 18, 2016 by Dr. Randal S. Olson
I've been a moderator of /r/DataIsBeautiful -- one of the largest online communities dedicated to data analysis and visualization -- for the past 2 1/2 years. During that time, I've reviewed thousands of data visualizations created by amateurs and professionals alike.
(For those not in the know: Moderators on Reddit volunteer their time to help run the subreddits, remove spam, enforce posting rules, and handle various other tasks that keep the subreddits on-topic and spam-free.)
For this post, I thought it would be a fun exercise to visualize how /r/DataIsBeautiful's posting rules have evolved over time. After all, it only seems appropriate that a /r/DataIsBeautiful moderator would analyze and visualize their own community, right?
/r/DataIsBeautiful currently has five core posting rules (1-5), and three experimental rules (6-8):
These posting rules try to provide objective criteria for what makes an appropriate /r/DataIsBeautiful post, and generally make sure that:
I've always been curious about the relative importance of these rules over time, so I analyzed the Reddit comment cache on BigQuery, parsed out the official moderator comments that /r/DataIsBeautiful moderators made when removing posts, and binned them by month.
I was able to analyze the comments between January 2013 and January 2016, which provides a unique perspective on the subreddit before and after it became a default subreddit in early 2014.
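The binning step described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the queries actually used for the analysis. It assumes each comment row has already been exported from BigQuery as a dict with `created_utc`, `body`, and `distinguished` fields, and that removal comments cite a rule by number (e.g. "rule 3"). The `RULE_PATTERN` regex, the helper names, and the sample rows are all hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timezone

# Hypothetical pattern: assumes removal comments mention "rule N" (N = 1-8).
RULE_PATTERN = re.compile(r"\brule\s*#?\s*(\d)\b", re.IGNORECASE)


def month_of(created_utc):
    """Convert a Unix timestamp (seconds, UTC) to a 'YYYY-MM' bin label."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).strftime("%Y-%m")


def bin_removals(comments):
    """Count removal comments per month and per cited rule number."""
    counts = defaultdict(Counter)
    for comment in comments:
        # Assumption: official mod comments are flagged as "moderator".
        if comment.get("distinguished") != "moderator":
            continue
        match = RULE_PATTERN.search(comment["body"])
        if match is None:
            continue  # a mod comment that does not cite a rule
        counts[month_of(comment["created_utc"])][int(match.group(1))] += 1
    return counts


def monthly_shares(counts):
    """Turn raw monthly counts into each rule's fraction of that month's removals."""
    shares = {}
    for month, rule_counts in counts.items():
        total = sum(rule_counts.values())
        shares[month] = {rule: n / total for rule, n in rule_counts.items()}
    return shares


# Made-up sample rows, purely for illustration.
sample_comments = [
    {"created_utc": 1420070400, "distinguished": "moderator", "body": "Removed: see rule 1."},
    {"created_utc": 1420156800, "distinguished": "moderator", "body": "Post removed, Rule 3"},
    {"created_utc": 1420243200, "distinguished": "moderator", "body": "Removed per rule 1"},
    {"created_utc": 1420243200, "distinguished": None, "body": "I think rule 1 is unfair"},
    {"created_utc": 1422748800, "distinguished": "moderator", "body": "Removed: rule 3"},
]

counts = bin_removals(sample_comments)
shares = monthly_shares(counts)

for month in sorted(counts):
    print(month, dict(counts[month]), shares[month])
```

Keeping the raw counts and the per-month shares as separate steps makes it possible to look at both the absolute number of removals and the relative importance of each rule.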
By far, the two biggest reasons for posts being removed from /r/DataIsBeautiful are:
Today, those two reasons constitute roughly 50% and 40% of all post removals, respectively.
Interestingly, ever since it became a default subreddit, /r/DataIsBeautiful has been receiving more and more posts that don't even include data visualizations. I believe this trend highlights the importance of an active moderation team as a community grows: As /r/DataIsBeautiful's subscriber numbers climbed from the hundreds of thousands into the millions, the moderators were there to help the community stay on track and share relevant content.
You'll also notice that there was a stint in 2014 where the moderation team required that all posts linking directly to images must link to PNGs (due to text quality issues with JPEGs), but that rule was replaced when we enacted the rule requiring that all posts link directly to the original source. Since only Original Content creators can post direct links to images now, we decided that it was best to allow them to decide how they wanted to share their content.
You may also notice the lack of the "no political posts except for Thursdays" rule, which was introduced in February 2016. Since February 2016's comment data is not yet available, I'll have to leave that analysis for a future post.
For completeness, I've also included the raw counts for each post removal reason below.
You'll likely notice the spike in "original source" removals in late 2014, which was due to the /r/DataIsBeautiful mod team more strictly enforcing the requirement to link directly to the original source. The community took a couple of months to get used to the rule, but removals eventually returned to normal levels.
For the most part, posting rules 3-8 deal with edge cases: Posters confusing infographics with data visualizations, sensationalized post titles, or reposts of links that were already shared recently. While these rules are nonetheless important, this analysis has shown us that just two of the eight posting rules do most of the work of keeping /r/DataIsBeautiful on track.
We'll be sure to continually review and revise these posting rules as the /r/DataIsBeautiful community grows. If you have ideas on how we can improve the community -- through posting rules or otherwise -- please reach out to us by modmail.