The German Wikipedia is slowly stagnating

No. of daily edits


"Edit" shows the number of daily edits, "smooth" shows data that has been filtered by a low-pass filter

No. of removed articles in comparison to number of new* articles


"New" shows the number of new* articles, "Delete" the number of deleted articles, "smooth" shows data that has been filtered by a low-pass filter, "raw" of course shows the raw data

The number of new articles per day has been declining since about 2007. In October 2009 the number of articles deleted per day reached the number of new articles that survive (per day).
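
The "smooth" curves in the plots come from a low-pass filter, but the exact filter is not specified here. As a minimal sketch, assuming a trailing moving average is an acceptable stand-in, the function below averages the last few values of a two-column "date count" file such as edits.csv. The function name `smooth` and the window length are assumptions for illustration only.

```shell
# smooth WINDOW: read "date value" lines on stdin and print "date mean",
# where mean is the average of the most recent WINDOW values.
# Assumption: a trailing moving average stands in for the unspecified filter.
smooth() {
  LC_ALL=C awk -v w="$1" '
    {
      slot = NR % w                    # ring-buffer position for this line
      if (NR > w) sum -= buf[slot]     # drop the value that leaves the window
      buf[slot] = $2
      sum += $2
      n = (NR < w) ? NR : w            # fewer than w values at the start
      printf "%s %.1f\n", $1, sum / n
    }'
}
```

It could be used as `smooth 30 < edits.csv > edits-smooth.csv`, where the 30-day window is an arbitrary choice. A longer window gives a smoother curve but reacts more slowly to real changes.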

*The study has a methodological problem: only articles that have not been removed are visible in the German Wikipedia's dump, so articles that have already been deleted never show up as "new" articles.
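
The crossover between deletions and surviving new articles can also be located programmatically. The sketch below assumes whitespace-separated lines of the form "date edits new deletes", which is the layout that all.csv gets from the scripts further down. The function name `first_crossover` is hypothetical. Because daily counts are noisy, the result is more meaningful on smoothed data than on raw data.

```shell
# first_crossover: read "date edits new deletes" lines on stdin and print the
# first date on which deletions are at least as numerous as new articles.
# Lines without a fourth field (no deletions logged that day) are skipped.
first_crossover() {
  awk 'NF >= 4 && $4 + 0 >= $3 + 0 { print $1; exit }'
}
```

For example, `first_crossover < all.csv` prints a single date, or nothing if deletions never reach the number of new articles.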

Here you will find the raw data:

(please take one of these if you are only interested in the numbers!)

(Please only download these if you really intend to work with the data!)

Pages from the following namespaces were ignored when counting the number of new and deleted pages:

Benutzer: Benutzer Diskussion: Datei: Datei Diskussion: Diskussion: Hilfe: Hilfe Diskussion: Kategorie: Kategorie Diskussion: MediaWiki: MediaWiki Diskussion: Portal: Portal Diskussion: Spezial: Vorlage: Vorlage Diskussion: Wikipedia: Wikipedia Diskussion:
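
The scripts below read these prefixes from a file named Exclude, whose contents are not shown. As a sketch, assuming the file simply holds one namespace prefix per line, it could be created like this. The function name `write_exclude` is hypothetical.

```shell
# write_exclude FILE: write one namespace prefix per line, for use as a
# fixed-string pattern file with "grep -v -F -f FILE".
# Assumption: the Exclude file used by the scripts has this layout.
write_exclude() {
  printf '%s\n' \
    'Benutzer:' 'Benutzer Diskussion:' \
    'Datei:' 'Datei Diskussion:' \
    'Diskussion:' \
    'Hilfe:' 'Hilfe Diskussion:' \
    'Kategorie:' 'Kategorie Diskussion:' \
    'MediaWiki:' 'MediaWiki Diskussion:' \
    'Portal:' 'Portal Diskussion:' \
    'Spezial:' \
    'Vorlage:' 'Vorlage Diskussion:' \
    'Wikipedia:' 'Wikipedia Diskussion:' \
    > "$1"
}
```

Calling `write_exclude Exclude` produces the file. Since `grep -F` matches a pattern anywhere in a line, a line is dropped whenever one of these strings occurs in it, not only when the title starts with it.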

Scripts that generate the data:

gzip -c -d dewiki-20091028-pages-logging.xml.gz | awk -v OFS=" " -F "[<>]" '/<timestamp>/{ts=$3} /<contributor>/{id=uid;ip=0;un=0;uid=0} /<ip>/{ip=$3} /<username>/{un=$3} /<id>/{uid=$3} /<\/contributor>/ {if (ip) {con="ip:"ip;} else {con="user:"uid":"un;}} /<action>/{act=$3} /<logtitle>/{lt=$3} /<\/logitem>/{print id,ts,act,con,lt}' >dewiki-logging.csv &

gzip -c -d dewiki-20091028-stub-meta-history.xml.gz | awk -v OFS=" " -F "[<>]" '/<timestamp>/{ts=$3} /<contributor>/{id=uid;ip=0;un=0;uid=0} /<ip>/{ip=$3} /<username>/{un=$3} /<id>/{uid=$3} /<\/contributor>/ { if (ip) {con="ip:"ip;} else {con="user:"uid":"un;} } /<action>/{act=$3} /<title>/{lt=$3} /<\/revision>/{print id,ts,con,lt}'>dewiki-edits.csv &

wait

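# Deletions per day: keep log lines containing the "delete" action, then count per date
grep -v -F -f Exclude dewiki-logging.csv | grep " delete " | cut -d " " -f 2 | cut -d T -f 1 | sort | uniq -c | awk -v OFS=" " '{print $2,$1}' >deletes.csv &

# New articles per day: keep only the first listed revision of each title, then count per date
grep -v -F -f Exclude dewiki-edits.csv | cut -d " " -f 2,4 | uniq -f 1 | cut -d T -f 1 | sort | uniq -c | awk -v OFS=" " '{print $2,$1}' >new.csv &

# Edits per day: count all revisions per date
grep -v -F -f Exclude dewiki-edits.csv | cut -d " " -f 2 | cut -d T -f 1 | sort | uniq -c | awk -v OFS=" " '{print $2,$1}' >edits.csv &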

wait

join edits.csv new.csv|join -a 1 - deletes.csv >all.csv
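
To see what the first awk command extracts, it helps to run it on a single log entry. The sketch below wraps the same awk program, spread over several lines, in a hypothetical function `extract_log`. The sample log item is made up for illustration and only uses the element names that the script matches.

```shell
# extract_log: the awk program from the logging step, reading XML on stdin.
# With "<" and ">" as field separators, $3 is the text inside an element.
extract_log() {
  awk -v OFS=" " -F "[<>]" '
    /<timestamp>/     { ts = $3 }
    # the last <id> seen before <contributor> belongs to the log entry
    /<contributor>/   { id = uid; ip = 0; un = 0; uid = 0 }
    /<ip>/            { ip = $3 }
    /<username>/      { un = $3 }
    /<id>/            { uid = $3 }
    /<\/contributor>/ { if (ip) { con = "ip:" ip } else { con = "user:" uid ":" un } }
    /<action>/        { act = $3 }
    /<logtitle>/      { lt = $3 }
    /<\/logitem>/     { print id, ts, act, con, lt }'
}

# Made-up sample entry, for illustration only.
sample='<logitem>
  <id>42</id>
  <timestamp>2009-10-01T12:00:00Z</timestamp>
  <contributor>
    <username>Example</username>
    <id>7</id>
  </contributor>
  <type>delete</type>
  <action>delete</action>
  <logtitle>Beispielseite</logtitle>
</logitem>'

printf '%s\n' "$sample" | extract_log
# prints: 42 2009-10-01T12:00:00Z delete user:7:Example Beispielseite
```

Each log entry becomes one line, so the later steps can work with `grep`, `cut` and `uniq`. With a space as separator, a user name or title that itself contains a space takes up more than one field.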

PS: The German Wikipedia itself shows the following numbers:


Done by Maximilian Dörrbecker

Any comments or suggestions can be sent to christian [at] mogis-verein.de (Twitter @ChristianBahls)