Web applications
Project Gutenberg Corpus Builder
Search the Project Gutenberg database and download ebooks in various formats.
How to cite:
Moncomble, F. (2025). Project Gutenberg Corpus Builder. Arras, France: Université d’Artois. https://corpustools.prendrelangue.fr/pgcorpusbuilder/
The Times Corpus Builder
Search The Times and download articles in various formats.
How to cite:
Moncomble, F. (2025). The Times Corpus Builder. Arras, France: Université d'Artois. https://corpustools.prendrelangue.fr/timescorpusbuilder/
The Guardian Corpus Builder
Search The Guardian and download articles in various formats. Also available as part of the Press Corpus Scraper browser extension.
How to cite:
Moncomble, F. (2025). The Guardian Corpus Builder. Arras, France: Université d'Artois. https://corpustools.prendrelangue.fr/guardiancorpusbuilder/
BlueskyScraper
Scrape and download posts from the Bluesky social network. Also available as a browser extension.
How to cite:
Moncomble, F. (2025). BlueskyScraper. Arras, France: Université d'Artois. https://corpustools.prendrelangue.fr/blueskyscraper/
BlueskyStreamer
Stream Bluesky posts in real time and download them in various formats. Also available as part of the BlueskyScraper browser extension.
How to cite:
Moncomble, F. (2025). BlueskyStreamer. Arras, France: Université d'Artois. https://corpustools.prendrelangue.fr/blueskystreamer/
MastoScraper
Scrape and download posts from the Mastodon social network. Also available as a browser extension.
How to cite:
Moncomble, F. (2025). MastoScraper. Arras, France: Université d'Artois. https://corpustools.prendrelangue.fr/mastoscraper/
Type/token ratio
Calculate and compare the type/token ratio of different corpora as an estimate of their lexical diversity.
How to cite:
Moncomble, F. (2025). Type/token ratio calculator. Arras, France: Université d'Artois. https://corpustools.prendrelangue.fr/ttr/
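For readers unfamiliar with the measure, the type/token ratio is the number of distinct word forms (types) divided by the total number of words (tokens). The sketch below only illustrates that definition and is not the code behind the tool above. The function name `type_token_ratio`, the naive regex tokenization, the lowercasing step and the two sample texts are all assumptions made for the example.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return distinct word forms (types) divided by total words (tokens)."""
    # Assumed tokenization: lowercase the text, then take runs of word
    # characters, keeping internal apostrophes ("don't" stays one token).
    tokens = re.findall(r"\w+(?:'\w+)*", text.lower())
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    return len(set(tokens)) / len(tokens)


# Hypothetical mini-corpora, made up for this illustration.
corpora = {
    "repetitive": "the cat saw the cat and the cat saw the dog",
    "varied": "a quick brown fox jumps over one lazy sleeping dog",
}

for name, text in corpora.items():
    print(f"{name}: TTR = {type_token_ratio(text):.2f}")
```

The ratio tends to fall as a text gets longer, because frequent words repeat, so comparisons are most meaningful between corpora of similar size.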
Browser extensions
Some of the tools above, along with several others, are available as browser add-ons:
APP Extractor
A browser extension to scrape and download documents from The American Presidency Project.
How to cite:
Moncomble, F. (2024). APP_Extractor (Version 0.7.2) [JavaScript]. Arras, France: Université d’Artois. Available at: https://fmoncomble.github.io/APP_extractor/ (First version 2023)
BlueskyScraper (browser add-on)
A browser extension to scrape or stream and download Bluesky posts.
How to cite:
Moncomble, F. (2024). BlueskyScraper (Version 0.4) [JavaScript]. Arras, France: Université d’Artois. Available at: https://fmoncomble.github.io/blueskyscraper/
DiscordScraper
A browser extension to scrape and download Discord messages.
How to cite:
Moncomble, F. (2024). DiscordScraper (Version 0.1) [JavaScript]. Arras, France: Université d’Artois. Available at: https://fmoncomble.github.io/DiscordScraper/
MastoScraper (browser add-on)
A browser extension to scrape and download toots (posts on Mastodon).
How to cite:
Moncomble, F. (2024). MastoScraper (Version 0.8) [JavaScript]. Arras, France: Université d’Artois. Available at: https://fmoncomble.github.io/mastoscraper/
Press Corpus Scraper
A browser extension to extract and download press articles from a variety of sources.
How to cite:
Moncomble, F. (2025). Press Corpus Scraper (Version 0.12) [JavaScript]. Arras, France: Université d’Artois. Available at: https://fmoncomble.github.io/press-corpus-scraper/
RedditScraper
A browser extension to scrape and download Reddit posts.
How to cite:
Moncomble, F. (2024). RedditScraper (Version 0.4) [JavaScript]. Arras, France: Université d’Artois. Available at: https://fmoncomble.github.io/redditscraper/
Social Corpus Scraper
A 4-in-1 bundle of BlueskyScraper, MastoScraper, RedditScraper and 𝕏-Scraper.
How to cite:
Moncomble, F. (2025). SocialCorpusScraper (Version 0.6) [JavaScript]. Arras, France: Université d’Artois. Available at: https://fmoncomble.github.io/SocialCorpusScraper/
𝕏-Scraper
A browser extension to scrape and download tweets.
How to cite:
Moncomble, F. (2025). 𝕏-Scraper (Version 0.5) [JavaScript]. Arras, France: Université d’Artois. Available at: https://fmoncomble.github.io/X-scraper/