Florent Moncomble’s corpus tools

I am a senior lecturer in English linguistics at the Université d’Artois in Arras, France. This page lists a number of apps I have been developing over the last few years for corpus collection and analysis.

Use of these apps is subject to applicable copyright and privacy laws as well as fundamental ethical principles.

Please remember to cite the tools you use in your publications and presentations.

Web applications

Project Gutenberg Corpus Builder

Search the Project Gutenberg database and download ebooks in various formats.

How to cite:

Moncomble, F. (2025). Project Gutenberg Corpus Builder. Arras, France: Université d’Artois.
https://corpustools.prendrelangue.fr/pgcorpusbuilder/

The Times Corpus Builder

Search The Times and download articles in various formats.

How to cite:

Moncomble, F. (2025). The Times Corpus Builder. Arras, France: Université d’Artois.
https://corpustools.prendrelangue.fr/timescorpusbuilder/

The Guardian Corpus Builder

Search The Guardian and download articles in various formats. Also available as part of the Press Corpus Scraper browser extension.

How to cite:

Moncomble, F. (2025). The Guardian Corpus Builder. Arras, France: Université d’Artois.
https://corpustools.prendrelangue.fr/guardiancorpusbuilder/

BlueskyScraper

Scrape and download posts from the Bluesky social network. Also available as a browser extension.

How to cite:

Moncomble, F. (2025). BlueskyScraper. Arras, France: Université d’Artois.
https://corpustools.prendrelangue.fr/blueskyscraper/

BlueskyStreamer

Stream Bluesky posts in real time and download them in various formats. Also available as part of the BlueskyScraper browser extension.

How to cite:

Moncomble, F. (2025). BlueskyStreamer. Arras, France: Université d’Artois.
https://corpustools.prendrelangue.fr/blueskystreamer/

MastoScraper

Scrape and download posts from the Mastodon social network. Also available as a browser extension.

How to cite:

Moncomble, F. (2025). MastoScraper. Arras, France: Université d’Artois.
https://corpustools.prendrelangue.fr/mastoscraper/

Type/token ratio

Calculate and compare the type/token ratio of different corpora as an estimate of their lexical diversity.

How to cite:

Moncomble, F. (2025). Type/token ratio calculator. Arras, France: Université d’Artois.
https://corpustools.prendrelangue.fr/ttr/
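
As a rough illustration of the measure itself, the type/token ratio is the number of distinct word forms (types) divided by the total number of words (tokens). The sketch below is not the calculator's own implementation. The function name `type_token_ratio` and the tokenisation rule (lowercased runs of letters, keeping word-internal apostrophes) are assumptions made for this example only.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return types / tokens for a text, or 0.0 if it contains no tokens."""
    # Assumed tokenisation: lowercase the text, then treat each run of
    # letters (optionally joined by internal apostrophes) as one token.
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())
    if not tokens:
        return 0.0
    # Types are the distinct token forms.
    return len(set(tokens)) / len(tokens)


# Hypothetical mini-corpora, used only to show a side-by-side comparison.
corpora = {
    "corpus_a": "The cat sat on the mat.",
    "corpus_b": "A dog chased another dog around a park.",
}

for name, text in corpora.items():
    print(f"{name}: {type_token_ratio(text):.3f}")
# corpus_a: 0.833
# corpus_b: 0.750
```

Because longer texts repeat words more often, the ratio tends to fall as corpus size grows, so a comparison of this kind is most meaningful between samples of similar length.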

Browser extensions

Some of the tools above, along with several others, are available as browser add-ons:

APP Extractor

A browser extension to scrape and download documents from The American Presidency Project.

How to cite:

Moncomble, F. (2024). APP_Extractor (Version 0.7.2) [JavaScript]. Arras, France: Université d’Artois. Available at: https://fmoncomble.github.io/APP_extractor/ (First version 2023)

BlueskyScraper (browser add-on)

A browser extension to scrape or stream and download Bluesky posts.

How to cite:

Moncomble, F. (2024). BlueskyScraper (Version 0.4) [JavaScript]. Arras, France: Université d’Artois. Available at: https://fmoncomble.github.io/blueskyscraper/

DiscordScraper

A browser extension to scrape and download Discord messages.

How to cite:

Moncomble, F. (2024). DiscordScraper (Version 0.1) [JavaScript]. Arras, France: Université d’Artois. Available at: https://fmoncomble.github.io/DiscordScraper/

MastoScraper (browser add-on)

A browser extension to scrape and download toots (posts on Mastodon).

How to cite:

Moncomble, F. (2024). MastoScraper (Version 0.8) [JavaScript]. Arras, France: Université d’Artois. Available at: https://fmoncomble.github.io/mastoscraper/

Press Corpus Scraper

A browser extension to extract and download press articles from a variety of sources.

How to cite:

Moncomble, F. (2025). Press Corpus Scraper (Version 0.12) [JavaScript]. Arras, France: Université d’Artois. Available at: https://fmoncomble.github.io/press-corpus-scraper/

RedditScraper

A browser extension to scrape and download Reddit posts.

How to cite:

Moncomble, F. (2024). RedditScraper (Version 0.4) [JavaScript]. Arras, France: Université d’Artois. Available at: https://fmoncomble.github.io/redditscraper/

Social Corpus Scraper

A 4-in-1 bundle of BlueskyScraper, MastoScraper, RedditScraper and 𝕏-Scraper.

How to cite:

Moncomble, F. (2025). SocialCorpusScraper (Version 0.6) [JavaScript]. Arras, France: Université d’Artois. Available at: https://fmoncomble.github.io/SocialCorpusScraper/

𝕏-Scraper

A browser extension to scrape and download tweets.

How to cite:

Moncomble, F. (2025). 𝕏-Scraper (Version 0.5) [JavaScript]. Arras, France: Université d’Artois. Available at: https://fmoncomble.github.io/X-scraper/