Web applications
Project Gutenberg Corpus Builder
Search the Project Gutenberg database and download ebooks in various formats.
How to cite:
Moncomble, Florent. 2025. ‘Project Gutenberg Corpus Builder’. Web application. 2025. https://corpustools.prendrelangue.fr/pgcorpusbuilder/.
The Times Corpus Builder
Search The Times and download articles in various formats.
How to cite:
Moncomble, Florent. 2025. ‘The Times Corpus Builder’. Web application. 2025. https://corpustools.prendrelangue.fr/timescorpusbuilder/.
The Guardian Corpus Builder
Search The Guardian and download articles in various formats. Also available as part of the Press Corpus Scraper browser extension.
How to cite:
Moncomble, Florent. 2025. ‘The Guardian Corpus Builder’. Web application. 2025. https://corpustools.prendrelangue.fr/guardiancorpusbuilder/.
The New York Times Corpus Builder
Search The New York Times and download articles in various formats. Also available as part of the Press Corpus Scraper browser extension.
How to cite:
Moncomble, Florent. 2025. ‘The NYT Corpus Builder’. Web application. 2025. https://corpustools.prendrelangue.fr/nytcorpusbuilder/.
The NPR Corpus Builder
Search NPR and download articles in various formats.
How to cite:
Moncomble, Florent. 2025. ‘The NPR Corpus Builder’. Web application. 2025. https://corpustools.prendrelangue.fr/nytcorpusbuilder/.
Corriere della Sera Corpus Builder
Search the Corriere della Sera and download articles in various formats.
How to cite:
Moncomble, Florent. 2025. ‘The Corriere Della Sera Corpus Builder’. Web application. 2025. https://corpustools.prendrelangue.fr/cdscorpusbuilder/.
BlueskyScraper
Scrape and download posts from the Bluesky social network. Also available as a browser extension.
How to cite:
Moncomble, Florent. 2025. ‘BlueskyScraper’. Web application. 2025. https://corpustools.prendrelangue.fr/blueskyscraper/.
BlueskyStreamer
Stream Bluesky posts in real time and download in various formats. Also available as part of the BlueskyScraper browser extension.
How to cite:
Moncomble, Florent. 2025. ‘BlueskyStreamer’. Web application. 2025. https://corpustools.prendrelangue.fr/blueskystreamer/.
MastoScraper
Scrape and download posts from the Mastodon social network. Also available as a browser extension.
How to cite:
Moncomble, Florent. 2025. ‘MastoScraper’. Web application. 2025. https://corpustools.prendrelangue.fr/mastoscraper/.
Type/token ratio
Calculate and compare the type/token ratio of different corpora as an estimate of their lexical diversity.
How to cite:
Moncomble, Florent. 2025. ‘Type/Token Ratio’. Web application. 2025. https://corpustools.prendrelangue.fr/ttr/.
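For readers unfamiliar with the measure, the type/token ratio is the number of distinct word forms (types) divided by the total number of words (tokens). The following is a minimal sketch of that calculation, not the web application's own code: the function name `type_token_ratio`, the regex-based tokenizer, the lowercasing step, and the two sample texts are all assumptions made for illustration.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return the number of distinct tokens divided by the total number of tokens."""
    # Assumption: a token is a lowercase run of word characters,
    # optionally joined by apostrophes (e.g. "don't").
    tokens = re.findall(r"\w+(?:'\w+)*", text.lower())
    if not tokens:
        return 0.0  # avoid dividing by zero on empty input
    return len(set(tokens)) / len(tokens)


# Hypothetical mini-corpora, used only to show the comparison.
corpora = {
    "repetitive": "the cat saw the cat and the cat saw the dog",
    "varied": "a quick brown fox jumps over one lazy sleeping dog",
}

for name, text in corpora.items():
    print(f"{name}: {type_token_ratio(text):.2f}")
# repetitive: 0.45
# varied: 1.00
```

Because the ratio tends to fall as a text gets longer, a comparison of this kind is most meaningful between corpora of similar size.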
Browser extensions
Some of the tools above, along with a few others, are available as browser add-ons:
APP Extractor
A browser extension to scrape and download documents from The American Presidency Project.
How to cite:
Moncomble, Florent. (2023) 2024. ‘APP_Extractor’. JavaScript. Arras, France: Université d’Artois. https://github.com/fmoncomble/APP_extractor.
BlueskyScraper (browser add-on)
A browser extension to scrape or stream and download Bluesky posts.
How to cite:
Moncomble, Florent. (2024) 2025. ‘BlueskyScraper’. JavaScript. Arras, France: Université d’Artois. https://github.com/fmoncomble/blueskyscraper.
DiscordScraper
A browser extension to scrape and download Discord messages.
How to cite:
Moncomble, Florent. (2025) 2025. ‘DiscordScraper’. JavaScript. Arras, France: Université d’Artois. https://github.com/fmoncomble/DiscordScraper.
MastoScraper (browser add-on)
A browser extension to scrape and download toots (posts on Mastodon).
How to cite:
Moncomble, Florent. (2024) 2024. ‘MastoScraper’. JavaScript. Arras, France: Université d’Artois. https://github.com/fmoncomble/mastoscraper.
Press Corpus Scraper
A browser extension to extract and download press articles from a variety of sources.
How to cite:
Moncomble, Florent. 2024. ‘Press Corpus Scraper’. JavaScript. Arras, France: Université d’Artois. https://fmoncomble.github.io/press-corpus-scraper/.
RedditScraper
A browser extension to scrape and download Reddit posts.
How to cite:
Moncomble, Florent. (2024) 2024. ‘RedditScraper’. JavaScript. Arras, France: Université d’Artois. https://github.com/fmoncomble/redditscraper.
Social Corpus Scraper
A 4-in-1 bundle of BlueskyScraper, MastoScraper, RedditScraper and 𝕏-Scraper.
How to cite:
Moncomble, Florent. (2024) 2025. ‘Social Corpus Scraper’. JavaScript. https://github.com/fmoncomble/SocialCorpusScraper.
𝕏-Scraper
A browser extension to scrape and download tweets.
How to cite:
Moncomble, Florent. (2024) 2025. ‘X-Scraper’. JavaScript. Arras, France: Université d’Artois. https://github.com/fmoncomble/X-scraper.