About & Methods

Data sources, methodology, technical notes, and important caveats for interpreting these visualizations.

Project Overview

This project maps the field of Japanese literary studies through network analysis, content mining, and interactive visualization. By combining data from dissertations, conference proceedings, journal articles, citations, translations, and encyclopedic sources, we construct a multi-dimensional portrait of how this academic field has evolved from 1915 to the present.

The visualizations presented here draw on over 7,900 dissertations, 13,500 conference presentations, 81,000 citation relationships, 36,000 translations, and 17,000 literary works. This is not a complete census, since scholarly activity extends beyond what databases can capture, but to our knowledge it is the most comprehensive computational analysis of the field to date.

Data Sources

Scholarly Networks & Dissertations

ProQuest Dissertations & Theses: 7,922 dissertations (1915-2025) identified through keyword searches
Advisor-advisee relationships: Extracted from dissertation acknowledgments and institutional records
JSTOR: 57,990 citation edges from journal articles
OpenAlex: 23,213 citation edges (culled from 950,000+ raw citations using content filters)
Book review data: Manual extraction from major journals

Conference Data

AJLS (Association for Japanese Literary Studies): 1,116 presentations (1998-2025) from programs and proceedings
AAS (Association for Asian Studies): 2,208 presentations (2013-2025) via Confex API + PDF parsing
MLA (Modern Language Association): 1,462 presentations (1968-2026) from PMLA archives and Confex API
ACLA (American Comparative Literature Association): 861 presentations (1992-2026) from PDF programs
SCMS (Society for Cinema and Media Studies): 2,803 presentations (2008-2026) from PDF programs

Job Market Data

MLA Job Information List: 4,280 job listings (1966-2025) from MLA JIL archives
Japan Foundation: Grants and fellowship data

Methodology

Network Construction

The scholarly network comprises three edge types:

Advisor edges: PhD advisor → advisee relationships create genealogical lineages spanning six generations (Founders through Emerging scholars)
Citation edges: Author → cited author relationships from JSTOR and OpenAlex, filtered for field relevance
Review edges: Reviewer → reviewed author relationships from book reviews

Network metrics include degree centrality, betweenness centrality, PageRank, and generational influence scores. Authority rankings combine multiple metrics weighted by data quality.
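As a rough illustration of how typed edges can feed these metrics, the sketch below builds a tiny edge list and computes degree centrality and PageRank in plain Python. The names, the edge list, and the choice to run PageRank only on citation and review edges are assumptions made for this example, not the project's actual pipeline. The project lists networkx among its tools, which provides these metrics directly.

```python
from collections import defaultdict

# Hypothetical edges: (source, target, edge_type). Names are placeholders.
EDGES = [
    ("Advisor A", "Scholar B", "advisor"),
    ("Advisor A", "Scholar C", "advisor"),
    ("Scholar B", "Advisor A", "citation"),
    ("Scholar C", "Advisor A", "citation"),
    ("Scholar C", "Scholar B", "review"),
]


def degree_centrality(edges):
    """Degree (in + out) divided by n - 1.

    Parallel edges of different types are counted separately,
    so values can exceed 1.
    """
    nodes = {n for s, t, _ in edges for n in (s, t)}
    degree = defaultdict(int)
    for s, t, _ in edges:
        degree[s] += 1
        degree[t] += 1
    return {n: degree[n] / (len(nodes) - 1) for n in nodes}


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed edge list."""
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    out_links = defaultdict(list)
    for s, t, _ in edges:
        out_links[s].append(t)
    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new = {n: (1 - damping + damping * dangling) / len(nodes) for n in nodes}
        for s in nodes:
            for t in out_links[s]:
                new[t] += damping * rank[s] / len(out_links[s])
        rank = new
    return rank


# Assumption: citation and review edges point toward the scholar receiving
# attention, so only those edge types are passed to PageRank here.
attention_edges = [e for e in EDGES if e[2] != "advisor"]

centrality = degree_centrality(EDGES)
ranks = pagerank(attention_edges)
```

Advisor edges run from advisor to advisee, so including them in PageRank would pass rank down the lineage rather than toward cited scholars. Whether that is desirable depends on what "authority" is meant to capture.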

Content Analysis

Dissertation and conference titles were analyzed using:

Term frequency-inverse document frequency (TF-IDF): Identifies distinctive vocabulary by institution, time period, or venue
Topic modeling: Latent Dirichlet Allocation (LDA) with 15-50 topics depending on corpus size
Named entity recognition: Extracts author names, literary works, theoretical frameworks, and geographic references
Temporal analysis: Tracks term frequency changes over time to identify emerging trends
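The TF-IDF step can be sketched as follows: a minimal version that treats all titles in a group (here, a decade) as one document and scores each term against the other groups. The titles, the tokenizer, and the unsmoothed IDF formula are assumptions for illustration. Library implementations often use a smoothed variant.

```python
import math
import re
from collections import Counter

# Hypothetical titles grouped by decade; not drawn from the project's corpus.
TITLES_BY_GROUP = {
    "1980s": [
        "Narrative voice in the Meiji novel",
        "The Meiji novel and the state",
    ],
    "2010s": [
        "Media ecology and the postwar novel",
        "Media, gender, and manga",
    ],
}


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_by_group(titles_by_group):
    """Score terms per group: term frequency times log(N / document frequency)."""
    counts = {
        group: Counter(tok for title in titles for tok in tokenize(title))
        for group, titles in titles_by_group.items()
    }
    n_groups = len(counts)
    # Number of groups in which each term appears at least once.
    doc_freq = Counter(term for c in counts.values() for term in c)
    scores = {}
    for group, c in counts.items():
        total = sum(c.values())
        scores[group] = {
            term: (n / total) * math.log(n_groups / doc_freq[term])
            for term, n in c.items()
        }
    return scores


scores = tfidf_by_group(TITLES_BY_GROUP)
top_1980s = max(scores["1980s"], key=scores["1980s"].get)
top_2010s = max(scores["2010s"], key=scores["2010s"].get)
```

Terms shared by every group, such as "novel" in this toy data, score zero, which is what lets distinctive vocabulary surface.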

Canonicity Framework

The three-axis canonicity model measures authors on three independent dimensions:

Japanese canonicity: Encyclopedia biography length + zenshū publication count + cross-references
Academic reception: Dissertation mentions + conference presentations weighted by venue prestige
Translation reception: Translation count + number of target languages

This framework reveals misalignments: authors who are canonical in Japan but unstudied in Western academia, global bestsellers (e.g., Murakami) not yet established in the Japanese canon, and Nobel laureates who fall below domestic canonicity thresholds.
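One way such axis scores could be combined is sketched below. The text above gives the components of each axis but not how they are scaled, so the min-max normalization, the equal weighting, the 0.5 gap threshold, and all author data here are assumptions. The venue-prestige weighting for presentations is omitted.

```python
# Hypothetical raw counts for three invented authors (not project data).
AUTHORS = {
    "Author X": {"bio_length": 900, "zenshu": 3, "cross_refs": 40,
                 "dissertations": 2, "presentations": 1,
                 "translations": 4, "languages": 2},
    "Author Y": {"bio_length": 300, "zenshu": 0, "cross_refs": 10,
                 "dissertations": 20, "presentations": 30,
                 "translations": 60, "languages": 25},
    "Author Z": {"bio_length": 600, "zenshu": 1, "cross_refs": 25,
                 "dissertations": 11, "presentations": 15,
                 "translations": 30, "languages": 12},
}

# Components per axis, following the three axes listed above.
AXES = {
    "japanese": ["bio_length", "zenshu", "cross_refs"],
    "academic": ["dissertations", "presentations"],
    "translation": ["translations", "languages"],
}


def min_max(values):
    """Rescale a {name: value} mapping to the range 0..1."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {name: 0.0 for name in values}
    return {name: (v - lo) / (hi - lo) for name, v in values.items()}


def axis_scores(authors, axes):
    """Average the rescaled components of each axis for every author."""
    scores = {name: {} for name in authors}
    for axis, components in axes.items():
        scaled = [
            min_max({name: row[c] for name, row in authors.items()})
            for c in components
        ]
        for name in authors:
            scores[name][axis] = sum(s[name] for s in scaled) / len(components)
    return scores


def misaligned(scores, gap=0.5):
    """Authors whose highest and lowest axis scores differ by at least `gap`."""
    return {
        name: s for name, s in scores.items()
        if max(s.values()) - min(s.values()) >= gap
    }


scores = axis_scores(AUTHORS, AXES)
flagged = misaligned(scores)
```

In this toy data, Author X scores high only on the Japanese axis and Author Y only on the academic and translation axes, so both are flagged, while Author Z sits mid-range on all three.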

Entity Resolution

Person name matching across heterogeneous data sources used:

Exact matches on normalized names (diacritics stripped, case-insensitive)
Fuzzy matching on last name + first initial (manually reviewed for common surnames)
Institutional affiliation as secondary evidence
Manual curation of ambiguous cases (5-10% of corpus)
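The first two matching steps can be sketched as below, using only the Python standard library. The record format ("Last, First" strings with an integer id) and the sample records are assumptions for illustration. Institutional affiliation and the manual curation step are not modeled.

```python
import unicodedata
from collections import defaultdict

# Hypothetical records: (record id, name as "Last, First").
RECORDS = [
    (1, "Ōe, Kenzaburō"),
    (2, "Oe, Kenzaburo"),
    (3, "OE, K."),
    (4, "Tanaka, Yuko"),
    (5, "Tanaka, Yukiko"),
]


def normalize(name):
    """Strip diacritics, fold case, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def fuzzy_key(name):
    """Last name plus first initial, assuming 'Last, First' input."""
    last, _, first = normalize(name).partition(",")
    return (last.strip(), first.strip()[:1])


def match_records(records):
    """Return exact-match groups and fuzzy groups that need manual review."""
    exact = defaultdict(list)
    fuzzy = defaultdict(set)
    for rec_id, name in records:
        exact[normalize(name)].append(rec_id)
        fuzzy[fuzzy_key(name)].add(normalize(name))
    # A fuzzy key covering more than one distinct normalized name is only
    # a candidate match, so it is queued for review rather than merged.
    review = {key: sorted(names) for key, names in fuzzy.items() if len(names) > 1}
    return dict(exact), review


exact, review = match_records(RECORDS)
```

Records 1 and 2 merge on the exact normalized name. The two Tanaka records share a fuzzy key but may be different people, which is why fuzzy matches on common surnames are reviewed by hand rather than merged automatically.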

Important Caveats

Selection Bias

These data privilege English-language scholarship and North American institutions. Japanese-language scholarship, non-PhD pathways, and scholars based in Japan or other non-Western countries are underrepresented. Conference data cover only five venues; many regional conferences and workshops are omitted.

Temporal Coverage Gaps

Pre-1970 data are sparse. Early-career scholars (PhD awarded in 2020 or later) have limited publication records. Some conference years are missing because programs were unavailable or events were cancelled (e.g., AAS 2020-2021 pandemic cancellations).

Citation Data Quality

OpenAlex citations were aggressively filtered (from more than 950,000 raw citations to roughly 23,000 edges) to remove false positives caused by name collisions, which are common with East Asian names. This prioritizes precision over recall, so some genuine citations may have been excluded. JSTOR coverage favors older journals and omits recent open-access publications.

Name Ambiguity

Approximately 5-10% of person records involve ambiguous name matches. Short surnames (Li, Kim, Wang), variant romanizations (Ōe vs. Oe), and married-name changes complicate entity resolution. Some scholars may be split across multiple records or incorrectly merged.

Canonicity Model Limitations

The three-axis framework reflects measurable data (encyclopedia entries, translations, dissertations) rather than literary quality or cultural significance. It privileges certain forms of recognition (encyclopedia treatment, academic study) over others (popular readership, pedagogical influence). Recently published authors have not yet had time to accumulate these metrics.

Encyclopedia Scope

The Shinchō Encyclopedia (our primary source) focuses on Japanese-language creative writers. Critics, translators, and non-Japanese authors of Japanese literature are largely excluded. The encyclopedia's editorial choices reflect the values of the mid-20th-century Japanese literary establishment.

Technical Implementation

Visualizations use D3.js v7 for interactive graphics, with data processing in Python 3.13 (pandas, numpy, networkx, sqlite3). The encyclopedia project used Claude AI for NLP-based thematic classification. All code and raw data are available on request.

The color palette uses traditional Japanese color names adapted for accessibility in light and dark modes. Typography: Cormorant Garamond (headings), Source Sans 3 (body).

Attribution & Contact

This project was created by Jonathan E. Abel (Pennsylvania State University) as part of an ongoing investigation into the structure and evolution of Japanese literary studies.

Data sources are acknowledged throughout the visualizations. If you identify errors or have additional data to contribute, please reach out via the institutional website.

Citation: Abel, Jonathan E. (2026). Japanese Literary Studies: Interactive Visualizations. Digital Humanities Project. https://[URL]

Acknowledgments

Thanks to the institutions that make their data publicly available: ProQuest, JSTOR, OpenAlex, Wikidata, Japan Foundation, MLA, AAS, ACLA, SCMS, and AJLS. Thanks to colleagues who provided feedback on early versions of these visualizations.