About & Methods

Data sources, methodology, technical notes, and important caveats for interpreting these visualizations.

Project Overview

This project maps the field of Japanese literary studies through network analysis, content mining, and interactive visualization. By combining data from dissertations, conference proceedings, journal articles, citations, translations, and encyclopedic sources, we construct a multi-dimensional portrait of how this academic field has evolved from 1915 to the present.

The visualizations presented here draw on more than 7,900 dissertations, 13,500 conference presentations, 81,000 citation relationships, 36,000 translations, and 17,000 literary works. This is not a complete census, since scholarly activity extends beyond what databases can capture, but to our knowledge it is the most comprehensive computational analysis of the field to date.

Data Sources

Scholarly Networks & Dissertations

  • ProQuest Dissertations & Theses: 7,922 dissertations (1915-2025) identified through keyword searches
  • Advisor-advisee relationships: Extracted from dissertation acknowledgments and institutional records
  • JSTOR: 57,990 citation edges from journal articles
  • OpenAlex: 23,213 citation edges (culled from 950,000+ raw citations using content filters)
  • Book review data: Manual extraction from major journals

Conference Data

  • AJLS (Association for Japanese Literary Studies): 1,116 presentations (1998-2025) from programs and proceedings
  • AAS (Association for Asian Studies): 2,208 presentations (2013-2025) via Confex API + PDF parsing
  • MLA (Modern Language Association): 1,462 presentations (1968-2026) from PMLA archives and Confex API
  • ACLA (American Comparative Literature Association): 861 presentations (1992-2026) from PDF programs
  • SCMS (Society for Cinema and Media Studies): 2,803 presentations (2008-2026) from PDF programs

Job Market Data

  • MLA Job Information List: 4,280 job listings (1966-2025) from MLA JIL archives
  • Japan Foundation grants and fellowship data

Methodology

Network Construction

The scholarly network comprises three edge types:

Network metrics include degree centrality, betweenness centrality, PageRank, and generational influence scores. Authority rankings combine multiple metrics weighted by data quality.
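As an illustration of two of the metrics named above, here is a minimal sketch in dependency-free Python. It is not the project's code (the project lists networkx among its tools); the toy citation edges, the damping factor of 0.85, and the function names are assumptions made for the example.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Degree of each node divided by (n - 1), treating edges as undirected."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed graph given as (src, dst) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

# Hypothetical edges: (citing scholar, cited scholar).
toy_edges = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B")]
dc = degree_centrality(toy_edges)
pr = pagerank(toy_edges)
top_scholar = max(pr, key=pr.get)  # the most-cited node in this toy graph
```

In this toy graph every other node cites A, so A has the maximum degree centrality and the highest PageRank. How such metrics are weighted into an authority ranking is not specified here and is left out of the sketch.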

Content Analysis

Dissertation and conference titles were analyzed using:

Canonicity Framework

The three-axis canonicity model measures authors on three independent dimensions:

This framework reveals misalignments: authors who are canonical in Japan but unstudied in Western academia, global bestsellers (Murakami) not yet established in the Japanese canon, and Nobel laureates who fall below domestic canonicity thresholds.
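One way to picture how such misalignments could be flagged is sketched below. The axis names (domestic, translation, academic), the 0-to-1 scaling, and the thresholds are all assumptions for illustration, not the project's actual definitions, and the authors are placeholders rather than real data.

```python
from dataclasses import dataclass

@dataclass
class AuthorScores:
    name: str
    domestic: float     # assumed axis: standing in Japanese reference works, in [0, 1]
    translation: float  # assumed axis: translation activity, in [0, 1]
    academic: float     # assumed axis: dissertation attention, in [0, 1]

def misalignment(scores, high=0.7, low=0.3):
    """Describe the mismatch if some axis is high while another is low, else None."""
    axes = {
        "domestic": scores.domestic,
        "translation": scores.translation,
        "academic": scores.academic,
    }
    strong = sorted(axis for axis, value in axes.items() if value >= high)
    weak = sorted(axis for axis, value in axes.items() if value <= low)
    if strong and weak:
        return f"high {', '.join(strong)}; low {', '.join(weak)}"
    return None

# Placeholder authors with invented scores.
author_a = AuthorScores("Author A", domestic=0.9, translation=0.1, academic=0.2)
author_b = AuthorScores("Author B", domestic=0.8, translation=0.9, academic=0.75)
label_a = misalignment(author_a)
label_b = misalignment(author_b)
```

Author A is flagged because one axis is high while the other two are low; Author B scores consistently and is not flagged.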

Entity Resolution

Person name matching across heterogeneous data sources used:

Important Caveats

Selection Bias

These data privilege English-language scholarship and North American institutions. Japanese-language scholarship, non-PhD pathways, and scholars based in Japan or other non-Western countries are underrepresented. Conference data covers only five venues; many regional conferences and workshops are omitted.

Temporal Coverage Gaps

Pre-1970 data is sparse. Early-career scholars (PhD 2020 or later) have limited publication records. Some conference years are missing because programs were unavailable or events were cancelled (e.g., AAS 2020-2021 pandemic cancellations).

Citation Data Quality

OpenAlex citations were aggressively filtered (from more than 950,000 raw citations to about 23,000 edges) to remove false positives caused by name collisions, which are common in East Asian names. This prioritizes precision over recall, so some genuine citations may have been excluded. JSTOR coverage favors older journals and omits recent open-access publications.
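To make the precision-over-recall trade-off concrete, the sketch below keeps a citation edge only when both the citing and the cited title contain a field-specific term. The keyword list, the record layout, the sample titles, and the function names are hypothetical stand-ins; the project's actual content filters are not described here.

```python
# Hypothetical in-field terms; a real filter would be far richer.
FIELD_TERMS = ("japan", "genji", "meiji", "haiku", "monogatari")

def looks_in_field(title):
    """True if the title contains any of the assumed field terms."""
    lowered = title.lower()
    return any(term in lowered for term in FIELD_TERMS)

def filter_edges(edges):
    """Keep an edge only if both citing and cited titles look in-field."""
    return [
        edge for edge in edges
        if looks_in_field(edge["citing_title"]) and looks_in_field(edge["cited_title"])
    ]

# Invented sample edges.
raw_edges = [
    # Both titles in-field: kept.
    {"citing_title": "Narrative voice in Meiji fiction",
     "cited_title": "Haiku and modernity"},
    # Likely name collision with an unrelated field: dropped.
    {"citing_title": "Narrative voice in Meiji fiction",
     "cited_title": "Protein folding kinetics in yeast"},
    # Plausibly genuine, but the cited title has no field term: also dropped.
    {"citing_title": "Haiku and modernity",
     "cited_title": "Theories of lyric address"},
]
kept = filter_edges(raw_edges)

# Retention implied by the figures reported above (roughly 2.4%).
retention = 23_213 / 950_000
```

The third edge shows the cost of a strict filter: a citation that may well be genuine is discarded because its title lacks an in-field keyword.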

Name Ambiguity

Approximately 5-10% of person records involve ambiguous name matches. Short surnames (Li, Kim, Wang), variant romanizations (Ōe vs Oe), and married name changes complicate entity resolution. Some scholars may be split across multiple records or incorrectly merged.
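A minimal sketch of how variant romanizations and name order might be reconciled, using only the Python standard library. The normalization steps, the 0.9 threshold, and the function names are assumptions for illustration, not the project's actual matching pipeline.

```python
import unicodedata
from difflib import SequenceMatcher

def normalize_name(name):
    """Lowercase, strip diacritics (Ōe -> oe), drop commas, and sort the tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = stripped.lower().replace(",", " ").split()
    # Sorting makes "Ōe Kenzaburō" and "Kenzaburō Ōe" compare equal.
    return " ".join(sorted(tokens))

def name_similarity(a, b):
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()

def is_probable_match(a, b, threshold=0.9):
    """True if the names are similar enough under the assumed threshold."""
    return name_similarity(a, b) >= threshold

same_person = is_probable_match("Ōe Kenzaburō", "Kenzaburo Oe")
different_people = is_probable_match("Li Wei", "Kim Min")
```

String similarity alone cannot separate two different scholars who share a short, common name, which is why records like these still need additional evidence such as institution or year before they are merged.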

Canonicity Model Limitations

The three-axis framework reflects measurable data (encyclopedia entries, translations, dissertations) rather than literary quality or cultural significance. It privileges certain forms of recognition (encyclopedia treatment, academic study) over others (popular readership, pedagogical influence). Newly published authors lack time to accumulate metrics.

Encyclopedia Scope

The Shinchō Encyclopedia (our primary source) focuses on Japanese-language creative writers. Critics, translators, and non-Japanese authors of Japanese literature are largely excluded. The encyclopedia's editorial choices reflect the values of the mid-twentieth-century Japanese literary establishment.

Technical Implementation

Visualizations use D3.js v7 for interactive graphics, with data processing in Python 3.13 (pandas, numpy, networkx, sqlite3). The encyclopedia project used Claude AI for NLP-based thematic classification. All code and raw data are available on request.

The color palette uses traditional Japanese color names adapted for accessibility in light and dark modes. Typography: Cormorant Garamond (headings), Source Sans 3 (body).

Attribution & Contact

This project was created by Jonathan E. Abel (Pennsylvania State University) as part of an ongoing investigation into the structure and evolution of Japanese literary studies.

Data sources are acknowledged throughout the visualizations. If you identify errors or have additional data to contribute, please reach out via the institutional website.

Citation: Abel, Jonathan E. (2026). Japanese Literary Studies: Interactive Visualizations. Digital Humanities Project. https://[URL]

Acknowledgments

Thanks to the institutions that make their data publicly available: ProQuest, JSTOR, OpenAlex, Wikidata, Japan Foundation, MLA, AAS, ACLA, SCMS, and AJLS. Thanks to colleagues who provided feedback on early versions of these visualizations.