Intellectual History of the Field: Dissertations, Topics, and Institutional Networks (1915-2025)
All analytical visualizations now support toggling between keyword-based and NLP-based topic classifications. Use the [Keyword], [NLP], and [Combined] buttons in each visualization to compare approaches.
How did each topic spread through institutions over time? Animated scatter plot showing the geographic diffusion of 20 intellectual topics across 15 universities from 1950 to 2025. Toggle between keyword and NLP classifications to compare coverage patterns.
What are the intellectual signatures of the top 20 universities? Small multiples showing topic profiles across four time periods, revealing how each institution's focus has evolved. Compare keyword vs NLP institutional profiles.
How much do advisees drift from their advisor and institution? Scatter plot mapping 2,316 advisor-advisee pairs by topic similarity, revealing loyalists, rebels, and independents. See how NLP changes similarity scores.
Are there distinct intellectual "schools" of institutions? Hierarchical clustering, heatmap, and network visualization revealing how universities group by topic similarity. Toggle sources to see different clustering patterns.
How have topics shifted across 1950-2025? Stacked area chart and bump chart showing the rise and fall of 20 intellectual topics over 75 years of dissertation production. Compare temporal trends across classification methods.
Visualize the three network layers (advisor, review, citation) as separable planes. See which scholars bridge multiple networks, explore cross-layer connections, and filter by network membership with a Venn diagram overlay.
Trace how topics flow through advisor lineages across generations, see which advisors produce which topics, track institutional knowledge exchange, and watch topic evolution over 75 years of the field. Four modes in one visualization.
Select a founding scholar and watch their lineage radiate outward ring by ring. Explore generation depth, expand/collapse subtrees, animate growth over time, and trace lineage paths back to the roots of the discipline.
Keyword-based datasets:
NLP-based datasets:
Network & authority data:
Reports:
Data source: 6,875 dissertations drawn from ProQuest Dissertations & Theses Global and CiNii/NDL (the latter supplying 4,376 Japanese-language dissertations), enriched with Academic Family Tree advisor-advisee data and JSTOR/OpenAlex citation data.
Literature filter: Dissertations are classified as Japanese literature (modern and premodern) using a three-stage process: subject code classification, text-based keyword matching against titles and abstracts, and manual review of ambiguous cases. This filter excludes non-literary disciplines (history, political science, anthropology, etc.) while retaining the full scope of Japanese literary studies.
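A minimal sketch of how the three stages could fit together. The subject-code sets and keyword list below are hypothetical placeholders, not the project's actual lists, and the decision rules (which combinations of signals include, exclude, or defer to a human) are assumptions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical placeholders: the project's real subject codes and keywords are not listed here.
LITERATURE_CODES = {"asian_literature", "comparative_literature"}
NON_LITERARY_CODES = {"history", "political_science", "anthropology"}
LITERARY_KEYWORDS = {"novel", "poetry", "haiku", "waka", "monogatari", "fiction"}


@dataclass
class Dissertation:
    title: str
    abstract: str
    subject_codes: set[str] = field(default_factory=set)


def classify(d: Dissertation) -> str:
    """Return 'include', 'exclude', or 'review' (the manual-review stage)."""
    # Stage 1: subject-code classification.
    has_lit_code = bool(d.subject_codes & LITERATURE_CODES)
    has_other_code = bool(d.subject_codes & NON_LITERARY_CODES)
    # Stage 2: keyword matching against title and abstract.
    words = set(re.findall(r"[a-z]+", f"{d.title} {d.abstract}".lower()))
    has_lit_keyword = bool(words & LITERARY_KEYWORDS)
    if (has_lit_code or has_lit_keyword) and not has_other_code:
        return "include"
    if not has_lit_code and not has_lit_keyword:
        return "exclude"
    # Stage 3: mixed signals (e.g. literary keywords under a history code) go to a human.
    return "review"


print(classify(Dissertation("The novel and the state", "Fiction and war.", {"history"})))  # review
```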
Topic classification (Dual-source approach):
Keyword approach: 20 topic clusters defined by manually curated keyword lists, matched against the full text of titles and abstracts. Scores are normalized by word count. Coverage: 2,360 dissertations (34.3%).
NLP approach: Semantic topic classification using the Anthropic Claude API (Claude Sonnet 4.5), which reads full titles and abstracts and assigns themes, literary figures, periods, and genres. Coverage: 6,097 dissertations (88.7%), including 94.2% of Japanese-language CiNii dissertations (vs. 3.2% with the keyword approach). Improvement: +54.4 percentage points in overall coverage (88.7% vs. 34.3%).
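Read literally, the keyword approach's scoring step ("keyword hits normalized by word count") might look like the sketch below. The two topic lists are hypothetical stand-ins for the 20 curated clusters, and lowercase alphabetic tokenization is an assumption. Because the `[a-z]+` tokenizer ignores non-Latin script, a Japanese-language abstract scores zero on every topic, which illustrates how a keyword matcher can miss CiNii records.

```python
import re
from collections import Counter

# Hypothetical topic clusters; the project uses 20 manually curated keyword lists.
TOPIC_KEYWORDS = {
    "gender": {"gender", "women", "feminist", "femininity"},
    "war_memory": {"war", "atomic", "memory", "occupation"},
}


def topic_scores(title: str, abstract: str) -> dict[str, float]:
    """Keyword hits per topic divided by the total word count of title + abstract."""
    words = re.findall(r"[a-z]+", f"{title} {abstract}".lower())
    if not words:  # e.g. text with no Latin-script words at all
        return {topic: 0.0 for topic in TOPIC_KEYWORDS}
    counts = Counter(words)
    return {
        topic: sum(counts[k] for k in keywords) / len(words)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }


print(topic_scores("Women and war", "Feminist fiction"))  # {'gender': 0.4, 'war_memory': 0.2}
```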
Source toggling: All analytical visualizations support switching between keyword-based, NLP-based, and combined topic classifications. Use the toolbar buttons to compare approaches and see how classification method affects patterns, trends, and institutional profiles.
Similarity: Cosine similarity between 20-dimensional topic vectors. Advisor vectors aggregate their advisees' dissertation topics. Institution vectors aggregate all dissertations at that institution.
Clustering: Complete-linkage agglomerative clustering of institutions by topic cosine distance, cut at 5 clusters.
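A naive complete-linkage sketch: each institution starts as its own cluster, and the pair of clusters whose farthest members are closest is merged repeatedly until k clusters remain, which is equivalent to cutting the dendrogram at k. The O(n³) loop is adequate for a few dozen institutions; the institution names and vectors in the example are made up.

```python
import math


def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def complete_linkage(vectors: dict[str, list[float]], k: int) -> list[set[str]]:
    """Agglomerate until k clusters remain; cluster distance = max pairwise distance."""
    clusters = [{name} for name in vectors]
    dist = {(a, b): cosine_distance(vectors[a], vectors[b]) for a in vectors for b in vectors}
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(dist[a, b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]  # i < j, so deleting j leaves index i valid
        del clusters[j]
    return clusters


toy = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0], "D": [0.1, 0.9]}
print(sorted(sorted(c) for c in complete_linkage(toy, k=2)))  # [['A', 'B'], ['C', 'D']]
```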
Network metrics: PageRank (d=0.85, 50 iterations), HITS authority/hub scores (50 iterations), and generation depth (BFS from roots) computed across advisor, review, and citation network layers for literature-filtered core scholars.
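The three metrics could be computed with stdlib-only code along these lines. Several choices here are assumptions not confirmed by the description above: an edge u to v means u confers credit on v (for example, a citing scholar pointing at the cited one); rank held by dangling nodes is redistributed uniformly; HITS scores are normalized to sum to 1 each iteration; and generation depth is the shortest BFS distance from any root, where a root is a scholar with no recorded advisor.

```python
from collections import deque

# Assumed convention: graph[u] lists the nodes u points to (u confers credit on them).
Graph = dict[str, list[str]]


def all_nodes(graph: Graph) -> set[str]:
    return set(graph) | {v for targets in graph.values() for v in targets}


def pagerank(graph: Graph, d: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """PageRank by power iteration with a fixed iteration count."""
    nodes = all_nodes(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread evenly over all nodes.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for u, targets in graph.items():
            for v in targets:
                new[v] += d * rank[u] / len(targets)
        rank = new
    return rank


def hits(graph: Graph, iterations: int = 50) -> tuple[dict[str, float], dict[str, float]]:
    """HITS (hub, authority) scores, each normalized to sum to 1 per iteration."""
    nodes = all_nodes(graph)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the nodes pointing in.
        auth = {v: 0.0 for v in nodes}
        for u, targets in graph.items():
            for v in targets:
                auth[v] += hub[u]
        # Hub: sum of authority scores of the nodes pointed to.
        hub = {u: sum(auth[v] for v in graph.get(u, [])) for u in nodes}
        a_total = sum(auth.values()) or 1.0
        h_total = sum(hub.values()) or 1.0
        auth = {v: x / a_total for v, x in auth.items()}
        hub = {v: x / h_total for v, x in hub.items()}
    return hub, auth


def generation_depth(advisees: Graph) -> dict[str, int]:
    """BFS from roots (scholars never listed as anyone's advisee); roots get depth 0."""
    children = {c for kids in advisees.values() for c in kids}
    roots = [s for s in advisees if s not in children]
    depth = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        s = queue.popleft()
        for c in advisees.get(s, []):
            if c not in depth:  # first visit is the shortest distance from any root
                depth[c] = depth[s] + 1
                queue.append(c)
    return depth


print(generation_depth({"root": ["a", "b"], "a": ["c"]}))  # {'root': 0, 'a': 1, 'b': 1, 'c': 2}
```

Computing these per layer simply means calling each function on the advisor, review, and citation edge lists separately.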
Tools: Python 3.13 (stdlib only, no numpy/scipy), SQLite, D3.js v7, HTML5 Canvas. All visualizations are self-contained HTML files. NLP classification via Anthropic Claude API (Sonnet 4.5).
Japanese Literary Studies Digital Humanities Project
Data: ProQuest, Academic Family Tree (CC-BY 3.0), JSTOR, OpenAlex