Using AI Tools

Advanced Visualizations as an Example

Prof. Dr. Claudius Gräbner-Radkowitsch

Europa-Universität Flensburg

2026-06-11

Today’s session

Part 1 — AI tools

  • What AI is (and isn’t) doing
  • Three modes of use
  • Three failure modes
  • The invariant habit

Live demo: seeing failure in real time

Part 2 — Publication-ready figures

  • Avoid ugly, bad, or wrong graphs
  • Choosing the right chart
  • Five key extensions
  • The 7-item checklist

Live demo + exercise

Note

The tools will change every six months. The habits we build today will not.

Part 1

What AI is (and isn’t) doing

AI predicts plausible next tokens given its training data.

It does not:

  • reason about your problem
  • look things up or verify claims
  • know what has changed since its training cutoff

The consequence:

Confidence is not correctness. Fluent-looking code is not correct code.

What AI is (and isn’t) doing

Hallucination is structural, not a glitch to be patched away.

Tip

A model trained on Stack Overflow posts from 2022 will confidently emit syntax that has been deprecated since ggplot2 3.4.0.
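A minimal sketch of that change, assuming ggplot2 3.4.0 or later and the gapminder package used later in this session. The object names (germany, p_new) are illustrative only. Line width is now set with linewidth, while size still controls points and text.

```r
library(ggplot2)
library(gapminder)

germany <- subset(gapminder, country == "Germany")

# Older style, common in pre-2022 answers: size sets the line width.
# Since ggplot2 3.4.0 this raises a deprecation warning.
# p_old <- ggplot(germany, aes(x = year, y = lifeExp)) + geom_line(size = 1)

# Current style: linewidth for lines, size for points and text.
p_new <- ggplot(germany, aes(x = year, y = lifeExp)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2)
```

If AI output still uses size for a line, that is a hint the rest of the answer may also predate the change.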

Note

What about AI with web search? Tools like Perplexity or Claude with research mode retrieve current text — partially lifting the training-cutoff limitation. But retrieved text is still summarised by a model that cannot verify it. The hallucination and reasoning limitations remain.

Three modes of use

Autocomplete

  • AI completes boilerplate you already know how to write

  • Risk: low

  • Gain: modest

  • Use when you would have typed it anyway

Thought partner

  • Ask for options, strategies, chart recommendations — you decide

  • Risk: medium

  • Gain: high

  • Works best when you have strong domain knowledge

Code reviewer

  • Paste your own code; ask what is wrong or what a referee would criticise

  • Risk: low

  • Gain: builds critical skill

  • Extremely useful, low-risk

Important

The choice of mode is itself the skill. The session returns to this after every demo.

Note

Modes combine. A typical workflow: use thought partner to sketch a plan, correct it yourself, then let autocomplete fill in the code. Use code reviewer afterwards to catch what you missed.

Three failure modes

  1. Package hallucination: AI invents plausible-sounding packages or functions that do not exist. A USENIX Security 2025 study across 16 models found ~20% of recommended packages were hallucinations. In R, this is silent until library() throws an error.

  2. Stale API knowledge: Deprecated arguments (size= → linewidth= in ggplot2 3.4.0), renamed functions, moved packages. The model was trained on older code. It cannot know what changed.

  3. Vibe coding: Accepting AI output without reading it. Re-prompting when it breaks without diagnosing why. Each unread iteration makes the next harder to understand.
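One way to catch the first failure mode early is to check a suggested package and function before building on them. This is a sketch with base R only. The helper names are hypothetical, and "ggfancyplots" is a made-up name standing in for a hallucinated package. A package that is not installed locally may still exist, so a FALSE here is a prompt to check CRAN, not proof of hallucination.

```r
# Is the package available in the local library?
package_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Does the package export a function of this name?
function_exported <- function(pkg, fun) {
  package_installed(pkg) && fun %in% getNamespaceExports(pkg)
}

package_installed("ggfancyplots")              # FALSE unless a package of that name is installed
function_exported("stats", "lm")               # TRUE: stats ships with R
function_exported("stats", "lm_robust_magic")  # FALSE: no such export
```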

The invariant habit

prompt  →  inspect  →  run  →  diagnose
   ↑                              │
   └──────────────────────────────┘

Inspect — read the code before running it. Does it do what you asked?

Diagnose — when it breaks, understand why before re-prompting.

Note

Students who skip inspect and diagnose are vibe coding, even if they do not call it that.

This habit is what makes AI use durable as tools change. Everything else in this session may be obsolete in two years. This will not be.
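A small sketch of what the diagnose step can look like in code, assuming base R only. The helper name run_and_capture is hypothetical. It returns the warning or error message instead of stopping, so the message can be read and understood before any re-prompting.

```r
# Evaluate an expression and capture the first warning or error it raises.
run_and_capture <- function(expr) {
  tryCatch(
    list(ok = TRUE, value = expr, message = NULL),
    warning = function(w) list(ok = FALSE, value = NULL, message = conditionMessage(w)),
    error   = function(e) list(ok = FALSE, value = NULL, message = conditionMessage(e))
  )
}

result <- run_and_capture(log(-1))  # log of a negative number raises a warning
result$ok                           # FALSE
result$message                      # read this before changing the prompt
```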

Live demo

Tip

Watch for:

  • Which mode is being used at each step?
  • When does AI produce something plausible but wrong?
  • What does the diagnose step look like in practice?

Part 2

Two figures: same data

Exploratory — default output

Figure 1

Publication-ready

Figure 2

Two figures: the code

Exploratory

ggplot(gapminder,
       aes(x = gdpPercap,
           y = lifeExp)) +
  geom_point()

4 lines. Default output.

Publication-ready

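# Packages this snippet needs
library(dplyr)
library(ggplot2)
library(ggrepel)
library(scales)
library(gapminder)

gapminder |>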
  filter(year == 2007) |>
  ggplot(aes(x = gdpPercap,
             y = lifeExp,
             color = continent)) +
  geom_point(alpha = 0.7, size = 2) +
  geom_text_repel(
    aes(label = country),
    data = \(d) filter(d, country == "Germany"),
    size = 4, nudge_x = -1.5, nudge_y = -2.0,
    min.segment.length = 0, 
    show.legend = FALSE
  ) +
  geom_text_repel(
    aes(label = country),
    data = \(d) filter(d, country == "Afghanistan"),
    size = 4, nudge_x = -0.75, nudge_y = -3,
    min.segment.length = 0, 
    show.legend = FALSE
  ) +
  geom_text_repel(
    aes(label = country),
    data = \(d) filter(d, country == "China"),
    size = 4, nudge_x = 0.5, nudge_y = -10,
    min.segment.length = 0, 
    show.legend = FALSE
  ) +
  scale_x_log10(
    labels = label_dollar(suffix = "k", scale = 0.001)) +
  scale_color_viridis_d() +
  theme_classic(base_size = 11) +
  labs(
    x = "GDP per capita (USD, log scale)",
    y = "Life expectancy (years)",
    color = NULL,
    caption = "Source: Gapminder. Year: 2007."
  )

Ugly, bad, wrong

Three different problems — three different fixes.

| Label | Problem | Example | Fix |
|-------|---------|---------|-----|
| Ugly | Aesthetic | Grey background, gdpPercap as axis label | theme_classic(), labs() |
| Bad | Perceptual | Wrong chart type, overlapping labels, spaghetti | Rethink the encoding |
| Wrong | Mathematical | No uncertainty, truncated axis, wrong aggregation | Fix the underlying logic |

Important

“It doesn’t look great” is not a diagnosis. → “The y-axis is truncated and the comparison is therefore wrong” is.

Regression results — Coefficient plot

“What predicts life expectancy, and how uncertain are we?”

Figure 3

Note

Key geom: geom_pointrange() — estimate + uncertainty in one layer
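A minimal sketch of such a plot, assuming the gapminder data from the earlier slides. The model specification is illustrative only; the one behind Figure 3 is not shown on the slide. Estimates and 95% confidence intervals are collected with base R.

```r
library(ggplot2)
library(gapminder)

# Illustrative model, not the specification behind the slide's figure.
d <- subset(gapminder, year == 2007)
fit <- lm(lifeExp ~ log(gdpPercap) + log(pop), data = d)

# One row per coefficient with its 95% confidence interval.
ci <- confint(fit)
coefs <- data.frame(
  term     = names(coef(fit)),
  estimate = unname(coef(fit)),
  lower    = ci[, 1],
  upper    = ci[, 2]
)
coefs <- coefs[coefs$term != "(Intercept)", ]

p_coef <- ggplot(coefs, aes(x = estimate, y = term, xmin = lower, xmax = upper)) +
  geom_vline(xintercept = 0, linetype = "dashed") +  # reference line at zero
  geom_pointrange() +
  labs(x = "Estimate (95% CI)", y = NULL) +
  theme_classic(base_size = 11)
```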

Distributions across groups — Violin + jitter

“How does life expectancy distribute across continents?”

Figure 4

Note

Key geoms: geom_violin() + geom_jitter() — shape AND raw data
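A sketch of the combination, assuming the gapminder data for 2007. The styling choices are assumptions, not the settings behind Figure 4.

```r
library(ggplot2)
library(gapminder)

d <- subset(gapminder, year == 2007)

p_violin <- ggplot(d, aes(x = continent, y = lifeExp)) +
  geom_violin(fill = "grey90", colour = "grey40") +
  # Jitter horizontally only, so the plotted life expectancy values stay exact.
  geom_jitter(width = 0.1, height = 0, alpha = 0.6, size = 1) +
  labs(x = NULL, y = "Life expectancy (years)") +
  theme_classic(base_size = 11)
```

Setting height = 0 matters: vertical jitter would move the raw data away from its true values.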

Bivariate with heterogeneity — Faceted scatter

“Does the income–health relationship vary by continent?”

Figure 5

Note

Key geoms: geom_smooth(method = "lm") + facet_wrap() — relationship × group
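A sketch under stated assumptions: gapminder data for 2007, with Oceania dropped because its two countries are too few for a fitted line with a confidence band. That exclusion is a choice made for this example, not something taken from Figure 5.

```r
library(ggplot2)
library(gapminder)

d <- subset(gapminder, year == 2007 & continent != "Oceania")

p_facet <- ggplot(d, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x) +  # one linear fit per panel
  scale_x_log10(labels = scales::label_dollar()) +
  facet_wrap(~ continent) +
  labs(
    x = "GDP per capita (USD, log scale)",
    y = "Life expectancy (years)"
  ) +
  theme_classic(base_size = 11)
```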

Time series, multiple units — Highlighted lines

“How has life expectancy evolved across European countries?”

Figure 6

Note

Key geoms: geom_line() + gghighlight() — focus without removing context
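A sketch assuming the gghighlight package and the European subset of gapminder. The two highlighted countries are an arbitrary choice for illustration. The predicate is wrapped in all() so that it returns one value per line.

```r
library(ggplot2)
library(gghighlight)
library(gapminder)

europe <- subset(gapminder, continent == "Europe")

p_lines <- ggplot(europe, aes(x = year, y = lifeExp, colour = country)) +
  geom_line(linewidth = 0.6) +
  # Highlighted lines keep their colour; all other lines are greyed out.
  gghighlight(all(country %in% c("Germany", "Poland"))) +
  labs(x = NULL, y = "Life expectancy (years)") +
  theme_classic(base_size = 11)
```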

Event study / DiD — Centred point-range

“Did the policy have an effect, and when did it start?”

Figure 7

Note

Key geoms: geom_pointrange() + geom_vline() — pre/post symmetry + event marker
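A layout sketch only. The numbers below are placeholders invented to show the structure of the plot; they are not estimates from any study. The period before the event (relative time -1) is assumed to be the reference period and is fixed at zero.

```r
library(ggplot2)

# Placeholder values, invented purely to show the layout. Not real results.
es <- data.frame(
  rel_time = -3:3,
  estimate = c(0.05, -0.10, 0.00, 0.40, 0.75, 0.90, 1.00),
  se       = c(0.20,  0.18, 0.00, 0.19, 0.20, 0.22, 0.25)
)
es$lower <- es$estimate - 1.96 * es$se
es$upper <- es$estimate + 1.96 * es$se

p_event <- ggplot(es, aes(x = rel_time, y = estimate, ymin = lower, ymax = upper)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  # Event marker between the last pre-period and the first treated period.
  geom_vline(xintercept = -0.5, linetype = "dotted") +
  geom_pointrange() +
  scale_x_continuous(breaks = -3:3) +
  labs(x = "Periods relative to policy", y = "Estimated effect (95% CI)") +
  theme_classic(base_size = 11)
```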

Choosing the right chart

Warning

AI defaults to whatever was most common in its training data — usually a bar chart.

You must choose the chart type before you prompt.

| Question type | Chart | Key geom(s) |
|---------------|-------|-------------|
| Regression coefficients | Coefficient plot | geom_pointrange() |
| Distributions across groups | Violin + jitter | geom_violin() + geom_jitter() |
| Bivariate with heterogeneity | Faceted scatter | geom_smooth() + facet_wrap() |
| Time series, multiple units | Highlighted lines | geom_line() + gghighlight() |
| Event study / DiD | Centred point-range | geom_pointrange() + geom_vline() |

The 7-item publication checklist

  1. Axis labels with units; no raw variable names
  2. Colorblind-safe palette that fits your data (e.g., viridis or Okabe-Ito)
  3. Uncertainty on every estimate
  4. theme_classic() (or similar) — no default grey background
  5. Base size ~10–12pt for a 3.5-inch figure
  6. Source/notes in labs(caption = ...)
  7. ggsave() with explicit dimensions — never the RStudio Export button

ggsave(
  "figures/fig1.pdf",
  plot = p,
  width  = 5.5,
  height = 3.5,
  units  = "in",
  device = cairo_pdf
)

Tip

cairo_pdf embeds fonts correctly for journal submission.
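Checklist items 1, 2, 4, 5 and 6 can be sketched in one plot object. This assumes R 4.0.0 or later, where the Okabe-Ito colours ship with base R via palette.colors(), and the gapminder data for 2007. The object name p is an assumption chosen to match a typical ggsave() call.

```r
library(ggplot2)
library(gapminder)

# Okabe-Ito palette from base R; the first entry is black.
okabe_ito <- unname(palette.colors(palette = "Okabe-Ito"))

d <- subset(gapminder, year == 2007)

p <- ggplot(d, aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point(alpha = 0.7) +
  scale_x_log10(labels = scales::label_dollar()) +
  # Skip black so the five continents get five distinct hues.
  scale_colour_manual(values = okabe_ito[2:6]) +
  labs(
    x = "GDP per capita (USD, log scale)",  # units, no raw variable names
    y = "Life expectancy (years)",
    colour = NULL,
    caption = "Source: Gapminder. Year: 2007."
  ) +
  theme_classic(base_size = 11)
```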

Five extensions worth knowing

patchwork — multi-panel layout

(p1 | p2) / p3 +
  plot_annotation(
    tag_levels = "A"
  ) +
  plot_layout(
    guides = "collect"
  )

ggrepel — non-overlapping labels

geom_text_repel(aes(label = country))

gghighlight — focus attention

gghighlight(continent == "Europe")

ggtext — Markdown in titles

labs(
  title = "**Treated** firms earn more"
) +
theme(
  plot.title = element_markdown()
)

scales — axis formatting

scale_y_continuous(
  labels = label_comma()
)
scale_x_log10(
  labels = label_dollar()
)

Where AI helps — and where it does not

AI is strong on:

  • Scaffolding a first draft quickly
  • Explaining error messages
  • Adjusting individual aesthetic parameters
  • Refactoring code to use the pipe

AI struggles with:

  • Choosing the right chart for the question
  • Multi-panel composition and shared legends (patchwork)
  • Up-to-date ggplot2 API (linewidth vs size)
  • Journal-specific constraints it was not told about
  • Self-correcting when a hallucinated package fails

Note

AI can get you to about 70% of a publication-ready figure. The last 30% requires judgment you cannot delegate.

Live demo

Tip

Watch for:

  • When and how do we choose the chart type?
  • Where does AI succeed at scaffolding?
  • Where do we need to step in as the human reviewer?

Exercise

Available via GitHub Classroom

Track A — suggested starting points

  • WMS: distribution of management scores across countries
  • CPS: wage–education relationship by subgroup

Track B — bring your own question

  • State it in one sentence
  • If you cannot, use Track A

Important

Before opening any AI tool: Write one sentence justifying your chart type choice.

Quarto + AI: three patterns

Error diagnosis — paste a render error with the surrounding chunk

“What is causing this error?”

Reliable on YAML errors and chunk option problems.

Goal-finding — ask how to achieve a specific Quarto output

“How do I make a figure span the full slide width in Reveal.js?”

AI as documentation shortcut — always verify at quarto.org.

Prose refinement — describe your figure and ask whether the claim holds

“Is this claim supported by the figure? Can it be more concise?”

Most tools accept a screenshot — share one directly. If yours does not, describing the figure forces the clarity the caption needs anyway.

Closing debrief

One thing AI is genuinely useful for in your workflow

One thing you would never fully delegate to AI

One habit from today you will keep using after the course

Note

Next session: Recap. Send me your questions by the end of the week.