Using AI Tools

Advanced Visualizations as an Example

Prof. Dr. Claudius Gräbner-Radkowitsch

Europa-Universität Flensburg

2026-06-11

Today’s session

Part 1 — AI tools

What AI is (and isn’t) doing
Three modes of use
Three failure modes
The invariant habit

Live demo: seeing failure in real time

Part 2 — Publication-ready figures

Avoid ugly, bad, or wrong graphs
Choosing the right chart
Five key extensions
The 7-item checklist

Live demo + exercise

Note

The tools will change every six months. The habits we build today will not.

Part 1

What AI is (and isn’t) doing

AI predicts plausible next tokens given its training data.

It does not:

reason about your problem
look things up or verify claims
know what has changed since its training cutoff

The consequence:

Confidence is not correctness. Fluent-looking code is not correct code.

What AI is (and isn’t) doing

Hallucination is structural, not a glitch to be patched away.

Tip

A model trained on Stack Overflow posts from 2022 will confidently emit syntax that ggplot2 3.4.0 deprecated, such as size = for line widths.
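A minimal sketch of the stale-syntax problem, assuming ggplot2 3.4.0 or later is installed. The `economics` data set that ships with ggplot2 is used only for illustration; it is not part of the session material.

```r
library(ggplot2)

# Current idiom (ggplot2 >= 3.4.0): line thickness is set with `linewidth`.
p_new <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line(linewidth = 1)

# Older idiom that AI tools often still produce; deprecated for lines:
#   geom_line(size = 1)

# `size` is still the right argument for points and text.
p_pts <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_point(size = 1)
```

The old form tends to look fine on screen, so the problem is easy to miss unless you read the code and any deprecation messages.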

Note

What about AI with web search? Tools like Perplexity or Claude with research mode retrieve current text — partially lifting the training-cutoff limitation. But retrieved text is still summarised by a model that cannot verify it. The hallucination and reasoning limitations remain.

Three modes of use

Autocomplete

AI completes boilerplate you already know how to write
Risk: low
Gain: modest
Use when you would have typed it anyway

Thought partner

Ask for options, strategies, chart recommendations — you decide
Risk: medium
Gain: high
Works best when you have strong domain knowledge

Code reviewer

Paste your own code; ask what is wrong or what a referee would criticise
Risk: low
Gain: builds critical skill
Extremely useful, low-risk

Important

The choice of mode is itself the skill. The session returns to this after every demo.

Note

Modes combine. A typical workflow: use thought partner to sketch a plan, correct it yourself, then autocomplete fills in the code. Use code reviewer afterwards to catch what you missed.

Three failure modes

Package hallucination: AI invents plausible-sounding packages or functions that do not exist. A USENIX Security 2025 study across 16 models found ~20% of recommended packages were hallucinations. In R, this is silent until library() throws an error.
Stale API knowledge: Deprecated arguments (size= → linewidth= in ggplot2 3.4.0), renamed functions, moved packages. The model was trained on older code. It cannot know what changed.
Vibe coding: Accepting AI output without reading it. Re-prompting when it breaks without diagnosing why. Each unread iteration makes the next harder to understand.
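One way to catch the first failure mode early is to check a suggestion before building on it. The sketch below uses base R only. The helper names `pkg_installed()` and `fn_exported()` are made up for this example, and so are the package and function names used as stand-ins for hallucinations.

```r
# Is the suggested package installed on this machine?
pkg_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Does an installed package really export the suggested function?
fn_exported <- function(pkg, fn) {
  pkg_installed(pkg) && fn %in% getNamespaceExports(pkg)
}

pkg_installed("stats")                 # a package that ships with R
pkg_installed("ggplotExtraMagic")      # hypothetical name, stand-in for a hallucination
fn_exported("stats", "median")         # a real function
fn_exported("stats", "median_robust")  # hypothetical function name
```

A FALSE result only shows that the package is not installed locally. Whether it exists at all still has to be checked on CRAN or the package's own site before installing anything.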

The invariant habit

prompt  →  inspect  →  run  →  diagnose
   ↑                              │
   └──────────────────────────────┘

Inspect — read the code before running it. Does it do what you asked?

Diagnose — when it breaks, understand why before re-prompting.
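The diagnose step can be made concrete with a small helper that runs a piece of code and keeps the error and warning messages for reading, instead of letting them scroll past. This is a base-R sketch; the function name `diagnose()` is invented here and is not part of any package.

```r
# Evaluate `expr`, collecting warnings and the error message (if any).
diagnose <- function(expr) {
  warnings <- character()
  result <- withCallingHandlers(
    tryCatch(expr, error = function(e) e),
    warning = function(w) {
      warnings <<- c(warnings, conditionMessage(w))
      invokeRestart("muffleWarning")  # keep going after recording the warning
    }
  )
  failed <- inherits(result, "error")
  list(
    ok       = !failed,
    value    = if (failed) NULL else result,
    error    = if (failed) conditionMessage(result) else NA_character_,
    warnings = warnings
  )
}

out_warn <- diagnose(log(-1))                       # runs, but with a warning
out_fail <- diagnose(stop("object 'gdp' not found")) # fails with an error
```

Reading `out_fail$error` and `out_warn$warnings` before writing the next prompt is the point: the message usually says whether the cause is a missing object, a wrong argument, or a missing package.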

Note

Students who skip inspect and diagnose are vibe coding, even if they do not call it that.

This habit is what makes AI use durable as tools change. Everything else in this session may be obsolete in two years. This will not be.

Live demo

Tip

Watch for:

Which mode is being used at each step?
When does AI produce something plausible but wrong?
What does the diagnose step look like in practice?

Part 2

Two figures: same data

Exploratory — default output

Publication-ready

Two figures: the code

Exploratory

library(ggplot2)
library(gapminder)  # provides the gapminder data set

ggplot(gapminder,
       aes(x = gdpPercap,
           y = lifeExp)) +
  geom_point()

Four lines of plotting code. Default output.

Publication-ready

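library(dplyr)      # filter()
library(ggplot2)
library(ggrepel)    # geom_text_repel()
library(scales)     # label_dollar()
library(gapminder)  # gapminder data set

gapminder |>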
  filter(year == 2007) |>
  ggplot(aes(x = gdpPercap,
             y = lifeExp,
             color = continent)) +
  geom_point(alpha = 0.7, size = 2) +
  geom_text_repel(
    aes(label = country),
    data = \(d) filter(d, country == "Germany"),
    size = 4, nudge_x = -1.5, nudge_y = -2.0,
    min.segment.length = 0, 
    show.legend = FALSE
  ) +
  geom_text_repel(
    aes(label = country),
    data = \(d) filter(d, country == "Afghanistan"),
    size = 4, nudge_x = -0.75, nudge_y = -3,
    min.segment.length = 0, 
    show.legend = FALSE
  ) +
  geom_text_repel(
    aes(label = country),
    data = \(d) filter(d, country == "China"),
    size = 4, nudge_x = 0.5, nudge_y = -10,
    min.segment.length = 0, 
    show.legend = FALSE
  ) +
  scale_x_log10(
    labels = label_dollar(suffix = "k", scale = 0.001)) +
  scale_color_viridis_d() +
  theme_classic(base_size = 11) +
  labs(
    x = "GDP per capita (USD, log scale)",
    y = "Life expectancy (years)",
    color = NULL,
    caption = "Source: Gapminder. Year: 2007."
  )

Ugly, bad, wrong

Three different problems — three different fixes.

Label	Problem	Example	Fix
Ugly	Aesthetic	Grey background, `gdpPercap` as axis label	`theme_classic()`, `labs()`
Bad	Perceptual	Wrong chart type, overlapping labels, spaghetti	Rethink the encoding
Wrong	Mathematical	No uncertainty, truncated axis, wrong aggregation	Fix the underlying logic

Important

“It doesn’t look great” is not a diagnosis. → “The y-axis is truncated and the comparison is therefore wrong” is.
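Why a truncated axis counts as "wrong" rather than "ugly" can be shown with a little arithmetic. In a bar chart, the reader compares bar lengths, so the baseline changes the ratio they see. The two values below are hypothetical group means chosen for illustration, and `visual_ratio()` is a helper made up for this sketch.

```r
# Ratio of the drawn bar lengths when the axis starts at `baseline`.
visual_ratio <- function(a, b, baseline = 0) {
  (b - baseline) / (a - baseline)
}

life_a <- 70  # hypothetical mean life expectancy, group A (years)
life_b <- 75  # hypothetical mean life expectancy, group B (years)

ratio_full      <- visual_ratio(life_a, life_b)                 # axis starts at 0
ratio_truncated <- visual_ratio(life_a, life_b, baseline = 68)  # axis starts at 68
```

With the axis starting at zero, bar B is about 7% longer than bar A. With the axis starting at 68, bar B is drawn 3.5 times as long, although the data have not changed.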

Regression results — Coefficient plot

“What predicts life expectancy, and how uncertain are we?”

Figure 3

Note

Key geom: geom_pointrange() — estimate + uncertainty in one layer
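A minimal sketch of such a coefficient plot, assuming the ggplot2 and gapminder packages are installed. The model specification (2007 cross-section, log GDP per capita plus continent) is an assumption made for illustration and is not necessarily the one behind the figure on this slide.

```r
library(ggplot2)
library(gapminder)

# Illustrative model only: life expectancy in 2007.
d   <- subset(gapminder, year == 2007)
fit <- lm(lifeExp ~ log10(gdpPercap) + continent, data = d)

# Estimates and 95% confidence intervals in one data frame (base R).
ci <- confint(fit, level = 0.95)
coefs <- data.frame(
  term     = names(coef(fit)),
  estimate = unname(coef(fit)),
  lower    = ci[, 1],
  upper    = ci[, 2]
)
coefs <- coefs[coefs$term != "(Intercept)", ]  # the intercept is rarely of interest

p_coef <- ggplot(coefs, aes(x = estimate, y = term,
                            xmin = lower, xmax = upper)) +
  geom_vline(xintercept = 0, linetype = "dashed") +  # reference line: no effect
  geom_pointrange() +
  theme_classic(base_size = 11) +
  labs(
    x = "Estimate (95% confidence interval)",
    y = NULL,
    caption = "Source: Gapminder. Year: 2007."
  )
```

The raw term names on the y-axis (for example `continentAsia`) would still need readable labels before publication.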

Distributions across groups — Violin + jitter

“How does life expectancy distribute across continents?”

Figure 4

Note

Key geoms: geom_violin() + geom_jitter() — shape AND raw data
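A short sketch of the violin-plus-jitter combination, assuming ggplot2 and gapminder are installed. Restricting the data to 2007 and the specific styling values are choices made for this example.

```r
library(ggplot2)
library(gapminder)

d <- subset(gapminder, year == 2007)

p_violin <- ggplot(d, aes(x = continent, y = lifeExp)) +
  # The violin shows the shape of each distribution.
  geom_violin(fill = "grey90", colour = "grey40") +
  # The jittered points show the raw data. height = 0 moves points only
  # sideways, so the plotted life-expectancy values stay exact.
  geom_jitter(width = 0.1, height = 0, alpha = 0.6, size = 1) +
  theme_classic(base_size = 11) +
  labs(
    x = NULL,
    y = "Life expectancy (years)",
    caption = "Source: Gapminder. Year: 2007."
  )
```

Showing the points also exposes how few observations sit behind some violins, which the smoothed outline alone would hide.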

Bivariate with heterogeneity — Faceted scatter

“Does the income–health relationship vary by continent?”

Figure 5

Note

Key geoms: geom_smooth(method = "lm") + facet_wrap() — relationship × group
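A possible version of the faceted scatter, assuming ggplot2 and gapminder are installed. Dropping Oceania is a choice made for this sketch, since it has only two countries in the data and a fitted line with an uncertainty band is not meaningful there.

```r
library(ggplot2)
library(gapminder)

# 2007 cross-section; Oceania dropped (assumption of this sketch, see above).
d <- subset(gapminder, year == 2007 & continent != "Oceania")

p_facet <- ggplot(d, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.6) +
  # One linear fit per panel, with its confidence band.
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  scale_x_log10() +
  facet_wrap(~ continent) +
  theme_classic(base_size = 11) +
  labs(
    x = "GDP per capita (USD, log scale)",
    y = "Life expectancy (years)",
    caption = "Source: Gapminder. Year: 2007."
  )
```

Because the x-axis is on a log scale, each line is a fit of life expectancy on log income, which is worth stating in the figure notes.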

Time series, multiple units — Highlighted lines

“How has life expectancy evolved across European countries?”

Figure 6

Note

Key geoms: geom_line() + gghighlight() — focus without removing context
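A sketch of the highlighted-lines idea, assuming ggplot2, gghighlight, and gapminder are installed. The highlighting rule (countries whose life expectancy was ever below 60 years) and its threshold are arbitrary choices for illustration, not the rule used for the figure on this slide.

```r
library(ggplot2)
library(gghighlight)
library(gapminder)

europe <- subset(gapminder, continent == "Europe")

p_lines <- ggplot(europe, aes(x = year, y = lifeExp, colour = country)) +
  geom_line() +
  # The condition is evaluated once per line (here: per country).
  # Lines that fail it stay in the plot, drawn in grey as context.
  gghighlight(min(lifeExp) < 60) +
  theme_classic(base_size = 11) +
  labs(
    x = NULL,
    y = "Life expectancy (years)",
    caption = "Source: Gapminder."
  )
```

Summarising each line to a single TRUE or FALSE, as `min()` does here, keeps the rule unambiguous for grouped data.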

Event study / DiD — Centred point-range

“Did the policy have an effect, and when did it start?”

Figure 7

Note

Key geoms: geom_pointrange() + geom_vline() — pre/post symmetry + event marker
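A sketch of the plotting step for an event study, assuming ggplot2 is installed. The estimates and standard errors below are invented purely to make the example runnable; in practice they would come from your own model, with the period before the event normalised to zero.

```r
library(ggplot2)

# Hypothetical event-study estimates, made up for illustration only.
es <- data.frame(
  rel_time = -4:4,  # periods relative to the policy start
  estimate = c(0.05, -0.02, 0.03, 0, 0.10, 0.35, 0.50, 0.55, 0.60),
  se       = c(0.10,  0.09, 0.08, 0, 0.09, 0.10, 0.11, 0.12, 0.13)
)

# 95% confidence intervals from the standard errors.
es$lower <- es$estimate - qnorm(0.975) * es$se
es$upper <- es$estimate + qnorm(0.975) * es$se

p_event <- ggplot(es, aes(x = rel_time, y = estimate,
                          ymin = lower, ymax = upper)) +
  geom_hline(yintercept = 0, linetype = "dashed") +     # no-effect line
  geom_vline(xintercept = -0.5, linetype = "dotted") +  # event marker
  geom_pointrange() +
  scale_x_continuous(breaks = es$rel_time) +
  theme_classic(base_size = 11) +
  labs(
    x = "Periods relative to policy start",
    y = "Estimated effect (95% CI)"
  )
```

Placing the marker at -0.5 puts it between the last pre-period and the first treated period, so the reader can judge pre-trends and post-effects on either side of it.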

Choosing the right chart

Warning

AI defaults to whatever was most common in its training data — usually a bar chart.

You must choose the chart type before you prompt.

Question type	Chart	Key geom(s)
Regression coefficients	Coefficient plot	`geom_pointrange()`
Distributions across groups	Violin + jitter	`geom_violin()` + `geom_jitter()`
Bivariate with heterogeneity	Faceted scatter	`geom_smooth()` + `facet_wrap()`
Time series, multiple units	Highlighted lines	`geom_line()` + `gghighlight()`
Event study / DiD	Centred point-range	`geom_pointrange()` + `geom_vline()`

The 7-item publication checklist

Axis labels with units; no raw variable names
Colorblind-safe palette that fits your data (e.g., viridis or Okabe-Ito)
Uncertainty on every estimate
theme_classic() (or similar) — no default grey background
Base size ~10–12pt for a 3.5-inch figure
Source/notes in labs(caption = ...)
ggsave() with explicit dimensions — never the RStudio Export button

ggsave(
  "figures/fig1.pdf",
  plot = p,
  width  = 5.5,
  height = 3.5,
  units  = "in",
  device = cairo_pdf
)

Tip

cairo_pdf embeds fonts correctly for journal submission.

Five extensions worth knowing

patchwork — multi-panel layout

(p1 | p2) / p3 +
  plot_annotation(
    tag_levels = "A"
  ) +
  plot_layout(
    guides = "collect"
  )

ggrepel — non-overlapping labels

geom_text_repel(aes(label = country))

gghighlight — focus attention

gghighlight(continent == "Europe")

ggtext — Markdown in titles

labs(
  title = "**Treated** firms earn more"
) +
theme(
  plot.title = element_markdown()
)

scales — axis formatting

scale_y_continuous(
  labels = label_comma()
)
scale_x_log10(
  labels = label_dollar()
)

Where AI helps — and where it does not

AI is strong on:

Scaffolding a first draft quickly
Explaining error messages
Adjusting individual aesthetic parameters
Refactoring code to use the pipe

AI struggles with:

Choosing the right chart for the question
Multi-panel composition and shared legends (patchwork)
Up-to-date ggplot2 API (linewidth vs size)
Journal-specific constraints it was not told about
Self-correcting when a hallucinated package fails

Note

AI can get you to about 70% of a publication-ready figure. The last 30% requires judgment you cannot delegate.

Live demo

Tip

Watch for:

When and how do we choose the chart type?
Where does AI succeed at scaffolding?
Where do we need to step in as the human reviewer?

Exercise

Available via GitHub Classroom

Track A — suggested starting points

WMS: distribution of management scores across countries
CPS: wage–education relationship by subgroup

Track B — bring your own question

State it in one sentence
If you cannot, use Track A

Important

Before opening any AI tool: Write one sentence justifying your chart type choice.

Quarto + AI: three patterns

Error diagnosis — paste a render error with the surrounding chunk

“What is causing this error?”

Reliable on YAML errors and chunk option problems.

Goal-finding — ask how to achieve a specific Quarto output

“How do I make a figure span the full slide width in Reveal.js?”

AI as documentation shortcut — always verify at quarto.org.

Prose refinement — describe your figure and ask whether the claim holds

“Is this claim supported by the figure? Can it be more concise?”

Most tools accept a screenshot — share one directly. If yours does not, describing the figure forces the clarity the caption needs anyway.

Closing debrief

One thing AI is genuinely useful for in your workflow

One thing you would never fully delegate to AI

One habit from today you will keep using after the course

Note

Next session: Recap. Send me your questions by the end of the week.