llmjson repairs malformed JSON strings, particularly those generated by Large Language Models (LLMs). It uses Rust for fast, reliable JSON repair based on a vendored and bug-fixed version of the llm_json crate.
- Repairs missing quotes around keys and values
- Handles trailing commas
- Fixes unquoted keys
- Repairs incomplete arrays and objects
- Converts single quotes to double quotes
- Removes extra non-JSON characters
- Auto-completes missing values with sensible defaults
- Returns R objects directly with return_objects = TRUE
- Schema validation and type conversion with intuitive schema builders
- Control field presence with .required and use .default for missing required fields
You can install the development version of llmjson from GitHub:
# install.packages("remotes")
remotes::install_github("DyfanJones/llmjson")
Or r-universe:
install.packages('llmjson', repos = c('https://dyfanjones.r-universe.dev', 'https://cloud.r-project.org'))
This package requires the Rust toolchain to be installed on your system. If you don't have Rust installed:
- Install from https://rust-lang.org/tools/install/
- Minimum required version: Rust 1.65.0
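Before installing, it can help to confirm that the toolchain on your PATH meets the minimum version. The sketch below is one way to do that from R. rust_version_ok() is a hypothetical helper written for this example, not part of llmjson, and it assumes that rustc --version prints a line containing an X.Y.Z version number.

```r
# Hypothetical helper (not part of llmjson): checks whether a
# `rustc --version` line reports at least the minimum version.
rust_version_ok <- function(version_line, minimum = "1.65.0") {
  # Pull the first X.Y.Z triple out of the line
  found <- regmatches(version_line, regexpr("[0-9]+\\.[0-9]+\\.[0-9]+", version_line))
  if (length(found) == 0) return(FALSE)
  numeric_version(found) >= numeric_version(minimum)
}

# Look for rustc on the PATH and report what was found
rustc <- Sys.which("rustc")
if (nzchar(rustc)) {
  version_line <- system2(rustc, "--version", stdout = TRUE)[1]
  verdict <- if (rust_version_ok(version_line)) " is new enough" else " is too old"
  message(version_line, verdict)
} else {
  message("rustc not found on PATH; install Rust first")
}
```

Comparing with numeric_version() rather than as strings avoids ordering mistakes such as "1.9.0" sorting after "1.65.0".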
library(llmjson)
# Repair JSON with trailing comma
repair_json_str('{"key": "value",}')
#> [1] "{\"key\":\"value\"}"
# Repair JSON with unquoted keys
repair_json_str('{key: "value"}')
#> [1] "{\"key\":\"value\"}"
# Repair incomplete JSON
repair_json_str('{"name": "John", "age": 30')
#> [1] "{\"name\":\"John\",\"age\":30}"
# Repair JSON with single quotes
repair_json_str("{'name': 'John'}")
#> [1] "{\"name\":\"John\"}"
Instead of returning a JSON string, you can get R objects directly:
# Return as R list instead of JSON string
result <- repair_json_str('{"name": "Alice", "age": 30}', return_objects = TRUE)
result
#> $name
#> [1] "Alice"
#>
#> $age
#> [1] 30
# Works with all repair functions
result <- repair_json_file("data.json", return_objects = TRUE)
JSON numbers that exceed R's 32-bit integer range (beyond -2,147,483,648 to 2,147,483,647) need special handling. The int64 parameter controls how these large integers are converted:
json_str <- '{"id": 9007199254740993}'
# Option 1: "double" (default) - Convert to R numeric (may lose precision)
result <- repair_json_str(json_str, return_objects = TRUE, int64 = "double")
result$id
#> [1] 9.007199e+15 # Lost precision: the stored value is 9007199254740992, not 9007199254740993
# Option 2: "string" - Preserve exact value as character
result <- repair_json_str(json_str, return_objects = TRUE, int64 = "string")
result$id
#> [1] "9007199254740993" # Exact value preserved
# Option 3: "bit64" - Use bit64 package for true 64-bit integers
# Requires: install.packages("bit64")
result <- repair_json_str(json_str, return_objects = TRUE, int64 = "bit64")
result$id
#> integer64
#> [1] 9007199254740993 # Exact value preserved with integer type
Which option should I use?
- Use "double" (default) if your integers fit safely in double precision and you don't need exact integer arithmetic
- Use "string" if you need to preserve exact values and plan to pass them to other systems
- Use "bit64" if you need exact integer arithmetic on large integers in R
Define schemas to validate JSON structure and ensure correct R types. The schema system is inspired by the structr package and provides an intuitive way to define expected JSON structures:
# Define a schema for a user object
schema <- json_object(
  name = json_string(),
  age = json_integer(),
  email = json_string()
)
# Repair and validate with schema
result <- repair_json_str(
  '{"name": "Alice", "age": "30", "email": "alice@example.com"}',
  schema = schema,
  return_objects = TRUE
)
# Note: age is coerced from string "30" to integer 30
str(result)
#> List of 3
#>  $ name : chr "Alice"
#>  $ age  : int 30
#>  $ email: chr "alice@example.com"
Control how missing fields are handled with .required and .default parameters:
Required fields (.required = TRUE):
- Missing fields are added with their .default value (or their type's default if no explicit default)
- Always appear in the output
Optional fields (.required = FALSE, the default):
- Missing fields are omitted entirely from the output
- Only appear if present in the input JSON
# Example 1: Required field with explicit default
schema <- json_object(
  name = json_string(.required = TRUE),
  age = json_integer(.default = 25L, .required = TRUE) # required, will use default if missing
)
result <- repair_json_str('{"name": "Alice"}', schema = schema, return_objects = TRUE)
result
#> $name
#> [1] "Alice"
#>
#> $age
#> [1] 25
# Example 2: Optional field (omitted when missing)
schema <- json_object(
  name = json_string(.required = TRUE),
  nickname = json_string(.required = FALSE) # optional, omitted if not in input
)
result <- repair_json_str('{"name": "Bob"}', schema = schema, return_objects = TRUE)
result
#> $name
#> [1] "Bob"
# Note: nickname is not present since it was optional and missing from input
# Example 3: Required field with type default
schema <- json_object(
  name = json_string(.required = TRUE),
  age = json_integer(.required = TRUE) # required, will use type default (0L) if missing
)
result <- repair_json_str('{"name": "Charlie"}', schema = schema, return_objects = TRUE)
result
#> $name
#> [1] "Charlie"
#>
#> $age
#> [1] 0
Build complex schemas with nested objects and arrays:
# Schema with nested object and array
schema <- json_object(
  name = json_string(),
  address = json_object(
    city = json_string(),
    zip = json_integer()
  ),
  scores = json_array(json_integer())
)
json_str <- '{
  "name": "Alice",
  "address": {"city": "NYC", "zip": "10001"},
  "scores": [90, 85, 95]
}'
result <- repair_json_str(json_str, schema = schema, return_objects = TRUE)
str(result)
#> List of 3
#>  $ name   : chr "Alice"
#>  $ address:List of 2
#>   ..$ city: chr "NYC"
#>   ..$ zip : int 10001
#>  $ scores : int [1:3] 90 85 95
For repeated use with the same schema, use json_schema() to compile the schema once and reuse it many times.
# Define your schema
schema <- json_object(
  name = json_string(),
  age = json_integer(),
  email = json_string()
)
# Build it once - this creates an optimized internal representation
built_schema <- json_schema(schema)
# Reuse many times - much faster!
for (json_str in json_strings) {
  result <- repair_json_str(json_str, built_schema, return_objects = TRUE)
  # Process result...
}
Performance comparison (complex nested schema):
- Without json_schema(): ~266µs per call
- With json_schema(): ~51µs per call (5.2x faster)
- No schema: ~44µs per call
The performance benefit is especially significant for:
- Complex nested schemas with multiple levels
- Batch processing of many JSON strings
- Performance-critical applications
- Real-time data processing pipelines
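The figures quoted in the performance comparison can be turned into a rough estimate for a batch job. The arithmetic below assumes the per-call timings hold constant across calls; the batch size of 100,000 strings is a made-up number for illustration only.

```r
# Per-call timings from the comparison, in microseconds
t_raw_schema   <- 266  # schema definition passed directly
t_built_schema <- 51   # schema compiled once with json_schema()
t_no_schema    <- 44   # no schema at all

# Speedup from compiling the schema
speedup <- t_raw_schema / t_built_schema
round(speedup, 1)
#> [1] 5.2

# Remaining per-call cost of validating against a compiled schema
overhead <- t_built_schema - t_no_schema
overhead
#> [1] 7

# Projected saving over a hypothetical batch of 100,000 strings, in seconds
n <- 1e5
saving_seconds <- (t_raw_schema - t_built_schema) * n / 1e6
saving_seconds
#> [1] 21.5
```

On these numbers a compiled schema costs about 7µs per call more than no schema at all, so the type guarantees are nearly free once the schema is built.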
# Read and repair JSON from a file
repair_json_file("malformed.json")
# With schema validation
schema <- json_object(
  name = json_string(.required = TRUE),
  age = json_integer(.default = 25L, .required = TRUE) # required field with default
)
result <- repair_json_file("data.json", schema = schema, return_objects = TRUE)
# Repair JSON from raw byte vector
raw_data <- charToRaw('{"key": "value",}')
repair_json_raw(raw_data)
#> [1] "{\"key\":\"value\"}"
# With return_objects
result <- repair_json_raw(raw_data, return_objects = TRUE)
Read and repair JSON from any R connection (files, URLs, pipes, compressed files, etc.):
# Read from a file connection
conn <- file("malformed.json", "r")
result <- repair_json_conn(conn)
close(conn)
# Read from a URL
conn <- url("https://api.example.com/data.json")
result <- repair_json_conn(conn, return_objects = TRUE)
close(conn)
# Read from a compressed file
conn <- gzfile("data.json.gz", "r")
result <- repair_json_conn(conn, return_objects = TRUE, int64 = "string")
close(conn)
# Use local() with on.exit() to ensure the connection is closed automatically
result <- local({
  conn <- file("malformed.json", "r")
  on.exit(close(conn))
  repair_json_conn(conn, return_objects = TRUE)
})
Large Language Models often generate JSON that is almost correct but has minor syntax errors. This package helps you handle those cases gracefully:
# LLM might output JSON with trailing commas and unquoted keys
llm_output <- '{
  users: [
    {name: "Alice", age: 30,},
    {name: "Bob", age: 25,},
  ],
}'
# Option 1: Repair and parse with your chosen JSON parser (e.g., jsonlite)
repaired <- repair_json_str(llm_output)
(parsed <- jsonlite::fromJSON(repaired))
#> $users
#>   age  name
#> 1  30 Alice
#> 2  25   Bob
# Option 2: Use schema with return_objects for type safety
schema <- json_object(
  users = json_array(json_object(
    name = json_string(),
    age = json_integer()
  ))
)
result <- repair_json_str(llm_output, schema = schema, return_objects = TRUE)
str(result)
#> List of 1
#>  $ users:List of 2
#>   ..$ :List of 2
#>   .. ..$ name: chr "Alice"
#>   .. ..$ age : int 30
#>   ..$ :List of 2
#>   .. ..$ name: chr "Bob"
#>   .. ..$ age : int 25
All repair functions support the schema, return_objects, ensure_ascii, and int64 parameters:
- repair_json_str(json_str, schema = NULL, return_objects = FALSE, ensure_ascii = TRUE, int64 = "double") - Repair a malformed JSON string
- repair_json_file(path, schema = NULL, return_objects = FALSE, ensure_ascii = TRUE, int64 = "double") - Read and repair JSON from a file
- repair_json_raw(raw_bytes, schema = NULL, return_objects = FALSE, ensure_ascii = TRUE, int64 = "double") - Repair JSON from a raw byte vector
- repair_json_conn(conn, schema = NULL, return_objects = FALSE, ensure_ascii = TRUE, int64 = "double") - Read and repair JSON from an R connection (file, URL, pipe, etc.)
Parameters:
- schema - Optional schema definition (R list from json_object(), etc.) or built schema (from json_schema())
- return_objects - If TRUE, returns R objects instead of JSON strings
- ensure_ascii - If TRUE (default), escape non-ASCII characters in the output JSON
- int64 - Policy for handling 64-bit integers: "double" (default), "string", or "bit64"
Schema builders:
- json_schema(schema) - Compile a schema definition for efficient reuse (5x performance improvement)
- json_object(..., .required) - Define a JSON object with named fields
- json_integer(.default, .required) - Integer field (default: 0L)
- json_number(.default, .required) - Number/numeric field (default: 0.0)
- json_string(.default, .required) - String field (default: "")
- json_boolean(.default, .required) - Boolean field (default: FALSE)
- json_enum(.values, .default, .required) - Enum field with allowed values (default: first value)
- json_date(.default, .format, .required) - Date field with format specification
- json_timestamp(.default, .format, .tz, .required) - POSIXct datetime field
- json_array(items, .required) - Array with specified item type
- json_any(.required) - Accept any JSON type
While R has several JSON parsing packages like jsonlite, they typically fail when encountering malformed JSON. llmjson is specifically designed to handle the common errors that LLMs make when generating JSON output, making it ideal for:
- Processing LLM API responses
- Parsing structured data from AI-generated text
- Building robust data pipelines with LLM integrations
- Working with JSON data from web scraping or unreliable sources
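The difference from a strict parser can be sketched as a try-strict-then-repair wrapper. parse_with_fallback() below is a hypothetical helper written for this example, not part of llmjson. The usage portion assumes jsonlite is installed and that repair_json_str() returns a repaired JSON string as described in this README.

```r
# Hypothetical helper (not part of llmjson): try a strict parser first,
# and only repair the text if strict parsing throws an error.
parse_with_fallback <- function(text, strict, repair) {
  tryCatch(
    strict(text),
    error = function(e) strict(repair(text))
  )
}

# Usage with jsonlite and llmjson, skipped if either package is missing
if (requireNamespace("jsonlite", quietly = TRUE) &&
    requireNamespace("llmjson", quietly = TRUE)) {
  parsed <- parse_with_fallback(
    '{"key": "value",}',               # trailing comma: rejected by jsonlite
    strict = jsonlite::fromJSON,
    repair = llmjson::repair_json_str
  )
  str(parsed)
}
```

Well-formed input passes through the strict parser untouched, so the repair step only runs when it is needed. When you do not care whether a repair happened, calling repair_json_str() with return_objects = TRUE directly is the simpler route.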
This package includes a vendored and bug-fixed version of the llm_json Rust crate (v1.0.1) by Ribelo, which is itself a Rust port of the Python json_repair library by Stefano Baccianella (mangiucugna). Our vendored version includes critical bug fixes for array parsing not present in the upstream release.
The schema system was inspired by the structr package, which provides elegant patterns for defining and validating data structures in R.
Please note that the llmjson project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.