Parquet Under Fire: A Technical Analysis of CVE-2025-30065

Last week, a vulnerability in Apache Parquet’s Java library, CVE-2025-30065, was published with a CVSS score of 10.0. Parquet is widely used in modern data pipelines and analytics systems, including Apache Spark, Trino, and Iceberg. A malicious actor who can deliver a crafted Parquet file to such a system could therefore potentially trigger remote code execution (RCE) in the service that processes it.

In short, CVE-2025-30065 can allow an attacker to execute arbitrary code on the JVM. By embedding a malicious schema in a Parquet file that references a Java class whose constructor accepts a String argument, the attacker can cause the JVM to instantiate that class reflectively when the file is read. This is not a typical Java deserialization vulnerability (which targets native Java object streams), but rather an exploitation of Parquet’s Avro schema handling.

I have shared a PoC at https://github.com/mouadk/parquet-rce-poc-CVE-2025-30065/tree/main.

The Vulnerability

With Parquet, one can embed an Avro schema that describes the data stored in the file. Avro is a schema-based data serialization library, and it includes a feature that any attacker would find very attractive: the "java-class" property. Here is what the JSON schema looks like:

{
  "type": "record",
  "name": "MaliciousRecord",
  "fields": [
    {
      "name": "evil",
      "type": {
        "type": "string",
        "java-class": "com.evil.RCEPayload"
      }
    }
  ]
}

The property tells the reader which Java class to instantiate, via reflection, for the annotated string value. Unfortunately, in vulnerable versions, this behavior was not restricted by default.

As a result, an attacker could specify any Java class present on the application's classpath, and the JVM would attempt to load and instantiate it during deserialization. If the class isn't available, loading fails; if it is, and its String constructor has exploitable side effects, instantiating it could lead to code execution.

The core of the vulnerability lies in Parquet’s Avro converter, specifically in AvroConverters.FieldStringableConverter, which performs the reflective instantiation. The critical variable here is stringableClass, which is derived from the java-class annotation in the Avro schema. This is where the attacker injects the class name to be loaded.

public FieldStringableConverter(ParentValueContainer parent,
                                Class<?> stringableClass) {
  super(parent);
  stringableName = stringableClass.getName();
  try {
    // Resolve the public (String) constructor of the class named in the schema
    this.ctor = stringableClass.getConstructor(String.class);
  } catch (NoSuchMethodException e) {
    throw new ParquetDecodingException(
        "Unable to get String constructor for " + stringableName, e);
  }
}
...

private static final String JAVA_CLASS_PROP = "java-class";
private static final String JAVA_KEY_CLASS_PROP = "java-key-class";
...
// The class name is read straight from the schema property
String stringableClass = schema.getProp(
    isMap ? JAVA_KEY_CLASS_PROP : JAVA_CLASS_PROP);
...
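
To make the mechanism concrete, here is a minimal, self-contained sketch of the general pattern: a class name taken from untrusted metadata is resolved and its (String) constructor is invoked. This is not Parquet's code. SideEffectGadget and StringableDemo are hypothetical names I made up for illustration, and the gadget only records its argument instead of doing anything harmful.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for a class that already sits on the victim's classpath.
// Its public (String) constructor has a side effect we can observe.
class SideEffectGadget {
    static String lastArgument = null;

    public SideEffectGadget(String arg) {
        // A real gadget might open a connection or touch the file system here.
        lastArgument = arg;
    }
}

class StringableDemo {
    // Resolves a class by name and calls its public (String) constructor.
    // Both the class name and the value are assumed to be attacker-controlled.
    static Object instantiate(String className, String value) {
        try {
            Class<?> clazz = Class.forName(className);
            Constructor<?> ctor = clazz.getConstructor(String.class);
            return ctor.newInstance(value);
        } catch (ReflectiveOperationException e) {
            // Covers a missing class, a missing constructor, and constructor failures.
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // Class present on the classpath: its constructor runs.
        instantiate(SideEffectGadget.class.getName(), "attacker-controlled");
        System.out.println("constructor saw: " + SideEffectGadget.lastArgument);

        // Class absent from the classpath: loading fails, nothing is executed.
        try {
            instantiate("com.evil.RCEPayload", "x");
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getCause());
        }
    }
}
```

The point of the sketch is that the reader never needs to "deserialize" an object graph. Naming a class and supplying a string is enough to run that class's constructor, so the impact depends entirely on which classes with a public (String) constructor are available on the classpath.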

The Fix

The vulnerability was fixed in version 1.15.1. The commit is shown below:

https://github.com/wgtmac/parquet-mr/commit/d185f867c1eb968ac6de5024c70de2aa3b923ec2

The fix adds a security check, checkSecurity(Class<?> clazz), which ensures that only classes from trustedPackages can be instantiated.
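
The idea behind such a check can be sketched as a simple package allowlist. The code below is my own illustration and not the code from the commit: the class name TrustedPackageCheck, the constructor, and the prefix-matching rule are assumptions, and only the checkSecurity(Class<?> clazz) signature and the notion of trusted packages come from the fix described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical allowlist check, loosely modeled on the idea of the fix.
class TrustedPackageCheck {
    private final List<String> trustedPackages;

    TrustedPackageCheck(List<String> trustedPackages) {
        this.trustedPackages = new ArrayList<>(trustedPackages);
    }

    // Throws if the class does not belong to a trusted package (or one of its subpackages).
    void checkSecurity(Class<?> clazz) {
        String className = clazz.getName();
        for (String pkg : trustedPackages) {
            // The trailing dot prevents "java.math" from matching "java.mathx.Foo".
            if (className.startsWith(pkg + ".")) {
                return;
            }
        }
        throw new SecurityException("Forbidden " + className
                + "! Its package is not in the trusted list.");
    }

    public static void main(String[] args) {
        TrustedPackageCheck check = new TrustedPackageCheck(List.of("java.math"));
        check.checkSecurity(java.math.BigDecimal.class); // allowed, returns normally
        try {
            check.checkSecurity(java.io.File.class);     // not in the allowlist
        } catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this in front of the reflective call, a schema that names an arbitrary class is rejected before its constructor is ever invoked, provided the allowlist itself is kept small.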

Conclusion

A library or framework used to read Parquet files should never be able to start processes or make sensitive system calls via JDK libraries. With Application Detection and Response (ADR), detecting and mitigating this kind of vulnerability is straightforward.