Apache Tika Header Command Injection CVE-2018-1335

Disclosure Date: April 25, 2018. Last updated: February 13, 2020.

Before Tika 1.18, clients could send carefully crafted headers to tika-server that could be used to inject commands into the command line of the server running tika-server. This vulnerability only affects those running tika-server on a server that is open to untrusted clients.

Technical Analysis

David Yesland's write-up showed how to get command execution on Windows; however, a similar request structure did not work on Linux. The application's execution on Windows and Linux was compared to identify why command injection failed on the Linux system.

A breakpoint was set on the doOCR function mentioned in David Yesland's analysis, but that breakpoint was not hit while Apache Tika was running on Linux. After observing the call stack at doOCR on Windows, additional breakpoints were set in the IntelliJ debugger on Linux to identify where execution on the two platforms diverged.

While determining which parsers can handle a client request, the Apache Tika application calls the getSupportedTypes method from the various parsers. The following getSupportedTypes method is from the TesseractOCRParser class.

    public Set<MediaType> getSupportedTypes(ParseContext context) {
        TesseractOCRConfig config = (TesseractOCRConfig)context.get(TesseractOCRConfig.class, DEFAULT_CONFIG);
        return this.hasTesseract(config) ? SUPPORTED_TYPES : Collections.emptySet();
    }

The config variable is set with data that includes information from the client request. Then the hasTesseract method is called to identify whether a tesseract executable is available.

    public boolean hasTesseract(TesseractOCRConfig config) {
        String tesseract = config.getTesseractPath() + getTesseractProg();
        if (TESSERACT_PRESENT.containsKey(tesseract)) {
            return (Boolean)TESSERACT_PRESENT.get(tesseract);
        } else {
            String[] checkCmd = new String[]{tesseract};
            boolean hasTesseract = ExternalParser.check(checkCmd, new int[0]);
            TESSERACT_PRESENT.put(tesseract, hasTesseract);
            return hasTesseract;
        }
    }

The tesseract variable is set by concatenating config.getTesseractPath(), which returns the string supplied in the X-Tika-OCRTesseractPath request header, and getTesseractProg(), which returns the string tesseract on Linux hosts. The application then looks up the resulting string in TESSERACT_PRESENT; if it has been checked before, the cached true or false result is returned. Otherwise ExternalParser.check is called.
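As a rough illustration of the lookup described above, the following sketch rebuilds the command string and the result cache in plain Java. The class name, the CACHE map, and the probe callback are stand-ins invented for this example and are not Tika's code; only the concatenation order and the "check once, then reuse" behaviour are taken from hasTesseract.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

class TesseractPathSketch {
    // Hypothetical stand-in for Tika's TESSERACT_PRESENT map.
    private static final Map<String, Boolean> CACHE = new HashMap<>();

    // Mirrors config.getTesseractPath() + getTesseractProg() on Linux:
    // the header value is prepended to the literal "tesseract".
    static String buildCommand(String headerPath) {
        return headerPath + "tesseract";
    }

    // Runs the probe only for strings that have not been seen before.
    static boolean isPresent(String command, Predicate<String> probe) {
        if (CACHE.containsKey(command)) {
            return CACHE.get(command);
        }
        boolean present = probe.test(command);
        CACHE.put(command, present);
        return present;
    }

    public static void main(String[] args) {
        String cmd = buildCommand("blahhhh");
        System.out.println(cmd);
        // The second call reuses the first result; its probe is never run.
        System.out.println(isPresent(cmd, c -> false));
        System.out.println(isPresent(cmd, c -> true));
    }
}
```

Because the result is cached per string, each distinct header value leads to at most one process launch attempt.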

    public static boolean check(String[] checkCmd, int... errorValue) {
        if (errorValue.length == 0) {
            errorValue = new int[]{127};
        }

        try {
            Process process = Runtime.getRuntime().exec(checkCmd);
            Thread stdErrSuckerThread = ignoreStream(process.getErrorStream(), false);
            Thread stdOutSuckerThread = ignoreStream(process.getInputStream(), false);
            int result = process.waitFor();
            int[] var6 = errorValue;
            int var7 = errorValue.length;

            for(int var8 = 0; var8 < var7; ++var8) {
                int err = var6[var8];
                if (result == err) {
                    return false;
                }
            }

            return true;
        } catch (IOException var10) {
            return false;
        } catch (InterruptedException var11) {
            return false;
        } catch (SecurityException var12) {
            return false;
        } catch (Error var13) {
            if (var13.getMessage() == null || !var13.getMessage().contains("posix_spawn") && !var13.getMessage().contains("UNIXProcess")) {
                throw var13;
            } else {
                return false;
            }
        }
    }

Runtime.getRuntime().exec is called with checkCmd, a one-element array holding the concatenated string built in the hasTesseract method. If the exec call succeeds and the exit code does not match any value in errorValue (127 by default), true is returned. During testing of Apache Tika on a Linux host, the Runtime.getRuntime().exec call threw an exception. Trying different escapings of the user-controlled request header value did not change this. strace was then used to determine the operating system call that Runtime exec uses to execute checkCmd.
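The exit-code comparison at the end of check can be isolated as below. This is a simplified sketch: the class and method names are made up for illustration, and no process is started. The default of 127 is taken from the decompiled code; it is also the status a POSIX shell conventionally reports for "command not found", which is presumably why it was chosen.

```java
class ExitCodeCheckSketch {
    // Hypothetical helper: returns true when the exit code matches
    // none of the configured error values (default: 127).
    static boolean passes(int result, int... errorValue) {
        if (errorValue.length == 0) {
            errorValue = new int[]{127};
        }
        for (int err : errorValue) {
            if (result == err) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(passes(0));
        System.out.println(passes(127));
        System.out.println(passes(1));
    }
}
```

Note that under this logic any exit code other than the listed error values counts as "present", including non-zero ones.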

strace -f -p <java-pid>
[pid  4940] close(35)                   = 0
[pid  4940] getdents(4, /* 0 entries */, 32768) = 0
[pid  4940] close(4)                    = 0
[pid  4940] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid  4940] execve("/usr/local/sbin/blahhhhtesseract", ["blahhhhtesseract"], 0x7ffd1272ed40 /* 46 vars */) = -1 ENOENT (No such file or directory)
[pid  4940] execve("/usr/local/bin/blahhhhtesseract", ["blahhhhtesseract"], 0x7ffd1272ed40 /* 46 vars */) = -1 ENOENT (No such file or directory)
[pid  4940] execve("/usr/sbin/blahhhhtesseract", ["blahhhhtesseract"], 0x7ffd1272ed40 /* 46 vars */) = -1 ENOENT (No such file or directory)

Partial client request used to generate the strace output (request body is excluded):

PUT /meta HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
X-Tika-OCRTesseractPath: blahhhh
X-Tika-OCRLanguage: //E:Jscript
Expect: 100-continue
Content-type: image/jp2
Connection: close
Content-Type: application/x-www-form-urlencoded
Content-Length: 8086

From the strace output it is clear that the concatenated string ends up in the filename (first) parameter of the execve calls. Because execve runs the named file directly and does not pass the string through a shell interpreter, shell metacharacters in the header are never interpreted and the various injection attempts fail: no file with that name exists, so Runtime.getRuntime().exec throws an exception and check returns false. The false return value indicates that the TesseractOCRParser class is unable to handle the client request. Therefore the doOCR method, which is what executes commands when exploiting Apache Tika on Windows, is never reached on the Linux host. If an attacker is able to upload an executable whose name ends with the string tesseract, then the Runtime.getRuntime().exec check could return true and allow further processing of the request.
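The behaviour described above can be reproduced outside Tika with a small Java program. This is a minimal sketch rather than Tika's code: the class, the probe method, and the marker-file approach are assumptions made for the demonstration. It passes a header-style value containing shell syntax to Runtime.exec in array form and then checks whether the injected touch command ever ran.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ExecNoShellSketch {
    // Simplified imitation of ExternalParser.check: launch the single-element
    // command and treat a failed launch (IOException) as "not present".
    static boolean probe(String command) {
        try {
            Process p = Runtime.getRuntime().exec(new String[]{command});
            return p.waitFor() != 127;
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Returns true only if the injected "touch" command actually ran.
    static boolean injectionRan() {
        try {
            Path dir = Files.createTempDirectory("tika-sketch");
            Path marker = dir.resolve("marker");
            // Hypothetical header value containing shell metacharacters.
            String header = "blahhhh; touch " + marker + "; ";
            probe(header + "tesseract");
            boolean ran = Files.exists(marker);
            Files.deleteIfExists(marker);
            Files.deleteIfExists(dir);
            return ran;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("probe result: " + probe("blahhhhtesseract"));
        System.out.println("injected command ran: " + injectionRan());
    }
}
```

The whole string, semicolons included, is treated as the name of the file to execute, so the launch fails before any part of it could be interpreted as a command.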

Technical Analysis

Easy to exploit on Windows, where the JVM uses CreateProcess under the hood. Probably not exploitable on Linux, where the JVM uses execve and the injected string is treated as a filename.
