Apache Tika Header Command Injection CVE-2018-1335
Description
From Apache Tika versions 1.7 to 1.17, clients could send carefully crafted headers to tika-server that could be used to inject commands into the command line of the server running tika-server. This vulnerability only affects those running tika-server on a server that is open to untrusted clients. The mitigation is to upgrade to Tika 1.18.
Ratings
- Attacker Value: High
- Exploitability: High
Technical Analysis
David Yesland's write-up showed how to get command execution on Windows; however, a similar request structure did not work on Linux. The application's execution was compared between Windows and Linux to identify why command injection failed on the Linux system.
A breakpoint was set on the doOCR function mentioned in David Yesland's analysis, but it was never hit while Apache Tika was running on Linux. After observing the call stack at doOCR on Windows, additional breakpoints were set in the IntelliJ debugger on Linux to identify where execution diverged between the two platforms.
While determining which parsers can handle a client request, the Apache Tika application calls the getSupportedTypes method of the various parsers. The following getSupportedTypes method is from the TesseractOCRParser class.
public Set<MediaType> getSupportedTypes(ParseContext context) {
    TesseractOCRConfig config = (TesseractOCRConfig) context.get(TesseractOCRConfig.class, DEFAULT_CONFIG);
    return this.hasTesseract(config) ? SUPPORTED_TYPES : Collections.emptySet();
}
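getSupportedTypes returns either the full SUPPORTED_TYPES set or an empty set, so the result of hasTesseract decides whether the OCR parser is even a candidate. The following is a simplified, hypothetical model of type-based parser selection (SimpleParser, OcrParser and select are illustrative names, not Tika's API), assuming the first parser that advertises the requested media type wins.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified model of type-based parser selection; not Tika's actual API.
public class ParserSelectionSketch {

    interface SimpleParser {
        String name();
        Set<String> supportedTypes();
    }

    // Stand-in for TesseractOCRParser: advertises image types only when the probe succeeded.
    static final class OcrParser implements SimpleParser {
        private final boolean hasTesseract;

        OcrParser(boolean hasTesseract) {
            this.hasTesseract = hasTesseract;
        }

        public String name() {
            return "TesseractOCRParser";
        }

        public Set<String> supportedTypes() {
            // Same shape as getSupportedTypes: the full set or an empty set.
            return hasTesseract ? Set.of("image/jp2", "image/png") : Collections.emptySet();
        }
    }

    // Returns the name of the first parser whose supported types include mediaType.
    static Optional<String> select(List<SimpleParser> parsers, String mediaType) {
        for (SimpleParser p : parsers) {
            if (p.supportedTypes().contains(mediaType)) {
                return Optional.of(p.name());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(select(List.of(new OcrParser(true)), "image/jp2"));  // Optional[TesseractOCRParser]
        System.out.println(select(List.of(new OcrParser(false)), "image/jp2")); // Optional.empty
    }
}
```

In this model, an image/jp2 upload like the one in the request below only reaches the OCR code path when the tesseract probe has succeeded.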
The config variable is populated with data that includes information from the client request. The hasTesseract method is then called to determine whether a tesseract executable is available.
public boolean hasTesseract(TesseractOCRConfig config) {
    String tesseract = config.getTesseractPath() + getTesseractProg();
    if (TESSERACT_PRESENT.containsKey(tesseract)) {
        return (Boolean) TESSERACT_PRESENT.get(tesseract);
    } else {
        String[] checkCmd = new String[]{tesseract};
        boolean hasTesseract = ExternalParser.check(checkCmd, new int[0]);
        TESSERACT_PRESENT.put(tesseract, hasTesseract);
        return hasTesseract;
    }
}
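Because the cache in hasTesseract is keyed by the concatenated string, every new header value produces a new key and therefore a fresh probe of an attacker-chosen command name. The sketch below models only that caching pattern; the class name, the probed list and the injected Predicate probe are hypothetical, and computeIfAbsent stands in for the containsKey/get/put sequence (equivalent here because the cached values are never null). No process is started.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of the hasTesseract caching pattern; the probe is injected, so nothing is executed.
public class TesseractCacheSketch {
    private final Map<String, Boolean> present = new HashMap<>();  // plays the role of TESSERACT_PRESENT
    private final Predicate<String[]> probe;                        // plays the role of ExternalParser.check
    final List<String> probed = new ArrayList<>();                  // commands that were actually probed

    TesseractCacheSketch(Predicate<String[]> probe) {
        this.probe = probe;
    }

    boolean hasTesseract(String headerPath) {
        // Mirrors config.getTesseractPath() + getTesseractProg() on a Linux host.
        String tesseract = headerPath + "tesseract";
        return present.computeIfAbsent(tesseract, cmd -> {
            probed.add(cmd);
            return probe.test(new String[] {cmd});
        });
    }

    public static void main(String[] args) {
        TesseractCacheSketch s = new TesseractCacheSketch(cmd -> false);
        s.hasTesseract("blahhhh");
        s.hasTesseract("blahhhh");  // cache hit: no second probe
        s.hasTesseract("other");
        System.out.println(s.probed);  // [blahhhhtesseract, othertesseract]
    }
}
```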
The tesseract variable is built by concatenating config.getTesseractPath(), which returns the string supplied in the X-Tika-OCRTesseractPath request header, and getTesseractProg(), which returns the string tesseract on Linux hosts. The application then checks whether this value of tesseract has been checked before and, if so, returns the cached true or false result. If the string has not been checked previously, ExternalParser.check is called.
public static boolean check(String[] checkCmd, int... errorValue) {
    if (errorValue.length == 0) {
        errorValue = new int[]{127};
    }
    try {
        Process process = Runtime.getRuntime().exec(checkCmd);
        Thread stdErrSuckerThread = ignoreStream(process.getErrorStream(), false);
        Thread stdOutSuckerThread = ignoreStream(process.getInputStream(), false);
        stdErrSuckerThread.join();
        stdOutSuckerThread.join();
        int result = process.waitFor();
        int[] var6 = errorValue;
        int var7 = errorValue.length;
        for (int var8 = 0; var8 < var7; ++var8) {
            int err = var6[var8];
            if (result == err) {
                return false;
            }
        }
        return true;
    } catch (IOException var10) {
        return false;
    } catch (InterruptedException var11) {
        return false;
    } catch (SecurityException var12) {
        return false;
    } catch (Error var13) {
        if (var13.getMessage() == null
                || !var13.getMessage().contains("posix_spawn") && !var13.getMessage().contains("UNIXProcess")) {
            throw var13;
        } else {
            return false;
        }
    }
}
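The exit-code part of check is easy to misread in the decompiled loop. Since hasTesseract passes new int[0], the default {127} applies, so only exit status 127 (the shell convention for "command not found") counts as failure; any program that starts and exits with another status, even 1, counts as a working tesseract. The helper below is a hypothetical restatement of just that loop, not Tika code.

```java
import java.util.Arrays;

// Hypothetical restatement of the exit-code test at the end of ExternalParser.check.
public class ExitCodeSketch {
    static boolean passes(int exitCode, int... errorValue) {
        if (errorValue.length == 0) {
            errorValue = new int[] {127};  // same default as check()
        }
        // check() returns false as soon as the exit code matches a listed error value.
        return Arrays.stream(errorValue).noneMatch(err -> err == exitCode);
    }

    public static void main(String[] args) {
        System.out.println(passes(0));    // true
        System.out.println(passes(1));    // true: non-zero, but not 127
        System.out.println(passes(127));  // false
    }
}
```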
Runtime.getRuntime().exec is called with checkCmd, the array holding the concatenated string built in the hasTesseract method. If the exec call succeeds and the exit-code check passes, true is returned. During testing of Apache Tika on a Linux host, the Runtime.getRuntime().exec call threw an exception, and different escapings of the user-controlled request header value were unsuccessful. strace was used to determine which system call Runtime exec uses to execute checkCmd.
strace -f -p <java-pid>
...
[pid 4940] close(35) = 0
[pid 4940] getdents(4, /* 0 entries */, 32768) = 0
[pid 4940] close(4) = 0
[pid 4940] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 4940] execve("/usr/local/sbin/blahhhhtesseract", ["blahhhhtesseract"], 0x7ffd1272ed40 /* 46 vars */) = -1 ENOENT (No such file or directory)
[pid 4940] execve("/usr/local/bin/blahhhhtesseract", ["blahhhhtesseract"], 0x7ffd1272ed40 /* 46 vars */) = -1 ENOENT (No such file or directory)
[pid 4940] execve("/usr/sbin/blahhhhtesseract", ["blahhhhtesseract"], 0x7ffd1272ed40 /* 46 vars */) = -1 ENOENT (No such file or directory)
...
Partial client request used to generate the strace output (the request body is excluded):
PUT /meta HTTP/1.1
Host: 172.22.222.112:9998
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
X-Tika-OCRTesseractPath: blahhhh
X-Tika-OCRLanguage: //E:Jscript
Expect: 100-continue
Content-type: image/jp2
Connection: close
Content-Type: application/x-www-form-urlencoded
Content-Length: 8086
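In the trace, the JVM tries execve once per PATH directory with the header value plus tesseract as the file name. The sketch below is a rough, illustrative model of that lookup (it is not the JDK's implementation; the class and method names are hypothetical, and it requires Java 11+). It treats a name containing a slash as a direct path, which is also why a header value such as a directory prefix ending in "/" would point the probe at a specific file.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Rough, illustrative model of the PATH lookup visible in the strace output (not the JDK's source).
public class PathSearchSketch {

    static Optional<Path> resolve(String name) {
        // A name containing '/' is used as a path directly, without searching PATH.
        if (name.contains("/")) {
            Path p = Path.of(name);
            return Files.isRegularFile(p) && Files.isExecutable(p) ? Optional.of(p) : Optional.empty();
        }
        String path = System.getenv().getOrDefault("PATH", "");
        for (String dir : path.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;  // simplification: skip empty PATH entries
            }
            // Each candidate corresponds to one execve(...) attempt in the trace.
            Path candidate = Path.of(dir, name);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();  // every candidate missing: the ENOENT case
    }

    public static void main(String[] args) {
        // Shell metacharacters are just characters in the file name being searched for.
        System.out.println(resolve("blahhhhtesseract"));
        System.out.println(resolve("blahhhh;id;tesseract"));
    }
}
```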
The strace output shows that the concatenated string ends up as the filename (first) argument of the execve calls, which are attempted once for each directory on the PATH. Because execve does not invoke a shell interpreter, shell metacharacters in the header are not interpreted and the injection attempts fail: no file with that name exists, so Runtime.getRuntime().exec throws an IOException, which ExternalParser.check catches and turns into a false return value. A false result means the TesseractOCRParser class reports no supported types and is treated as unable to handle the client request. Therefore the doOCR method, which is used to execute commands when exploiting Apache Tika on Windows, is never reached on the Linux host. If an attacker is able to upload an executable whose name ends with the string tesseract, the Runtime.getRuntime().exec check could return true and allow further processing of the request.
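Both outcomes described above can be reproduced outside Tika with a minimal, Linux-only sketch (Java 11+). The check method here is a hypothetical, trimmed version of ExternalParser.check without the stream-draining threads, and the temporary-directory setup is an assumption used to stand in for an uploaded file. The second result depends on the temporary directory allowing execution (it will be false on a noexec mount), so no output is asserted for it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Linux-only sketch: Runtime.exec(String[]) runs argv[0] directly, with no shell parsing.
public class ExecNoShellSketch {

    // Trimmed version of ExternalParser.check: a missing binary means IOException, i.e. false.
    static boolean check(String[] cmd) {
        try {
            Process p = Runtime.getRuntime().exec(cmd);
            return p.waitFor() != 127;  // only 127 counts as failure, as with the default errorValue
        } catch (IOException e) {
            return false;  // e.g. "No such file or directory": no shell ever ran
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // 1. Metacharacters stay inside the single file name that execve looks up.
        System.out.println(check(new String[] {"blahhhh;id;" + "tesseract"}));  // false

        // 2. A real executable whose path ends in "tesseract" passes the check.
        Path dir = Files.createTempDirectory("ocr-demo");
        Path fake = dir.resolve("tesseract");
        Files.writeString(fake, "#!/bin/sh\nexit 0\n");
        Files.setPosixFilePermissions(fake, PosixFilePermissions.fromString("rwx------"));
        String headerValue = dir + "/";  // stands in for an X-Tika-OCRTesseractPath value
        System.out.println(check(new String[] {headerValue + "tesseract"}));

        Files.deleteIfExists(fake);
        Files.deleteIfExists(dir);
    }
}
```

A shell only gets involved when one is named explicitly, for example new String[] {"/bin/sh", "-c", "exit 0"}; hasTesseract never builds such an array, since checkCmd holds a single element.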
Technical Analysis
Easy to exploit. Exploitation is possible on Windows because the JVM uses CreateProcess under the hood, but probably not possible on Linux because the JVM uses execve there, which does not invoke a shell.
General Information
Vendors
- apache
Products
- tika