I have a PCAP file containing 3,445 HTTP "206 Partial Content" responses with the application/pdf media type. Each request is for the same file, but a different byte range is requested each time (visible as a different Content-Range in each response), as a covert means of data exfiltration.
When I use the following tshark command to extract the data for reassembly, the byte count of the file data differs from the amount indicated by the Content-Length header and by the byte range in the Content-Range header:
tshark -r covert_exfiltration.pcapng -Y "http.response.code == 206 && http.content_type contains \"application/pdf\"" -T fields -e http.file_data
If I output the same packets as JSON with the following command, the mismatch is obvious:
tshark -r covert_exfiltration.pcapng -Y "http.response.code == 206 && http.content_type contains \"application/pdf\"" -T json > packet_json.json
This is a truncated example of the output for one of the responses:
"http.response.line": "X-Powered-By: Phusion Passenger (mod_rails/mod_rack) 3.0.19\r\n",
"http.response.line": "Accept-Ranges: bytes\r\n",
"http.cache_control": "public, max-age=0",
"http.response.line": "Cache-Control: public, max-age=0\r\n",
"http.last_modified": "Tue, 11 Apr 2023 18:53:32 GMT",
"http.response.line": "Last-Modified: Tue, 11 Apr 2023 18:53:32 GMT\r\n",
"http.response.line": "ETag: W/\"d0db2-18771aa80d4\"\r\n",
"http.content_type": "application/pdf",
"http.response.line": "Content-Type: application/pdf\r\n",
"http.response.line": "Content-Range: bytes 89517-91443/855474\r\n",
"http.content_length_header": "1927",
"http.content_length_header_tree": {
"http.content_length": "1927"
},
"http.response.line": "Content-Length: 1927\r\n",
"http.date": "Tue, 11 Apr 2023 19:47:33 GMT",
"http.response.line": "Date: Tue, 11 Apr 2023 19:47:33 GMT\r\n",
"http.connection": "close",
"http.response.line": "Connection: close\r\n",
"\\r\\n": "",
"http.response": "1",
"http.response_number": "1",
"http.time": "0.002493000",
"http.request_in": "126167",
"http.response_for.uri": "http://10.0.0.3:1337/report.pdf",
"http.file_data": "�t�\u00133,q\b����\u00190�)�{\u001b7�\u0014�p\\?q9�|>�l�T�SD4/�&\u001f�\u0011�-\"�a�k��r�0����G��L���df<>#\u0005����anZ�I��\u0007\u001e;��J��U�\u0002�2�c,\b��N\u00132!���N��(p,��\u0002\u0019�q�\"\fx#\u000eMn�+o��v�*�f�+sF6�\r_R�\u001e*��J-'C�>��b�/QS��C]&�pFQ��F�\n�$D�i7wE��^�k�.���3\u0019��.n�d����E��Kz�`��\f��f��h�\u0011*�x_��t�w����M)�\u0016u\u000e\fj����:N��Y��\u0013#@�\u0011�\u0018='�%�\u0016m�%�R�y�%�G�f��� "
You can see that even though this chunk is supposed to be 1927 bytes long (per both Content-Length and Content-Range), the resulting http.file_data is only 683 bytes. What causes this discrepancy, and how can I fix it so that I can align the file snippets correctly?
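
For reference, this is how I derive the expected size of each chunk: both ends of a Content-Range are inclusive, so for the response above 91443 - 89517 + 1 = 1927 bytes, which agrees with Content-Length. Below is a minimal Python sketch of the check and the offset-based placement I am aiming for. The helper names (parse_content_range, expected_length, place_chunk) are hypothetical, and the sketch assumes the chunk payload is already available as raw bytes, which is exactly the part I cannot get out of tshark correctly yet.

```python
import re

# Hypothetical helpers for illustration; not part of tshark or any library.
CONTENT_RANGE_RE = re.compile(r"bytes (\d+)-(\d+)/(\d+)")


def parse_content_range(value):
    """Return (first, last, total) parsed from a Content-Range header value."""
    match = CONTENT_RANGE_RE.search(value)
    if match is None:
        raise ValueError(f"unrecognized Content-Range: {value!r}")
    first, last, total = (int(group) for group in match.groups())
    return first, last, total


def expected_length(value):
    """Number of bytes the range covers; both ends are inclusive."""
    first, last, _ = parse_content_range(value)
    return last - first + 1


def place_chunk(buffer, value, data):
    """Copy one chunk into the reassembly buffer at the offset its range names."""
    first, last, _ = parse_content_range(value)
    if len(data) != last - first + 1:
        # This is the check that fails for me: 683 bytes instead of 1927.
        raise ValueError(
            f"chunk has {len(data)} bytes, range expects {last - first + 1}"
        )
    buffer[first:last + 1] = data


if __name__ == "__main__":
    header = "Content-Range: bytes 89517-91443/855474"
    print(expected_length(header))  # 1927
```

The reassembly buffer would be a bytearray of the total size from the header (855474 here), filled in by calling place_chunk once per response.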