PST Encoding and filename fixes 116 #150

Merged
baibhavr merged 57 commits into develop from encoding_fixes-116 on Apr 29, 2022
@gwiedeman
Collaborator

Type of Contribution

  • Bugfix (non-breaking change which fixes an issue)
  • New component
  • Refactoring (no functional changes)
  • Documentation-only

What does this implement/fix? Explain your changes.

Previously, we could only read attachment filenames with the get_filenames() method, which has not yet been merged into libpff. This PR instead reads filenames from hex entries, which works with libpff. It incorporates #145, which will be closed.
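
As a rough illustration of reading a filename from raw property entries, here is a minimal sketch. It assumes the "hex entries" are record entries keyed by hexadecimal MAPI property identifiers, and it uses a plain dict in place of libpff objects. The `attachment_filename` helper and the `entries` mapping are hypothetical and are not the code in this PR.

```python
# MAPI property identifiers for attachment names.
PR_ATTACH_FILENAME = 0x3704       # short (8.3) filename
PR_ATTACH_LONG_FILENAME = 0x3707  # long filename


def attachment_filename(entries, encoding="utf-16-le"):
    """Return an attachment filename decoded from raw property bytes.

    `entries` is a hypothetical mapping of property identifier -> raw bytes,
    standing in for whatever record entries the PST library exposes.
    The default encoding is an assumption for Unicode PST files.
    """
    # Prefer the long filename and fall back to the short one.
    for prop_id in (PR_ATTACH_LONG_FILENAME, PR_ATTACH_FILENAME):
        raw = entries.get(prop_id)
        if raw:
            # Drop any trailing NUL terminator left in the raw value.
            return raw.decode(encoding, errors="replace").rstrip("\x00")
    return None
```

Preferring the long filename keeps names readable when both entries are present; the short name is only a fallback.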

This also finally sorts out how to get the correct encoding for PST files, so we no longer rely on character detection.
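
One way to avoid character detection is to map a numeric Windows code page, as stored with a message, to a Python codec name. The sketch below assumes the code page is already available as an integer; `codec_for_codepage` is a hypothetical helper and not necessarily how this PR does it.

```python
import codecs

# Code pages whose Python codec name is not simply "cp<number>".
_SPECIAL_CODEPAGES = {
    65001: "utf-8",
    1200: "utf-16-le",
    1201: "utf-16-be",
    20127: "ascii",
    28591: "iso-8859-1",
}


def codec_for_codepage(codepage, default="utf-8"):
    """Map a Windows code page number to a Python codec name.

    Falls back to `default` when Python has no codec for the code page.
    """
    name = _SPECIAL_CODEPAGES.get(codepage, f"cp{codepage}")
    try:
        return codecs.lookup(name).name
    except LookupError:
        return default


# Usage: decode bytes stored in Windows-1252.
text = b"caf\xe9".decode(codec_for_codepage(1252))
```

The fallback to UTF-8 is a design choice for the sketch; a stricter caller might prefer to raise on an unknown code page.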

It also parses the MIME type from libpff instead of guessing it from the filename.

Link to issue?

#116

  • Issue closed
  • Remain open

Pull Request Checklist

Please check if your PR fulfills the following requirements:

  • Make sure you are requesting to the develop branch. Don't PR to main!
  • This contribution has sufficient documentation
  • Tests for the changes have been added
  • All tests pass

How has this been tested?

Operating System: win10
Python Version: 3.9.4

Licensing

  • I agree that the Mailbag Project and the University at Albany, SUNY can release this code under the MIT license.

baibhav and others added 30 commits April 5, 2022 05:11
…ng into a helper. Minor attachement fixes as well
adding table to html when pdf derivative is parsed
@gwiedeman changed the title from "Encoding fixes 116" to "PST Encoding and filename fixes 116" on Apr 28, 2022
@gwiedeman requested a review from baibhavr on April 28, 2022 20:36
@gwiedeman added the "Input Parsing" label (input data, such as MBOX, IMAP, PST, EML, etc.) on Apr 28, 2022
@gwiedeman added this to the early release milestone on Apr 28, 2022
@gwiedeman mentioned this pull request on Apr 28, 2022
11 tasks
gwiedeman and others added 2 commits April 28, 2022 16:48
Conflicts:
	.github/workflows/build.yml
	data/mbox-sample1/1/Headers.pickle
	data/mbox-sample1/1/Message.pickle
	data/mbox-sample1/2/Headers.pickle
	data/pst-outlook2019_MSO_16.0.10377.20023_64-bit/1/HTML_Encoding.txt
	mailbag/controller.py
	mailbag/derivatives/eml.py
	mailbag/derivatives/html.py
	mailbag/derivatives/mbox.py
	mailbag/derivatives/pdf.py
	mailbag/derivatives/txt.py
	mailbag/derivatives/warc.py
	mailbag/formats/eml.py
	mailbag/formats/pst.py
	mailbag/helper.py
	requirements.txt
@baibhavr merged commit 56ddc14 into develop on Apr 29, 2022
@gwiedeman deleted the encoding_fixes-116 branch on May 5, 2022 20:05
Labels

Input Parsing input data, such as MBOX, IMAP, PST, EML, etc.

Projects

Status: Done, merged to develop

3 participants