Sunday, February 28, 2016

Received and sent messages in a single mailbox with MS Outlook for OSX

Microsoft Outlook for the Mac and PC behave differently when showing conversations in the Inbox. The PC shows received and sent messages. The Mac shows only the received messages. There is no default way to show a threaded conversation that includes sent messages in Outlook 2016 for Mac.

Microsoft Outlook for the Mac is integrated with OS X Spotlight search so that AppleScript and Spotlight can be used to create Outlook Smart Folders. Smart Folders are more like views into mailboxes than actual mailboxes. They are virtual folders that are created from the results of a search. This blog leverages Outlook's raw search capabilities that come from OS X integration. You can find out more information about this integration on the Microsoft answers web site. Portions of this blog came from this excellent blog posting.

Identify Mailboxes to be included in Smart Folder

Our conversation SmartFolder is made up of the contents of the Inbox and Sent mailboxes. We first need to identify the Microsoft Outlook folder identifiers for the two mailboxes.

  • Run Outlook
  • Highlight the mailbox we are going to include, Inbox or Sent. We want to get this mailbox's folder id.
  • Start a Mac Spotlight search
  • Type applescript and press Enter to bring up the AppleScript editor.
  • Enter the following in the AppleScript editor and run it
on run {}
tell application "Microsoft Outlook"
get selected folder
end tell
end run

  • It should return the results of the execution
    mail folder id 109 of application "Microsoft Outlook"
  • Do the same thing for the other mailbox.  
  • My mailbox numbers were 109 (Inbox) and 112 (Sent)
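The folder ids feed the raw query used in the next section. Here is a minimal sketch in Python, under the assumption that you paste the AppleScript result text in by hand. The helper names parse_folder_id and build_raw_query are made up for illustration. Only the attribute name and the id values come from this post.

```python
import re

# Attribute name taken from the raw query in this post.
FOLDER_ATTR = "com_microsoft_outlook_folderID"


def parse_folder_id(applescript_result: str) -> int:
    """Pull the numeric id out of text such as
    'mail folder id 109 of application "Microsoft Outlook"'."""
    match = re.search(r"mail folder id (\d+)", applescript_result)
    if match is None:
        raise ValueError(f"no folder id found in: {applescript_result!r}")
    return int(match.group(1))


def build_raw_query(folder_ids) -> str:
    """Join one equality test per folder id with the || (OR) operator."""
    return " || ".join(f"{FOLDER_ATTR} == {fid}" for fid in folder_ids)


if __name__ == "__main__":
    inbox = parse_folder_id('mail folder id 109 of application "Microsoft Outlook"')
    sent = 112  # hypothetical second id; use the value your own run returned
    print(build_raw_query([inbox, sent]))
```

The printed string can be pasted into the Raw Query field. The same function works for more than two folders if you want a wider view.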

Create an integrated Threaded Conversation 

Build Smart Folder with a Raw Search

  • Click on the Search field in the upper right hand corner of the Outlook view.
    • This will enable the search tab and ribbon
  • Select the Search tab
  • Select "All Mailboxes" in the ribbon bar
  • Press the Advanced search button in the ribbon
  • Select Raw Query from the drop list. 
  • Enter the following query, replacing 109 and 112 with the folder ids retrieved above.
    com_microsoft_outlook_folderID == 109 || com_microsoft_outlook_folderID == 112
  • Press Enter.  The Smart Folder should populate with the combined content of the two folders. Conversations in this combined Inbox/Sent folder will include both inbound and outbound messages.
  • Press Save Search in the Search ribbon bar and enter the name of your new Smart Folder.
Your search bar should look something like the following


Caveats

Smart folders are query result views and not real folders. You can use the standard Search functionality against a Smart Folder.  The system treats the additional Search terms as part of the Smart Folder's query and will ask you if you wish to change the folder query every time you move Outlook from the Smart Folder to a traditional folder.  You can tell Outlook to "not save" the changes.  Yeah, it is kind of annoying.

Create an Unread Email Smart Folder

You can create a Smart Folder of just unread messages similar to the conversation folder described above.
  • Select the Inbox
  • Select Search
  • Select the Search Tab if it is not showing
  • Press the Advanced search button in the Search ribbon
  • Change the query type to Raw Query
  • Enter the following into the raw query area:
    com_microsoft_outlook_unread != 0
  • Press Save Search and enter the name of the new Smart folder.

Additional Resources

Discussion on the mdls command and Mac / Outlook variables for raw queries can be found in this Apple discussions thread.

Created 2016 Feb 02

Monday, February 15, 2016

Almost PaaS Document Parsing with Tika and AWS Elastic Beanstalk

The Apache Tika project provides a library capable of parsing and extracting data and metadata from over 1000 file types. Tika is available as a single jar file that can be included inside applications or as a deployable jar file that runs Tika as a standalone service.

This blog describes deploying the Tika jar as an auto-scaled service in Amazon AWS Elastic Beanstalk. I selected Elastic Beanstalk because it supports jar based deployments without any real infrastructure configuration. Elastic Beanstalk auto-scale should take care of scaling up and down for the number of requests you get.

Tika parses documents and extracts their text completely in memory. Tika was deployed for this blog using EC2 t2.micro instances available in the AWS free tier. t2.micro VMs have 1 GB of memory, which restricts document complexity and size. You should size your instances appropriately for your largest documents.


Preconditions

  • An AWS account.
  • AWS access key id and secret key.  These are most easily created in the AWS web console.  Amazon recommends using IAM credentials. I used the default account credentials since this was done mostly as a PoC. Remember to save any key id and key values that you need. They cannot be recovered once generated.
  • Read the Amazon Elastic Beanstalk command line instructions

Not Addressed

  • Using the Amazon Console to do web based deployments. I decided to do this with command line tools to get a feel for automation possibilities.
  • Limiting access to this service with access controls.
  • IAM credentials.
  • Load testing with something like JMeter

SSH

SSH must be installed and on your command line path. The Elastic Beanstalk command prompt expects ssh-keygen to be on your path.

I did this work on Microsoft Windows 10 so I needed the Windows tools. Microsoft is now contributing to this SSH distribution on GitHub. I installed the 64 bit version in c:\Program Files\OpenSSH-Win64. Linux / Mac folks can use their favorite tools.

Python and Pip

Python must be installed and on your command line path. Install Python and pip per the AWS Elastic Beanstalk CLI instructions. The page describes the Windows and Linux installation processes. Make sure to add the Python directories to your PATH environment variable.

Elastic Beanstalk CLI

Install the Elastic Beanstalk CLI after installing Python. You can find it on the same CLI web page.
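A quick way to confirm the prerequisites before running any eb commands is to check that each tool resolves on the PATH. This is a minimal sketch using only the Python standard library. The tool list is my reading of the sections above, and your Python executable may be named python3 instead of python.

```python
import shutil

# Commands named in the sections above; "eb" is the Elastic Beanstalk CLI.
# Adjust the names if your system uses python3 / pip3.
REQUIRED_TOOLS = ["ssh-keygen", "python", "pip", "eb"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools are on PATH.")
```

shutil.which follows the same lookup rules as the shell, including PATHEXT on Windows, so it should report the same answer the eb command will see.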

Creating an EB Environment and Deploying Tika

Terms
  • Working directory: The name of the directory your command prompt is sitting in. This is the directory where .elasticbeanstalk/config.yml is created.
    • I usually prefix mine with my company name to simplify uniqueness constraints later.
  • Application name:  This is the name of your directory by default.  It can be anything.  The application name is the root of the external URL and should be unique.
    • I usually prefix mine with my company name to simplify uniqueness constraints later.
  • Environment Name: Applications can be deployed into different environments with different properties.
    • This tends to default to <app_name>-dev for development environments.
Steps
  1. Create a working directory.  This will probably be the same as your app name. 
    1. I named mine fsi-tika-eb for FreemanSoft Tika ElasticBeanstalk. I'd probably pick something more like fsi-eb-tika if I built up a demo environment in the future.
  2. CD into the directory.  The .elasticbeanstalk/config.yml file will end up here.
  3. Download the tika-server jar file and put it into this directory. I used version 1.11.
  4. Initialize the eb command environment and answer its questions
    1. Run eb init
    2. Pick the data center (region) your company uses or the nearest data center.
    3. Enter the AWS access key id and secret key.  I used my test account credentials.  You should use your IAM credentials.
    4. Enter your application name. It may default to your directory name.
    5. Select Java as the platform.
    6. Use Java 7 or Java 8
    7. Let it create the necessary SSH credentials.  
      1. This section fails if SSH is not on the path.
      2. Let it create an SSH key set with the default name if this is the first configured shell.  It selected aws-eb for my keyset name.
  5. The config.yml file contains the settings selected during eb init.  Normally the eb create command would try to run a command or deploy a directory.  We have a single all-encompassing Tika jar that we downloaded above.  This means we can set the default deployment artifact to the jar file name.
    1. Edit .elasticbeanstalk/config.yml
    2. Add a new section
          deploy:
             artifact: tika-server-1.11.jar
  6. Create a new Elastic Beanstalk Environment and auto deploy the application.   We can choose the default options and reconfigure later or we can try and configure the load balancer port and machine size in a single command.
    1. Option: Single command line
      1. Create and deploy the application
        eb create --instance_type t2.micro --envvars PORT=9998
      2. Accept any defaults offered
    2. Option: Basic commands
      1. Create and deploy the application:
        eb create
      2. Accept the defaults.
      3. Set the port number.  This causes the application to redeploy:
        eb setenv PORT=9998
You should end up with Tika deployed on a single t2.micro instance with auto-scale enabled up to 4 nodes.
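Steps 5 and 6 can be scripted if you want to repeat the deployment. The following is a rough sketch, not a tested deployment tool. It treats config.yml as plain text to avoid a YAML library dependency, and the function names are hypothetical. The eb create command is only printed so that the sketch has no side effects on AWS.

```python
from pathlib import Path

CONFIG_PATH = Path(".elasticbeanstalk") / "config.yml"


def with_deploy_artifact(config_text: str, artifact: str) -> str:
    """Return the config text with a deploy/artifact section appended.

    The text is returned unchanged if a top-level deploy section
    already exists, so running this twice is harmless.
    """
    if any(line.startswith("deploy:") for line in config_text.splitlines()):
        return config_text
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + f"deploy:\n  artifact: {artifact}\n"


def eb_create_command(instance_type: str, port: int) -> list:
    """Build the single command line variant of step 6 as an argument list."""
    return ["eb", "create",
            "--instance_type", instance_type,
            "--envvars", f"PORT={port}"]


if __name__ == "__main__":
    if CONFIG_PATH.exists():
        updated = with_deploy_artifact(CONFIG_PATH.read_text(),
                                       "tika-server-1.11.jar")
        CONFIG_PATH.write_text(updated)
    # Printed rather than executed; pass the list to subprocess.run to run it.
    print(" ".join(eb_create_command("t2.micro", 9998)))
```

Keeping the command as an argument list makes it easy to hand to subprocess.run later without worrying about shell quoting.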

Load Balancer Notes

The AWS Load Balancer listens on port 80 and assumes that the EB application is running on port 5000.  The Tika server listens on port 9998 by default.  We have to change the Load Balancer back side port or we have to change the port Tika is listening on.  It is easiest to just change the load balancer back side port since that can be done with just the PORT environment property.

Verification

Basic Server Verification

Open a browser and do a GET request against your application.  The default naming is
http://<application_name>-<environment>.elasticbeanstalk.com.
My demo showed up on 
http://fsi-tika-eb-dev.elasticbeanstalk.com
Executing a GET against the Tika parser
http://<application_name>-<environment>.elasticbeanstalk.com/tika
 My demo showed up on
http://fsi-tika-eb-dev.elasticbeanstalk.com/tika
and resulted in the message
This is Tika Server. Please PUT 
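The same check can be run from a script instead of a browser. This is a small sketch using the Python standard library. The function names are mine, and the host name is passed on the command line so that nothing is hard coded.

```python
import sys
import urllib.request


def tika_url(host: str, path: str = "/tika") -> str:
    """Build an http URL for a path on the deployed Tika server."""
    return f"http://{host}{path}"


def fetch_greeting(host: str, timeout: float = 10.0) -> str:
    """GET the /tika path and return the response body as text."""
    with urllib.request.urlopen(tika_url(host), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # Example argument: fsi-tika-eb-dev.elasticbeanstalk.com
        print(fetch_greeting(sys.argv[1]))
    else:
        print("usage: python check_tika.py <host>")
```

urlopen raises an exception on connection failures and on HTTP error statuses, so a clean run that prints the greeting is a reasonable sign the load balancer and Tika are wired together.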

Parsing a Test Document

You can test the Tika server from any HTTP test tool like the Postman plugin for Chrome. The Tika Server API is documented on the Apache Tika Wiki.

Example File Parsing
  • Select PUT as your HTTP method
  • Use the /tika path, EX: http://fsi-tika-eb-dev.elasticbeanstalk.com/tika
  • Set the Body type to Binary
  • Select the file that will act as the body, for example an XLS file.
  • Select the MIME type you want back.  Tell Tika you want HTML back with the header Accept: text/html
  • Submit the message.
  • You should get back an HTML table representation of the XLS file.
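The Postman steps above translate fairly directly into a script. The sketch below uses only the Python standard library, and the function names are hypothetical. One assumption to flag: urllib adds a form-encoded Content-Type header to requests with a body unless one is set, so I set application/octet-stream explicitly on the assumption that this leaves type detection to Tika.

```python
import sys
import urllib.request


def build_parse_request(base_url: str, document: bytes) -> urllib.request.Request:
    """Build a PUT to <base_url>/tika that asks for HTML back."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tika",
        data=document,
        method="PUT",
        headers={
            "Accept": "text/html",
            # Assumption: a generic type lets Tika detect the real one.
            "Content-Type": "application/octet-stream",
        },
    )


def parse_file(base_url: str, file_path: str, timeout: float = 60.0) -> str:
    """Send the file bytes to Tika and return the extracted HTML."""
    with open(file_path, "rb") as handle:
        request = build_parse_request(base_url, handle.read())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        print(parse_file(sys.argv[1], sys.argv[2]))
    else:
        print("usage: python parse_file.py <base_url> <file>")
```

A run would look something like python parse_file.py http://fsi-tika-eb-dev.elasticbeanstalk.com report.xls, where report.xls stands in for whatever file you want parsed.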

Example Data Type Detection
  • Select PUT as your HTTP method
  • Use the /detect/stream path, EX: http://fsi-tika-eb-dev.elasticbeanstalk.com/detect/stream
  • Set the Body type to Binary
  • Select the file that will act as the body, for example a PNG file.
  • Submit the message
  • You should get back the MIME type of the document you sent.
See the Apache Tika Wiki page for more information.
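One way to see what content based detection buys you is to compare Tika's answer with a guess made from the file extension alone. This is a sketch along those lines, with made up function names. The local guess comes from Python's mimetypes module and has nothing to do with Tika. As before, the explicit Content-Type header is an assumption intended to leave detection to the server.

```python
import mimetypes
import sys
import urllib.request


def guess_from_name(file_name: str) -> str:
    """Guess a MIME type from the file extension alone (local, no server)."""
    guessed, _encoding = mimetypes.guess_type(file_name)
    return guessed or "application/octet-stream"


def detect_with_tika(base_url: str, file_path: str, timeout: float = 60.0) -> str:
    """PUT the file bytes to /detect/stream and return the reported type."""
    with open(file_path, "rb") as handle:
        request = urllib.request.Request(
            url=base_url.rstrip("/") + "/detect/stream",
            data=handle.read(),
            method="PUT",
            headers={"Content-Type": "application/octet-stream"},
        )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace").strip()


if __name__ == "__main__":
    if len(sys.argv) == 3:
        base_url, path = sys.argv[1], sys.argv[2]
        print("extension guess:", guess_from_name(path))
        print("tika detection: ", detect_with_tika(base_url, path))
    else:
        print("usage: python detect_file.py <base_url> <file>")
```

Renaming a PNG to something like photo.txt and running the script is a simple way to see the two answers diverge.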

Viewing Logs

You can view any of the captured log files using the eb command line. Open up a command prompt in your application's directory.  Enter the following to see logs
eb logs
Deployment, access and application logs will be retrieved and displayed.

Additional Reading

Amazon publishes some free AWS Kindle books/booklets including the following
  • http://www.amazon.com/AWS-Elastic-Beanstalk-Developer-Guide-ebook/dp/B007Q4JFE0

Created Feb 15 2016