Posts

Showing posts from February, 2016

Received and sent messages in a single mailbox with MS Outlook for OSX

Image
Microsoft Outlook for the Mac and PC behave differently when showing conversations in the Inbox. The PC shows received and sent messages. The Mac shows only the received messages.  There is no default way to show a threaded conversation on Mac Office 2016. Microsoft Outlook for the Mac is integrated with OS/X spotlight search so that AppleScript and Spotlight can be used to create Outlook  Smart Mail folders. Smart Folders  are more like views into mailboxes than actual mailboxes. They are virtual folders that are created from the results of a search.  This blog leverages Outlook's raw search  capabilities that come from OS/X integration.  You can find out more information about this integration on the Microsoft answers web site . Portions of this blog came from this excellent blog posting . Identify Mailboxes to be included in Smart Folder Our conversation SmartFolder is made up of the contents of the Inbox and Sent mailboxes. We first need to identify the Microsoft Outlook

Almost PaaS Document Parsing with Tika and AWS Elastic Beanstalk

Image
The Apache Tika project provides a  library capable of parsing and extracting data and meta data from over 1000 file types.  Tika is available as a single jar file that can be included inside applications or as a deployable jar file that runs Tika as a standalone service. This blog describes deploying the Tika jar as an auto-scale service in Amazon AWS Elastic Beanstalk.  I selected Elastic Beanstalk because it supports jar based deployments without any real Infrastructure configuration. Elastic Beanstalk auto-scale should take care of scaling up and down for for the number of requests you get. Tika parses documents and extracts their text completely in memory. Tika was deployed for this blog using EC2 t2.micro instances available in the AWS free tier. t2.micro VMs are 1GB which means that you are restricted in document complexity and size. You would size your instances appropriately for your largest documents.   Preconditions An AWS account. AWS access id and secret key.

Slice Splunk simpler and faster with better metadata

Image
Splunk is a powerful event log indexing and search tool that lets you analyze large amounts of data. Event and log streams can be fed to the Splunk engine where they are scanned and indexed.  Splunk supports full text search plus highly optimized searches against metadata and extracted data fields.  Extracted fields are outside this scope of this missive. Each log/event record consists of the log/event data itself and information about the log/event known as metadata.  For example, Splunk knows the originating host for each log/event.   Queries can efficiently filter by full or partial host names without having to specifically put the host name in every log message. Message counts with metadata wildcards One of the power features of metadata is that Splunk will provide a list of all metadata values and the number of matching messages as part of the result of any query.  A Splunk query returns matching log/event records and the the number of records in each bucket like #records