Data pipeline for vacation photos

I take pictures when I am on vacation. Then I throw away 90% of them, some make the cut to end up on Instagram. Instagram is a great platform, however without an official API to upload images, they make it tough for lazy amateurs to publish their content. There is a popular unofficial api, which I intend to give a try.

But even before I get there, I need to get the pipeline ready for getting the data out of the memory card of my DSLR and finding the ones that are worthy of posting. I don’t do any editing what so ever – and proudly tag everything with #nofilter. The real reason is image editing is tough and time-consuming. I doubt anyone would have such a workflow, but I find the existing tooling frustrating makes me do boring manual jobs – that too on vacation.

The workflow

Typically when I am on vacation, I would take pictures all day and as soon as I reach the hotel I want to get the pictures off my camera, group them in keep, discard,maybe buckets, upload them to the cloud, post them to Instagram and finally have the memory card cleaned up for the next day. If I get about an hour to do all this – I’m lucky. The most time-consuming part of the workflow is looking at the images and deciding which bucket it belongs. Rest of the parts are easy to automate. So I wanted to take up that part of the workflow first.

This stage of bucketing a few hundred images into keep,maybe,discardbuckets needed a tool that is more flexible than Photos on mac. Sometimes there are multiple shots of the same subject which needs to be compared next to each other.

After some digging, I found feh. It is a lightweight image browser and offers productivity and simplicity. Installing feh was fairly simple – just install it us if you are on a mac.

brew install feh

Feh acts like any other simple fullscreen image browser, 

feh -t # thumbnails 
feh -m # montage

Other useful keyboard shortcuts

/       # auto-zoom
Shift+> # rotate clockwise
Shift+< # rotate anti-clockwise

There are tonnes of other options and good enough mouse support as well.

Extending with custom scripts

However the real power is unleashed when you bind any arbitrary  unix commands to the number keys. For example:

mkdir keep maybe discard
feh --scale-down --auto-zoom --recursive --action "mv '%f' discard" --action1 "mv '%f' keep" --action2 "mv '%f' maybe" . &

Here is what is going on in the two commands above. First we create three directories. Next we bring up feh in the directory (.the current directory in this case) where we have copied the images from the memory card and use right left keys to cycle through the images.

The recursive flag takes care of going through any subdirectories. The scale-down and auto-zoom handles the sizing the images properly. The action flag allows you to associate arbitrary unix commands with keys 0-9. And that is incredible!

In the example above hitting the 0 key moves it to the directory discard. This is for two reasons – I am right handed and my workflow is to aggressively discard rather than keep. keep-s are less in numbers and easy to decide, so they are bound to 1. maybe-s are time sinks, so I bind it to 2.  I might do a few more passes on each folder before the keep bucket is finalized.

Taking it to the next level

But to take it to the next level, lets bind our 1 (keep) to aws s3 cp command. So we can instantly start uploading them to s3 with one keystroke.  Here’s the basic idea:

bucket=`date "+%Y-%m-%d-%s"`
aws s3 mb s3://${bucket}/keep --region us-west-1
aws s3 mv '%f' s3://${bucket}/'%f' &

Note the ampersand at the end of the command – this helps in putting the upload command in the background. That way the upload is not blocking and you can keep going through the images.

This is reasonably stable – even if feh crashes in the middle of your workflow, the upload commands are queued up and continue in the background.

Here is what the final command looks like. You can put this is a script and add it to your path for accessing quickly.

feh --scale-down --auto-zoom --recursive --action "aws s3 mv '%f' s3://${bucket}/'%f' &" --action1 "mv '%f' keep" --action2 "mv '%f' maybe" . &

This workflow is not for those who do a lot of editing with their pictures.  Overallfeh is fast to load images and provides a lot of extensibility.

Next Steps

The next step would be to configure the lambda function to the S3 upload event and have the unofficial instagram api post the image to instagram. One step remaining would be including the individual hashtags before S3upload. That way from memory card to instagram can be reduced to just a few keystrokes. 

Beyond that, I intend to move feh part of the pipeline to a raspberry pi. I can plug the raspberry pi to the TV of the hotel I am staying at and cut my computer from the loop. Here’s a short post I wrote up for setting up my raspberry pi with a TV. It will probably be a few weeks to get everything together. Till then enjoy a very reticent feed from my instagram .


A few pictures from my first Italy trip. It most certainly, would not be the last. The mesmerizing beauty of the land, the layers of history at every corner, the warmth of people and the sumptuous delicacies – makes you fall in love with the country the very moment you lay your first foot in the country. Hope to be back soon!

ChiPy Python Mentorship Dinner March 2015

Chicago Python Users group Mentorship program for 2015 is officially live! It is a three month long program, where we pair up a new Pythonista with an experienced one to help them improve as developers. Encouraged by the success of last year, we decided to do it in a grander scale this time. Last night ChiPy and Computer Futures hosted a dinner for the mentors at Giordano’s Pizzeria to celebrate the kick off – deep dish, Chicago style!

The Match Making:

Thanks to the brilliant work by the mentor and mentees from 2014, we got a massive response as soon as we opened the registration process this year. While the number of mentee applications grew rapidly, we were unable to get enough mentors and had to limit the mentee applications to 30. Of them, 8 were Python beginners, 5 were interested in web development, 13 in Data Science, and rest in Advanced Python. After some interwebs lobbying, some  arm twisting mafia tactics, we finally managed to get 19 mentees hooked up with their mentors. 

Based on my previous experience at pairing mentor and mentees, the relationship works out only if there is a common theme of interest between the two. To make the matching process easier, I focused on getting a full-text description of their background & end goals as well as their LinkedIn data. From what I heard last night from the mentors, the matches have clicked!

The Mentors’ Dinner:
As ChiPy organizers, we are incredibly grateful to these 19 mentors, who are devoting their time to help the Python community in Chicago. Last night’s dinner was a humble note of thanks to them. Set in the relaxed atmosphere of the pizzeria, stuffed with pizza and beer, it gave us an opportunity to talk and discuss how we can make the process more effective for both the mentor and mentees

Trading of ideas and skills:
The one-to-one relationship of the mentor and mentee gives the mentee enough comfort for saying – “I don’t get it, please help!”. It takes away the fear of being judged, which is a problem in a traditional classroom type learning. But to be fair to the mentor, it is impossible for him/her to be master of everything Python and beyond. That is why we need to trade ideas and skills. Last time when one of the mentor/mentee pairs needed some help designing an RDBMS schema, one of the other mentors stepped in and helped them complete it much faster. Facilitating such collaboration brings out the best resources in the community. Keeping these in mind we have decided to use ChiPy’s discussion threads to keep track of the progress of our mentor and mentee pairs. Here is the first thread introducing what the mentor and mentee are working on.

Some other points that came out of last night’s discussion:

  • We were not able to find mentors for our Advanced Python track. Based on the feedback we decided to rebrand it to Python Performance Optimization for next time.
  • Each mentor/mentee pair will be creating their own curriculum. Having a centralized repository of those will make them reusable
  • Reaching out to Python shops in Chicago for mentors. The benefit of this is far reaching. If a company volunteers their experienced developers as mentors, it could serve like a free apprenticeship program and pave the way in recruiting interns, contractors and full time hires. Hat-tip to Catherine for this idea.

Lastly, I want to thank our sponsor – Computer Futures, for being such a gracious hosts. They are focused on helping Pythonistas find the best Python job that are out there. Thanks for seeing the value in what we are doing and hope we can continue to work together to help the Python community in Chicago. 

If you are interested in learning more about being a mentor or a mentee, feel free to reach out to me. Join ChiPy’s community to learn more about what’s next for the mentor and mentees. 

Chicago Python User Group Mentorship Program

 If you stay in Chicago, have some interest in programming – you must have heard about the Chicago Python Users Group or Chipy. Founded by Brian Ray, it is one of the oldest tech group in the city and is a vibrant community that welcomes programmers of all skill levels. We meet on the second Thursday of every month at a new venue with some awesome talks, great food and a lot of enthusiasm about our favorite programming language. Other than talks on various language features and libraries, we have had language shootouts (putting Python on the line with other languages), programming puzzle night etc.

Chipy meetups are great to learn about new things and meet a lot of very smart people. Beginning this October, we are doing a one on one, three month mentorship program. Its completely free, and totally driven by the community. By building this one to one relationships through the mentorship program, we are trying to build a stronger community of Pythonistas in Chicago.

We have kept it open on how the M&M pairs want to interact, but as an overall goal we wanted the mentors to help the mentees with the following:

1. Selection of a list of topics that is doable in this time frame (October 2014 – January 2014)
2. Help the mentee with resources (pair programming, tools, articles, books etc) when they are stuck
3. Encourage the mentee to do more hands on coding and share their work publicly
It has been really amazing to see the level of enthusiasm among the M&M-s. I have been fortunate to play the role of a match maker – where I look into the background, level of expertise, topics of interests and availability for all M&M-s and try to find out an ideal pair. I’ve been collecting data at every juncture so that we can improve the program in later iterations.

Here are some aggregated data points till now:

# of mentors signed up: 15
# of mentees new to programming: 2
# of mentees new to Python: 16
# of mentee-s Advanced Python: 5
Total: 37

# of mentors with a mentee: 13
# of mentees new to programming with an assigned mentor:1
# of mentees new to Python with an assigned mentor:11
# of mentees with Advanced Python with an assigned mentor:1
# of mentors for newbie mentees without an assignment: 2
# of mentees unreachable: 4
# of mentees new to programming without an assigned mentor:1 (unreachable)
# of mentees new to Python without an assigned mentor:2 (unreachable)
# of mentees with Advanced Python without an assigned mentor:4 (1 unreachable, 3 no advanced mentors)

Other points:
– Data analysis is the most common area of interest.
– # of female developers: 6
– # of students: 2 (1 high-school, 1 grad student)

All M&M pairs are currently busy figuring out what they want to achieve in the next three months and preparing a schedule. Advanced mentees, are forming a focused hack group to peer coach on advanced topics.
We are incredibly grateful to the mentors for their time and the enthusiasm that the mentees have shown for the program. While this year’s mentoring program is completely full, if you are interested in getting mentored in Python, check back in December. Similarly, if you want to mentor someone with your Python knowledge, please let me know. If you have any tips you would want to share on mentoring, being a smart mentee – please leave them in the comments – I’ll share them with the mentor and mentees. And lastly, please feel free to leave any suggestions on what I can do to make the program beneficial for everyone.

Data loss protection for source code

Scopes of Data loss in SDLC
In a post Wikileaks age the software engineering companies should probably start sniffing their development artifacts to protect the customer’s interest. From requirement analysis document to the source code and beyond, different the software artifacts contain information that the clients will consider sensitive. The traditional development process has multiple points for potential data loss – external testing agencies, other software vendors, consulting agencies etc. Most software companies have security experts and/or business analysts redacting sensitive information from documents written in natural language. Source code is a bit different though.

A lot companies do have people looking into the source code for trademark infringements, copyright statements that do not adhere to established patterns, checking if previous copyright/credits are maintained, when applicable. Blackduck or, Coverity are nice tools to help you with that.

Ambitious goal

I am trying to do a study on data loss protection in source code – sensitive information or and quasi-identifiers that might have seeped into the code in the form of comments, variable names etc. The ambitious goal is detection of such leaks and automatically sanitize (probably replace all is enough) such source code and retain code comprehensibility at the same time.

To formulate a convincing case study with motivating examples I need to mine considerable code base and requirement specifications. But no software company would actually give you access to such artifacts. Moreover (academic) people who would evaluate the study are also expected to be lacking such facilities for reproducibility. So we turn towards Free/Open source softwares., Github, Bitbucket, Google code – huge archives of robust softwares written by sharpest minds all over the globe. However there are two significant issues with using FOSS for such a study.

Sensitive information in FOSS code?

Firstly, what can be confidential in open source code? Majority of FOSS projects develop and thrive outside the corporate firewalls with out the need for hiding anything. So we might be looking for the needle in the wrong haystack. However, being able to define WHAT sensitive information is we can probably get around with it.

There are commercial products like Identity Finder that detect information like Social Security Numbers (SSNs), Credit/Debit Card Information (CCNs), Bank Account Information, any Custom Pattern or Sensitive Data in documents. Some more regex foo or should be good enough for detecting all such stuff …

for i in `cat sensitive_terms_list.txt`;do
        for j in `ls $SRC_DIR`; do cat $SRC_DIR$j | grep -EHn --color=always $i ; done

Documentation in FOSS

Secondly, the ‘release early, release often’ bits of FOSS make a structured software development model somewhat redundant. Who would want to write requirements docs, design docs when you just want to scratch the itch? The nearest in terms of design or, specification documentation would be projects which have adopted the Agile model (or, Scrum, say) of development. In other words, a model that mandates extensive requirements documentation be drawn up in the form of user stories and their ilk. being a trivial example.

Still Looking
What are some of the famous Free/Open Source projects that have considerable documentation closely resembling a traditional development model (or models accepted in closed source development)? I plan to build a catalog of such software projects so that it can serve as a reference for similar work that involve traceability in source code and requirements.

Possible places to look into: (WIP)
* Repositories mentioned above

Would sincerely appreciate if you leave your thoughts, comments, poison fangs in the comments section … 🙂

Pesky tasks with batch scripts

Scripting is an art. Nifty and subtle, wicked cool scripts can weave magic, and startle compiled languages. When it comes to getting yet-another-pesky-job done, that scripting languages are your friend.

The batch scripting language, is one of the ways Windows operating system offers for writing small scripts without the need of installing any additional language support. It is somewhat limited with multiple short comings that does not make it fun. However you can still get some interesting stuff done with it. Below are some pesky jobs that can still be done with batch scripts.

Pesky job 1 : Map a network drive

net use N:| find “OK”
if errorlevel 1 net use N: \servernamepath$ ******** /user:******* /persistent:yes

This will check if the drive N is mapped or not; in case there is an error, it will map servernamepath with proper username/password values and keep this map persistent across reboots.

Pesky job 2 : Copying files with a time stamp
Say we want to copy a few files from one directory to another file to another with the current date stamp, it could be a simple
copy help.txt Desktop%date:~10,4%%date:~7,2%%date:~4,2%-chgs-1.txt

Truly ugly? Quite right.

Normally the date command would output

C:Documents and SettingsTatha>date
The current date is: Mon 11/17/2008
Enter the new date: (mm-dd-yy)

To use the date-stamp say in an echo statement, put the command with in percentage signs. to extract part of the time stamp, the command should be followed with a “:~offset, number_of_characters”. For example

C:Documents and SettingsTatha>echo %date:~0,14%
Mon 11/17/2008

So, the copy command above would create a copy the help.txt to the path C:Documents and SettingsTathaDesktop with a name 20081711-chgs-1.txt, on 17th November 2008.

But wait, this wont work in a Windows NT box. Seems like the automatic variables DATE and TIME were not implemented until windows 2000, so if you want a time stamp in an NT box you should

time /t >> file.txt

Pesky job 3 : Starting and stopping windows services gracefully
Another glitch when running newer bat scripts in Windows NT, that I came across is controlling Windows services. Consider the following snippet to stop a service named SomeAppServer or someappserver in a Windows Xp box.

net start | find “SomeAppServer”
if errorlevel 1 goto STOPPED
if errorlevel 0 echo %date% %time% Attempting to Stop SomeAppServer >> log.txt
start /wait net stop “SomeAppServer” >> log.txt 2>&1
if errorlevel 1 echo %date% %time% SomeAppServer could not be stopped >>log.txt
echo %date% %time% SomeAppServer is stopped >> log.txt
echo — >> log.txt

However, in case the name of the service is someappserver, instead of SomeAppServer as written in the script, it would fail to stop the service in a Windows NT box. NT treats the service names as case sensitive and you need to supply exactly as it is listed.

Here are some good resources for batch scripting

Moving on

Life has not been that interesting to produce further gibberish adage for the last few months. At work I’m looking into a plethora of antediluvian technologies – but still putting up to learn the new ones.

My white paper titled Security Concerns with Web Services was warmly appreciated and got published our internal knowledge net. Though, I cannot publish it anywhere else … I surely can share the helpful tools that I used to detect web service vulnerabilities.

With the tools listed below, some imaginations and a desire to have fun – you can really have a good idea about web services security.

Tools for studying Web Services Security

  • WebGoat is an insecure J2EE application that provides a number of lessons for practicing commonly known security exploits.
  • Soap UI is a popular SOA and Web Services testing tool with a number offeatures like web service client code generation, mock serviceimplementation, and groovy scripting.
  • WS Fuzzer is a fuzzing penetration testing tool used against HTTP SOAP based web services. It tests numerous aspects (input validation, XML Parser, etc) of the SOAP target.
  • WebScarab is a framework for analysing applications that communicate using the HTTP and HTTPS protocols.
  • LiveHTTPHeader is a mozilla plugin that provides all the information about the browser traffic.
  • Cryptcat is a lightweight version of netcat with integrated transport encryption capabilities.
  • Fiddler is a HTTP Debugging Proxy which logs all HTTP traffic between your computer and the Internet. Fiddler allows you to inspect all HTTP Traffic, set breakpoints, and “fiddle” with incoming or outgoing data.
  • TcpMon is a utility that allows the user to monitor the messages passed along in TCP based conversation.
  • cURL is a tool to transfer data from or to a server, using one of the supported protocols (HTTP, HTTPS, FTP, FTPS, SCP, SFTP, TFTP, DICT, TELNET, LDAP or FILE). The command is designed to work without user interaction.

Most of the above tools comes with neat documentation, so have fun!

On loss and new beginning

“How does it feel
How does it feel

To be on your own

With no direction home

Like a complete unknown

Like a rolling stone?”

I lost it. I lost it all.
Three years of electronic ranting, tales of code, help, pride, use, abuse, love, hate, lies, videos, pdfs, – fuck, the list is endless! It surely justifies taking a sick leave …
Andrew Grove says Only the paranoid survives. But he never says getting hyper-paranoid for survival. Well, no regrets brother – just lessons.
If you happen to have no clue which loss I’m talking about – you hardly know me. Its my google account – I forgot the password for it. The big G is the spinal cord of your online existence – once you snap from it your gmail, blog, orkut, notebook, reader, docs everything refuses you as if you are some sort of a beguiler trying to steal the free services and be the next spam superstar!

Every loss makes you wiser. Its like a tool that refreshes the the old, and paves the way for the new change. So …

Turn the clock to zero, boss
The river’s wide, we’ll swim across
Started up a brand new day

It could happen to you – just like it happened to me
There’s simply no immunity – there’s no guarantee
I say love’s such a force – if you find yourself in it
And sometimes no reflection is there“