# Why use weight rather than volume?

When we bake at home, we generally use volume measurements. It’s the way mom makes cookies, the way grandma makes biscuits; it’s just generally what is done. Well, let’s take a look at why professional chefs and bakers don’t do that.

When I walked into baking class on the first day and saw a scale sitting on each table, I didn’t think much of it; I just assumed it was for dough or finished products (I don’t know, dude, I was new). I soon learned that the best way to get accurate and consistent results is to actually weigh each and every ingredient; we call this “scaling.” Let me explain.

OK, so let’s say that you are five years old and you and your mom, Helen, are baking some cookies at home. You get the flour, eggs, and all the other what-not from the fridge. You use a measuring cup and measure out one cup of sugar. You’re an ambitious kid and want to impress Helen, so you pack the sugar down in that measuring cup as hard as your cute little hands will let you. Cool, your cookies come out just right.

Flash forward: you’re 17 and your mom is forcing you to bake some cookies for the church bake sale. You’re not really stoked about this. Obviously the guys from Rise Against don’t make cookies with their mom. So, you want to get this over with as soon as possible, and you grab whatever it is that she is screaming for you to get from the fridge. You grab the measuring cup, scrape up a cup of sugar as fast as you can, and throw it in the bowl. You totally measured out a cup; you actually kind of looked at the little red line that is fading off the side of the Tupperware measuring cup that your grandma gave your mom for Christmas when you were, like, three.

Do you see where I’m going with this? The first time, you jammed the sugar down in the measuring cup as hard as you could. The second time, you just used whatever was lightly sitting in the measuring cup. The sugar was piled up to the same line, yet you still got two totally different amounts, which means two totally different results. Now, with sugar at home for your mom’s chocolate chip cookies, it’s not such a big deal. However, when it comes to making bread or something of that nature, and you’re measuring out something very sensitive, such as yeast… now we are talking about different chemical reactions. Baking is a science. Bakers even go so far as to refer to their recipes as “formulas.” Crazy, I know.

I’ll give you a way to really see this fact in action: rough chop a carrot into big chunks, and then place it into a clear measuring cup and look at what measurement it reads. Now dump all that carrot out and chop it up really fine. Go ahead and measure the carrot again. I’ll wait.

Done? OK, so you can see the difference. You have the same carrot, the same amount of carrot, and the same measuring cup, but you have two totally different measurements. The same applies to baking; you can measure the same batch of flour with the same measuring cup, but your results will never be exactly the same because you won’t always use the same method of retrieving the flour. Sometimes you may push it down with your hand, sometimes you may scrape it off the side of the bin the flour is in, and so on.

With all this in mind, you should now have a general idea of why weight is the best option for measuring out the ingredients of baked goods. The answer is consistency. No matter what, 5 ounces is 5 ounces, regardless of how mad you are at your mom or how much of an eager beaver you are.

Now, go bake something.

-Daniel Leatherman

Student, IUPACA

# Analytics: Reached the MySQL limit? Let’s Hive

TL;DR: A step-by-step guide to replicating your MySQL database to Apache Hive (SQL on Hadoop) so you can run offline jobs that generate reports and insights.

Background:

In every startup, as time goes by, the database size keeps increasing, and the daily cron jobs that calculate reports and do offline tasks keep getting slower and slower, until one day you raise your hand and say: this is f**ked up and I’d better fix it.

You open your cron jobs and they’re filled with complex joins; running them on MySQL is going to take hours every day. That might be fine today, but when you are growing 2x every month, you will hit the limit again soon.

The MySQL-sharding-plus-Memcache combo works great for the front end. For backend analytics, you need a specialist. His name is Hive. Apache Hive. From Wikipedia:

Apache Hive is a data warehouse infrastructure built on top of Hadoop.

Think of it: all your GBs or TBs of data are stored in Hadoop. Even though the data is saved in text files, to you it appears as a database composed of multiple tables, on which you can run queries as complex as you want. Want to join 3 tables of 1 GB each? No problem. Queries taking longer than you’d like? Add ten more servers to the cluster (horizontal scaling).

In short, Hive takes your SQL queries, converts them into map-reduce jobs, runs them, and gives you the final answer. Remember: Hive is for offline processing only, as even a simple query takes a few seconds.

This is exactly what we needed :).

Architecture:

In our current architecture, we have cron jobs that run daily, query MySQL for data, and generate the reports that are sent to clients and analysts.

In our new architecture, we will have all our data in both Hive and MySQL. The cron jobs can still query MySQL for simple things, like which reports need to be generated, but for complex queries they will query Hive.

Step 1 - Installation:

To keep the guide simple, let’s assume your cluster consists of only one server. So let’s get started:

i) Install Hadoop and all 5 daemon packages (Hadoop Installation - Cloudera)

ii) Install hive and set hive to use mysql as metastore (Hive Installation - Cloudera)

iii) chmod g+w /user/hive/warehouse (this is where Hive is going to save all your data)

iv) Define the following environment variables

HIVE_HOME="/usr/lib/hive/"
HIVE_PORT=10000

Type “hive” in a terminal and you should get a hive prompt. That means everything went well.

Step 2 - Let’s Hive:

To learn to play around with Hive, let’s insert the MovieLens dataset into Hive and run some queries. (Source: Apache Hive - Getting Started)

i) First, download and extract the MovieLens data set:

$ wget http://www.grouplens.org/system/files/ml-data.tar+0.gz
$ tar xvzf ml-data.tar+0.gz

ii) Create a table with tab-delimited text file format:

hive> `CREATE TABLE u_data ( userid INT, movieid INT, rating INT, unixtime STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;`

iii) And load it into the table that was just created:

hive> `LOAD DATA LOCAL INPATH 'ml-data/u.data' OVERWRITE INTO TABLE u_data;`

iv) Count the number of rows in table u_data:

hive> `SELECT COUNT(*) FROM u_data;`

You might want to try some more complex queries too, with JOIN, GROUP BY, etc.
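For example, a GROUP BY over the same table, computing each movie’s average rating and listing the best-rated ones first (standard HiveQL, but untested here; on a big table this will of course run as a map-reduce job):

```sql
-- average rating per movie, best first (untested example)
SELECT movieid, COUNT(*) AS votes, AVG(rating) AS avg_rating
FROM u_data
GROUP BY movieid
ORDER BY avg_rating DESC
LIMIT 10;
```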

Step 3 - Where’s my data?

We have all our data in MySQL. To copy the complete data into Hive, we will use Sqoop by Cloudera (these guys rock!).

i) Let’s first install Sqoop (Sqoop Installation - Cloudera)

ii) Now you can copy everything to Hive with one simple command:

`$ sqoop import-all-tables --hive-import --connect jdbc:mysql://mysqlserver/databasename --username mysqluser`

iii) But let’s write a script that copies only the tables we need, one by one. The script will:

i) Get the list of tables from MySQL (hint: show tables)
ii) For each table, check whether you want to skip it, and if so, skip it
iii) Drop the table on Hive if it exists (hint: hive -e)
iv) Import the table data with Sqoop (hint: use sqoop import)
v) Put this script in crontab and execute it once every day
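The steps above might be sketched like this. The table names, skip list, and connection string are all hypothetical, and with DRY_RUN=1 (the default) the script only echoes the hive/sqoop commands it would run:

```shell
#!/bin/sh
# Sketch of the nightly copy script. Table names, the skip list, and the
# connection string are made-up examples. With DRY_RUN=1 (the default)
# the hive/sqoop commands are echoed instead of executed.
DRY_RUN=${DRY_RUN:-1}
CONN="jdbc:mysql://mysqlserver/databasename"
SKIP_TABLES="sessions logs"    # ii) tables we never copy

run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

copy_tables() {
    for table in "$@"; do
        # ii) skip tables we don't want to copy
        case " $SKIP_TABLES " in *" $table "*) continue ;; esac
        # iii) drop the hive table if it already exists
        run hive -e "DROP TABLE IF EXISTS $table"
        # iv) re-import the table with sqoop
        run sqoop import --hive-import --connect "$CONN" --table "$table"
    done
}

# i) in production the list would come from `mysql -N -e 'show tables'`
copy_tables users sessions orders
```

Step v) is then just a crontab entry pointing at this script (the path below is an example): `0 2 * * * /path/to/copy_tables.sh`.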

Note that we are copying everything to Hive from scratch every time. This is inefficient. There must be a better way. Yes, there is!

Step 4 - Thou shalt copy only what is updated

Instead of copying everything, we will only copy the rows that have been updated. To each of the tables in MySQL, add an “updated_at” timestamp column that is updated each time the row changes, and create an index on it. We will save the timestamp each time we import data, and the next time we will copy only the rows updated after that timestamp. This is made even easier by using sqoop jobs; read more about sqoop jobs here. (Tip: always have a created_at and an updated_at in all tables. You will need them many times for debugging, especially when a bug is pushed to production.)
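On MySQL 5.6+, the two columns and the index might be added like this (the table name `orders` is just an example; on older MySQL versions only one TIMESTAMP column per table can default to CURRENT_TIMESTAMP, so created_at would need different handling there):

```sql
ALTER TABLE orders
  ADD COLUMN created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  ADD COLUMN updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
      ON UPDATE CURRENT_TIMESTAMP,
  ADD INDEX idx_updated_at (updated_at);
```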

We will now have two scripts: one called hiveFirstRun, which creates all the jobs and copies everything from scratch, and a second called hiveUpdate, which re-runs the jobs and copies only the updated data. Let’s write hiveFirstRun first:

For each table we want to copy:
i) Delete the job and drop the Hive table
ii) Create an empty table (hint: use sqoop create-hive-table)
iii) Create a job that takes the data from MySQL and writes it to a file (hint: use "--incremental lastmodified --check-column updated_at")
iv) Execute the job and write the data to a temp file
v) Make a copy of the temp file as "tableName_lastDump"
vi) Load the temp file into the Hive table (hint: LOAD DATA INPATH)
vii) Generate the Java class and source file for each table, which we will use in the merge (hint: use sqoop codegen)
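hiveFirstRun might be sketched like this, one table per call. The connection string, paths, and job names are assumptions, and the sqoop flags follow the hints above but are untested here; DRY_RUN=1 (the default) echoes the commands instead of running them:

```shell
#!/bin/sh
# hiveFirstRun sketch (steps i-vii). Connection string, paths, and job
# names are hypothetical. DRY_RUN=1 echoes commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
CONN="jdbc:mysql://mysqlserver/databasename"

run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

first_run() {
    table=$1
    # i) delete any old job and drop the hive table
    run sqoop job --delete "import_$table"
    run hive -e "DROP TABLE IF EXISTS $table"
    # ii) create an empty hive table matching the mysql schema
    run sqoop create-hive-table --connect "$CONN" --table "$table"
    # iii) define an incremental job that writes updated rows to a file
    run sqoop job --create "import_$table" -- import --connect "$CONN" \
        --table "$table" --target-dir "/tmp/${table}_import" \
        --incremental lastmodified --check-column updated_at
    # iv) execute it once; the first run pulls everything
    run sqoop job --exec "import_$table"
    # v) keep a copy of the dump for the next merge
    run hadoop fs -cp "/tmp/${table}_import" "/tmp/${table}_lastDump"
    # vi) load the temp file into the hive table (LOAD DATA moves the file)
    run hive -e "LOAD DATA INPATH '/tmp/${table}_import' OVERWRITE INTO TABLE $table"
    # vii) generate the java class that sqoop merge will need later
    run sqoop codegen --connect "$CONN" --table "$table"
}

first_run users
```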

You might be wondering why we wrote to a file first and then copied into Hive: we will be using “sqoop merge” to combine the new data with the old data, and as far as I know it operates on files. We also made a copy of the file in step (v) because LOAD DATA moves the file into Hadoop, and we want to keep a copy to use in the next merge. To merge the data, we will use the primary key of each table (the id column). Also, you need to set “sqoop.metastore.client.record.password” to true in the Sqoop config file to save the MySQL password for each job. Let’s write hiveUpdate now:

For each table we want to copy:
i) Execute the job and write the data to an "updated_data" file
ii) Run "sqoop merge" to merge "tableName_lastDump" and "updated_data", writing the merged data to a temp file (hint: you will need the Java files created in step (vii) of hiveFirstRun)
iii) Make a copy of the temp file, overwriting "tableName_lastDump"
iv) Load the temp file into Hive
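And hiveUpdate might look like this: pull only the rows changed since the last run and merge them into the previous dump on the primary key. Paths, job names, and the jar/class names are assumptions; DRY_RUN=1 (the default) echoes the commands instead of running them:

```shell
#!/bin/sh
# hiveUpdate sketch (steps i-iv). Paths, job names, and jar/class names
# are hypothetical. DRY_RUN=1 echoes commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

update_table() {
    table=$1
    # i) re-run the saved job; sqoop remembers the last updated_at it saw
    run sqoop job --exec "import_$table"
    # ii) merge the new rows over the last dump, keyed on the id column
    run sqoop merge --new-data "/tmp/${table}_import" \
        --onto "/tmp/${table}_lastDump" --target-dir "/tmp/${table}_merged" \
        --merge-key id --class-name "$table" --jar-file "${table}.jar"
    # iii) the merged file becomes the next run's lastDump
    run hadoop fs -rm -r "/tmp/${table}_lastDump"
    run hadoop fs -cp "/tmp/${table}_merged" "/tmp/${table}_lastDump"
    # iv) reload the hive table from the merged file
    run hive -e "LOAD DATA INPATH '/tmp/${table}_merged' OVERWRITE INTO TABLE $table"
}

update_table users
```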

Add hiveUpdate to your crontab and execute it every night.

Step 5 - Not-so-corner cases

i) When your database schema changes, updated_at does not change.

In hiveFirstRun, I save the schema of each table (hint: use MySQL’s desc), and in hiveUpdate I check whether the schema is the same. If it is not, I run hiveFirstRun for that table again, which regenerates it from scratch. A better approach would be to also execute “ALTER TABLE” on Hive when it’s updated on MySQL; I’m planning to incorporate this into our database schema versioning system soon.
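The schema check can be as simple as comparing the saved `desc` output against last run’s copy. In this sketch the schema text is passed in as a string, whereas in production it would come from something like `mysql -e "desc $table"`; the file paths are made up:

```shell
#!/bin/sh
# Sketch of the schema check: keep last run's `desc <table>` output in a
# file and compare. Returns 0 (true) when the table is new or its schema
# changed, 1 when it is unchanged. Paths are made-up examples.
schema_changed() {
    table=$1; new_schema=$2
    old_file="/tmp/${table}_schema"
    if [ -f "$old_file" ] && printf '%s\n' "$new_schema" | cmp -s - "$old_file"; then
        status=1    # same schema as last run
    else
        status=0    # new table or changed schema
    fi
    printf '%s\n' "$new_schema" > "$old_file"   # remember it for next run
    return $status
}

rm -f /tmp/users_schema
if schema_changed users "id int | name varchar(50)"; then
    echo "users: schema new or changed, re-run hiveFirstRun for it"
fi
```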

ii) While creating tables in Hive, use the “mysql-delimiters” option. Also, before loading a file into Hive, you will have to rewrite the file and escape commas. I plan to look into this more and find a better solution.

iii) In Hive, I created a few de-normalized tables that were joins of multiple tables. These tables were used by multiple reports, so it made sense to pre-compute them once instead of having each report compute them.

Final Words:

So now we have all our data in Hive, updated daily at the end of the day. But how do we use this data? There are two approaches, and we are actually using both:

i) To run Hive queries from your code, use the Hive clients, which use Thrift. Code samples for all the major languages are available here.

ii) For manual one-time queries, install Cloudera Hue. It gives you a nice web interface to run queries and export the results to a CSV or Excel file. Think of Hue as phpMyAdmin for Hive.

If you have questions or something is not clear, post a comment here or search Google, Stack Overflow, and the Cloudera forums. Almost every issue I ran into, somebody else had already encountered and posted a solution for.

p.s. If you love numbers and writing algorithms to play with them, drop me an email at himanshu.baweja@gmail.com. I will be more than happy to refer you to Facebook (which originally developed Hive) and/or to the start-up for which I built the above system as a consultant. Both are growing like crazy and need people like you :).

# Scaling without Anti-Aliasing in Photoshop

Hey guys. Since some of you have asked me for pixel-art pointers in the past, I thought I’d make this quick tutorial about something that I only just googled last night. If you’re looking for a general pixel art tutorial, you can find one here. This might be pretty long and boring! So think hard before you read it!

Also, it might be a good idea to add that YOU SHOULD NEVER SCALE PIXEL ART. PLAN AHEAD. But sometimes I draw something, and then I’m like “dang. I really need this to take up more space in this composition.” The best way to prevent that situation is to do a lot of sketching, but even then, things can change when you get into your final piece. For example, say I want this guy to take up the whole frame:

That’s a lot to redraw. Whatever! I’ll just scale him up!

Hmm, that looks kind of good! But he’s a little fuzzy. When we zoom in, we notice that we lost all of our nice, clean pixels because of the way Photoshop interpolates the image when you scale (it pulls colors from the pixels surrounding an area that needs to be filled in and uses the average, so you lose that crispness — not to mention increasing your file size quite a bit).

Well, luckily Photoshop lets you change the settings for that image interpolation. If you open up your general preferences (Edit -> Preferences -> General on a PC, Photoshop -> Preferences -> General on a Mac), you can change your interpolation settings. Select the dropdown and change your setting from Bicubic to Nearest Neighbor.

What this does is basically say, “instead of pulling and averaging pixels, let’s only use the exact colors we’ve defined and fill in the area that matches the shape as closely as possible.” Great! Let’s scale up!

There we go! That looks… wait that looks super weird. Let’s check it out up close.

I spent so much time lovingly crafting that shirt! What happened to all of my beautiful squares?! Well, when you’re using nearest neighbor, Photoshop kind of has to guess and doesn’t always have room to fit things in where they need to go. The GOOD news is, if you scale by exact multiples of 100% (200%, 300%, etc.), the scale will be perfect. The other kind-of-good news is, scaling down seems to work way better than scaling up. So we’ll scale up to 200%:

Looks PERFECT! But it’s too big. So let’s scale down a little bit:

Still not perfect, but if we compare our results side by side, it looks like scaling down worked much better.

There’s still some pretty gross discrepancy in our pixel aspect ratios, but hey, maybe you’re in a hurry? Or maybe you’d rather go in and clean your image up rather than redrawing it completely. I don’t know how possible that would be with this method, but figure it out yourself! It’s your fault for not doing proper planning! Another option, which is less than perfect, is to scale way up using nearest neighbor and then scale down with bicubic settings. Your pixels will retain their shape, and you’ll get a little less anti-aliasing, because it’s easier for Photoshop to remove pixels than to figure out how to fill them in. But it’ll still definitely be a little fuzzy. If you’re scaling a whole image, you can just select “nearest neighbor” in the Image Size window. If you’re scaling a single element, you need to change the preference as described above. Also, never scale only one element, because mismatched pixel resolutions are THE WORST.

Anyway, maybe that information is useful to you? Probably not! Whatever! Go make some pixel art! And if anyone has more to say on this topic, let me know!!

# ESDS Announces the Launch of eNlight

amplify.com

ESDS is pleased to announce the launch of its eNlight Cloud Computing Platform - the world’s first intelligent cloud that truly does justice to the concept of cloud computing. eNlight Cloud is an addition to the company’s existing portfolio of software products and managed hosting services and was designed with small to medium-sized companies in mind. In the existing cloud hosting market, most companies offer the option to pay for fixed usage. Companies try to market it as flexible hosting by claiming you can pay per hour, but the vision behind eNlight was to take this much further. eNlight allows companies to pay for real-time usage and therefore justifies the concept of the cloud.

# Does Ultrasonic Teeth Cleaning or Scaling work?

To provide patients with a more thorough dental cleaning than manual teeth cleaning by hand, a new and painless method has emerged, by the name of ultrasonic dental cleaning.

Amplify’d from www.drchetan.com


# Scaling The Management Team

I found the comment on this post very interesting.

Among the reasons I find this difficult for first time CEOs are:
a) They have the mistaken belief that their job is to be the “boss” and to know “better” than those they hire and/or team with. It’s a perverse belief based, I think, in guilt and anxiety: I’m higher up (“What,” I often say, “the food chain?”) and so I should be able to do the job better.
b) The reason the person is CEO is often because they are the founder and, at one time, they did nearly everything. Giving up and delegating (even to people they recognize are better skilled) is scary and destabilizing. They wonder, well what’s MY job then?
c) Fear of the loss of control is often an obstacle but it’s NEVER present alone. I find it’s often accompanied by a fear of the loss of purpose (related to the second point).
d) The team that got you to a place where you are now in need of scaling the management team very often consists of friends who were crazy enough to believe your idea in the first place (and equally often were unemployed). They may not be scaling their skills or maturity as quickly as the company needs and so you get the problem of having to reshuffle (or, even worse, fire) co-founders/friends.

For those who may not know, there are several key milestones when starting a company.

1. You have a dream, a vision, something that you think will work out, make some money, and help solve some problems for the people around you, and you want to stick with it for the next few years.

2. You share it with friends, then find your co-founder and set up a team. Everyone gets excited and *hacks* things done (getting things done is not enough to start a business; that’s why MBAs have a bigger chance of failing). You get some small funding from friends/family, or you self-fund by doing a part-time job (like I did).

3. You/your team fail, and you realize that your idea just doesn’t work out the way you want. This is the Dip, and the question is yes or no: do we continue? At DeltaViet, we even had a presentation about this book, on whether we should give up or keep going. Most wanna-preneurs give up at this stage; too much has been lost to continue.

4. Then you face the truth that your original idea is not that good; it’s not the next big thing people are talking about. Yes, it’s a little bit bitter. But the important thing is that you think about PIVOT, and just fucking pivot, since there is no other way.

5. You get your first customer, who pays you to help solve their problems. It’s like a dream come true.

6. You get your first employee. For us (DeltaViet), our first employees were interns whom we spent a lot of time training and coaching to hack things done just like we did.

7. Your little company grows, and now you may think it will be the next Google, next Yahoo, next Amazon, next Facebook, next whatsofuckingever that will soon be very big. And you meet big investors, maybe VCs, talking, doing elevator pitches and a lot of presentations about your plan to be the next big thing. And the fact is that there is only a 1-in-300 chance that you get funded by a VC.

And believe it or not, raising funds from a VC is not about your business plan, your dream, or whatever else you may think of. Raising funds from a VC is more or less like flirting with your girlfriend; VC investment is like marriage. It’s about truth, about lust, about the relationship, and it’s not a one-night stand. There are 3 types of love, according to Richard Wiseman:

• Eros: These lovers have very strong ideas about the type of physical and psychological characteristics that they desire in a partner.
• Storge: These lovers value trust over lust. Instead of having a perfect partner in mind, they slowly develop a network of friends in the hope that affection will transform into deep commitment and love.
• Ludus: These lovers have no ideal type in mind but are instead happy to play the field.

Then, we have these types of love between your company and VCs.

• The VC may find your company to be the ideal type of company they want to fund: you are doing the ideal things they want to see, battling in the ideal segment they want to join.
• The VC may fall in love with your company after a long time watching your development, with a deep understanding of what you are doing.
• The VC can be a risk taker whose loving style is driven by a fear of being abandoned by a partner; they fear that you may someday become the next Facebook, so better to grab a piece of the pie. That’s why VCs so often invest in bullshit companies that are going nowhere. You may just be a lucky badass.

And there is the case where you don’t love the VC and just don’t want to give them a fuck (which rarely happens, just like a guy hardly ever denies a girl).

8. Funded or not, your company may grow, and you get more customers to pay the bills or to expand. And then there is the case where you may *someday* suddenly find that you are doing something wrong.

You are a wartime CEO: you can lead your team to work super crazy to win a battle, but you do it all wrong when peace comes. You may now be a fucking badass who wants everything under your control, everything! You still want to do everything, or monitor everything; you fear losing your control. You deny all the ideas that you think won’t work out, because you had the experience of things not working out in the past. That’s my case.

It took me a long time to find out what I was doing wrong, a hard time convincing myself that I was doing it wrong, and it was even harder to convince my team that what I was doing was actually wrong. They believed in me. That made me sad, for a long time. It took days and months of staring at the ceiling every night.

But luckily, we found a new CEO, who I believe is doing a good job. Yes, no one is perfect, no one is as perfect as you want, and no one will do all the things the way you want (and that’s what people are there for). That’s the way it should be, because we are trying to build something sustainable and strive for the long run.

Whatever excites you, go for it. Whatfuckingever scares you, go fucking do it.

# Keep it Simple, Get it Right: Scale

jperla.com

• Keep it simple
• Get it right
• Don’t hide power
• Use procedure arguments to provide flexibility in an interface
• Leave it to the client
• Continuity: plan to throw one away
• Keep secrets of the implementation
• Use a good idea again instead of generalizing it

Handle normal and worst cases separately as a rule

• Split resources in a fixed way if in doubt
• Use static analysis if you can
• Dynamic translation from a convenient representation to one that can be quickly interpreted
• Cache answers to expensive computations
• When in doubt, use brute force
• Compute in the background when possible
• Use batch processing if possible
• Shed load to control demand

Just the most important things, of course.

# “In the wonderfully titled paper Scaling of Differentiation in Networks: Nervous Systems, Organisms, Ant Colonies, Ecosystems, Businesses, Universities, Cities, Electronic Circuits, and Legos, Mark Changizi and his colleagues set out to understand this concept. They found that in every single one of the systems in the wildly interdisciplinary list of the subtitle there was an increase in the number of types of components as the total number of pieces grew. The larger something is, the more types of building blocks it uses.”

The Mathematics of Lego | Wired Science | Wired.com

Scaling of Differentiation in Networks: Nervous Systems, Organisms, Ant Colonies, Ecosystems, Businesses, Universities, Cities, Electronic Circuits, and Legos

I’m mulling over whether Anthemis as a lego set of financial pieces is a good metaphor…