databases

Goshhh! Can’t tell you how much I love my study space now. It’s so much more chill :)

ELI5: How do hackers find/gain 'backdoor' access to websites, databases etc.?

Gunna try doing this like ELI10. Back door access is just a way of saying “not-expected” access. Sometimes it’s still done through the front door, and sometimes it’s through a window.

Something like the front door would be if your Mom told you you could have one glass of coke, and you went and got the big glass flower vase, and poured 6 cokes into it. By following the rules in an unexpected way, you’ve tricked the machine. When Mom asks you later how many glasses of coke you had (of course with her trusty polygraph), you can truthfully answer, “One”. This might be like an SQL injection. Instead of answering 5+8=__ with “13”, you might answer with “13&OUTPUT_FINAL_ANSWER_LIST”. Since it has no spaces and starts with numbers, it might satisfy the rules.
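For the grown-ups in the room, here is a rough sketch of that same trick in Python, using the built-in sqlite3 module and a throwaway in-memory database. The table, the user names, and the function names are all made up for illustration; real sites differ in the details. The point is only that an answer glued straight into the SQL text can smuggle in extra instructions, while an answer passed as a parameter cannot.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice-secret"), ("bob", "bob-secret")],
    )
    return conn


def lookup_unsafe(conn, name):
    # BAD: the input is pasted straight into the SQL text, so the
    # database cannot tell the "answer" apart from extra conditions.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    # BETTER: the ? placeholder hands the input over as plain data only.
    rows = conn.execute("SELECT secret FROM users WHERE name = ?", (name,))
    return [row[0] for row in rows]


conn = make_db()
tricky = "alice' OR '1'='1"  # follows the "rules" but changes the question
unsafe_rows = lookup_unsafe(conn, tricky)  # matches every row in the table
safe_rows = lookup_safe(conn, tricky)      # matches nothing: no such name
```

With the unsafe version, the tricky answer turns the question into "name is alice, or one equals one", which is true for everybody, so every secret comes back. The safe version just looks for a user with a very strange name and finds nobody.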

Another way would be if your Mom said you could invite some friends over to play. After the 5th friend walks in, your Mom declares, “That’s it, not another kid walks through that door!” If you open a window and let Johnny climb in with his crayons, technically you didn’t break the rules (for the eventual polygraph) AND when you and your 5 friends go downstairs for homework, Johnny can color all over the walls without someone suspecting he’s there. This is as though you made new login names and used one of the names to give another person administrative, or Mommy, rights. Sometimes you need to make a new login screen, or just knock open a hole in the wall and cover it with a poster, but the idea is still to break the intention of the rules while following them to the letter.

What’s also important to remember is this goes very smoothly when someone lives in the house already, but becomes much harder when you’re trying to get into a stranger’s house. You might have to try to sell them cookies or magazines and then write down where the windows are. Or you might have to offer to clean their whole house for only $5, and then leave a window unlocked for your friend to come back later. Getting inside is a major step.

Scientific Search Engines and Databases

The scientific community keeps many databases that can provide a huge amount of information but may not show up in searches through an ordinary search engine. Check these out to see if you can find what you need to know.

• Science.gov. This search engine offers specific categories including agriculture and food, biology and nature, Earth and ocean sciences, health and medicine, and more.
• WorldWideScience.org. Search for science information with this connection to international science databases and portals.
• CiteSeer.IST. This search engine and digital library will help you find information within scientific literature.
• Scirus has a pure scientific focus. It is a far reaching research engine that can scour journals, scientists’ homepages, courseware, pre-print server material, patents and institutional intranets.
• Scopus. Find academic information among science, technology, medicine, and social science categories.
• GoPubMed. Search for biomedical texts with this search engine that accesses PubMed articles.
• The Gene Ontology. Search the Gene Ontology database for genes, proteins, or Gene Ontology terms.
• PubFocus. This search engine searches Medline and PubMed for information on articles, authors, and publishing trends.
• Scitation. Find over one million scientific papers from journals, conferences, magazines, and other sources with this tool.

General Search Engines and Databases

These databases and search engines for databases will provide information from places on the Internet most typical search engines cannot.

• DeepDyve. One of the newest search engines specifically targeted at exploring the deep web, this one is available after you sign up for a free membership.
• OAIster. Search for digital items with this tool that provides 12 million resources from over 800 repositories.
• direct search. Search through all the direct search databases or select a specific one with this tool.
• CloserLook Search. Search for information on health, drugs and medicine, city guides, company profiles, and Canadian airfares with this customized search engine that specializes in the deep web.
• Northern Light Search. Find information with the quick search or browse through other search tools here.
• Yahoo! Search Subscriptions. Use this tool to combine a search on Yahoo! with searches in journals where you have subscriptions such as Wall Street Journal and New England Journal of Medicine.
• Librarians’ Internet Index (LII) is a publicly-funded website and weekly newsletter serving California, the nation, and the world.
• The Scout Archives. This database is the culmination of nine years’ worth of compiling the best of the Internet.
• Daylife. Find news with this site that offers some of the best global news stories along with photos, articles, quotes, and more.
• Silobreaker. This tool shows how news and people in the news impact the global culture with current news stories, corresponding maps, graphs of trends, networks of related people or topics, fact sheets, and more.
• spock. Find anyone on the web who might not normally show up on the surface web through blogs, pictures, social networks, and websites here.
• The WWW Virtual Library. One of the oldest databases of information available on the web, this site allows you to search by keyword or category.
• pipl. Specifically designed for searching the deep web for people, this search engine claims to be the most powerful for finding someone.
• Complete Planet is a free, well-designed directory resource that makes it easy to access the mass of dynamic databases that are cloaked from a general-purpose search.
• Infoplease is an information portal with a host of features. Using the site, you can tap into a good number of encyclopedias, almanacs, an atlas, and biographies. Infoplease also has a few nice offshoots like Factmonster.com for kids and Biosearch, a search engine just for biographies.

Warning: Geek joke.

There’s a brief that can’t be broken.
There’s a bug goes on and on,
Empty chars in empty tables,
Now TRUNCATE has been and gone.

Here they talked about INDEXes.
Here it was they set the KEY.
Here they normalized the data,
And tomorrow: build in C.

CREATE TABLE in the corner,
and the database was born!
And they wrote with keyboards singing!
And I can hear them now!
The clacky keyboards they preferred!
Became their very downfall,
when they missed XSS testing.
And pushed the new site live, at dawn.

Oh my friends, my friends forgive me,
That I work and you are gone.
There’s a brief that can’t be broken,
There’s a bug that can’t be done.

Phantom strings outside of slashes,
Phantom commands into core,
Empty CHARs in empty tables,
Where our data rests, no more.

Oh my friends, you didn’t ask me,
What parameterize was for!
Empty CHARs in empty tables,
Where our data rests, no more.

Hints and Strategies

Searching the deep web should be done a bit differently, so use these strategies to help you get started on your deep web searching.

• Don’t rely on old ways of searching. Become aware that approximately 99% of content on the Internet doesn’t show up on typical search engines, so think about other ways of searching.
• Search for databases. Using any search engine, enter your keyword alongside “database” to find any searchable databases (for example, “running database” or “woodworking database”).
• Get a library card. Many public libraries offer access to research databases for users with an active library card.
• Stay informed. Reading blogs or other updated guides about Internet searches on a regular basis will ensure you are staying updated with the latest information on Internet searches.
• Search government databases. There are many government databases available that have plenty of information you may be seeking.
• Bookmark your databases. Once you find helpful databases, don’t forget to bookmark them so you can always come back to them again.
• Practice. Just like with other types of research, the more you practice searching the deep web, the better you will become at it.
• Don’t give up. Researchers agree that much of the information hidden in the deep web is some of the best-quality information available.

7:30 pm – April 14th (happy birthday nari !!)

Yay! Just learned how to add columns to my database.

*doesn’t know wtf to do now*

Jetpants: a toolkit for huge MySQL topologies

Tumblr is one of the largest users of MySQL on the web. At present, our data set consists of over 60 billion relational rows, adding up to 21 terabytes of unique relational data. Managing over 200 dedicated database servers can be a bit of a handful, so naturally we engineered some creative solutions to help automate our common processes.

Today, we’re happy to announce the open source release of Jetpants, Tumblr’s in-house toolchain for managing huge MySQL database topologies. Jetpants offers a command suite for easily cloning replicas, rebalancing shards, and performing master promotions. It’s also a full Ruby library for use in developing custom billion-row migration scripts, automating database manipulations, and copying huge files quickly to multiple remote destinations.

Dynamically resizable range-based sharding allows you to scale MySQL horizontally in a robust manner, without any need for a central lookup service or massive pre-allocation of tiny shards. Jetpants supports this range-based model by providing a fast way to split shards that are approaching capacity or I/O limitations. On our hardware, we can split a 750GB, billion-row pool in half in under six hours.
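To make the range-based idea concrete, here is a minimal Python sketch of the bookkeeping behind it. This is not Jetpants code (Jetpants is a Ruby library) and it does not reflect how Jetpants is implemented; the class name, method names, shard names, and ID values are all hypothetical. It only models the range map: each shard owns a contiguous range of IDs, the last range is open-ended, and a split hands the upper part of one range to a new shard. Copying the actual rows is not modeled.

```python
import bisect


class RangeShardMap:
    """Maps integer IDs to shards by contiguous ID ranges (illustrative only)."""

    def __init__(self, first_shard):
        # starts[i] is the lowest ID owned by names[i]. The last shard is
        # open-ended, so growth needs no pre-allocated shards.
        self.starts = [1]
        self.names = [first_shard]

    def shard_for(self, item_id):
        if item_id < self.starts[0]:
            raise ValueError("ID below the first range")
        # Find the rightmost range whose start is <= item_id.
        idx = bisect.bisect_right(self.starts, item_id) - 1
        return self.names[idx]

    def split(self, shard, split_at, new_shard):
        """Give IDs >= split_at from `shard` to `new_shard`."""
        idx = self.names.index(shard)
        lower = self.starts[idx]
        upper = self.starts[idx + 1] if idx + 1 < len(self.starts) else None
        if split_at <= lower or (upper is not None and split_at >= upper):
            raise ValueError("split point must fall inside the shard's range")
        self.starts.insert(idx + 1, split_at)
        self.names.insert(idx + 1, new_shard)


shards = RangeShardMap("shard-a")
before = shards.shard_for(750)           # everything lives on shard-a
shards.split("shard-a", 501, "shard-b")  # shard-a keeps 1..500
after_low = shards.shard_for(500)
after_high = shards.shard_for(750)
```

Because the whole map is just a short sorted list of range boundaries, a lookup is a binary search over a handful of entries, and a split changes one boundary instead of remapping every ID.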

Jetpants can be obtained via GitHub or RubyGems.

Interested in this type of work? We’re hiring!

Collaborative Information and Databases

One of the oldest forms of information dissemination is word-of-mouth, and the Internet is no different. With the popularity of bookmarking and other collaborative sites, obscure blogs and websites can gain plenty of attention. Follow these sites to see what others are reading.

• Del.icio.us. As readers find interesting articles or blog posts, they can tag, save, and share them so that others can enjoy the content as well.
• Digg. As people read blogs or websites, they can “digg” the ones they like, thus creating a network of user-selected sites on the Internet.
• Technorati. Not only is this site a blog search engine, but it is also a place for members to vote and share, thus increasing the visibility for blogs.
• StumbleUpon. As you read information on the Internet, you can Stumble it and give it a thumbs up or down. The more you Stumble, the more closely the content will align with your taste.
• Reddit. Working similarly to StumbleUpon, Reddit asks you to vote on articles, then customizes content based on your preferences.
• Twine. With Twine you can search for information as well as share with others and get recommendations from Twine.
• Kreeo.com. This collaborative site offers shared knowledge from its members through forums, blogs, and shared websites.

No Access For You!

I am not a huge expert, but within my company, I am considered the MS Access guru.  Our IT department doesn’t support software (they’re just network folks), so if someone asks them an Access question, they always refer them to me.

I often get phone calls that ask “Is um…Sally…there?”.  I’ll answer yes, that’s me.  They’ll often hang up right then, wait a few minutes, and then call me back.  When they get me the second time, they’ll explain “Well, when I called, a LADY answered, so I had to call IT to make sure they’d given me the right number!  Ha ha ha.” Despite the obviously female name, they were still expecting a man, maybe a foreign one to explain the “odd” name.

The hanger-uppers then go on to say, “Well, IT said you could maybe help me.  You see, I’m working in Access, that’s a DATABASE program - I’m not sure if you know about databases”. I’ll answer that yes, I do, and in fact, that is why IT refers people to me.  The worst ones STILL won’t get it and will start by explaining that there are these FORMS, and god forbid, CODE… I just say “I’m sorry, but I’m going to put you on hold while you explain Access 101 to yourself.  I’ll check back with you in 10 minutes to see if you’ve gotten around to an actual question.”

I’ve had a few actually wait the 10 minutes and apologize.

A lot of people talk about how Abstergo lies but reading some of Shaun’s database entries, especially in Syndicate, his own bias shows pretty thoroughly. He starts his description of the Phoenix Project with an accusation and continues to promote common Assassin misconceptions (that the canon has declared false). I don’t see the Assassins misinforming the Initiates as any different than Abstergo misinforming the public in their own databases. They’re all liars.

image from madeinmasyaf

Many thanks to the inestimable R. Seipp, programming partner and genius, for this gif.

Big Data Is Too Big for Scientists to Handle Alone

Much of the recent data frenzy — from the physical and life sciences to the user-generated content aggregated by Google, Facebook and Twitter — has come in the form of largely unstructured streams of digital potpourri that require new, flexible databases, massive computing power and sophisticated algorithms to wring out bits of meaning from them, said Matt LeMay, a former product manager at the URL shortening and bookmarking service Bitly.

Tumblr Engineering @ Percona Live MySQL Conference

We’re pleased to announce that Tumblr’s Database Engineering team will be attending the Percona Live MySQL Conference next week in Santa Clara, CA!

We’ll be giving a talk on our open source automation software, Jetpants, which has helped us scale to over 175 billion distinct rows of relational data to date. We’re also looking forward to attending a number of amazing sessions from our friends at Percona, Facebook, Oracle, Palomino, Etsy, and more.

If you haven’t registered yet, use code SpeakMySQL to save 15%. Hope to see you there!

There is more than one type of database.  Here are seven types of database – in a song.

Seven Types of Database In a Song, two minutes.

I’m just saying, client of mine, that if I ran a database which assigned 5-digit codes to food products, I’d personally ensure that 24601 directed to Bread, not Sausages.

What a waste.

Seven Databases in Song (by jimbojron)