File shares will not exist forever. Microsoft will eventually stop supporting them. It is important that IT departments begin reviewing the objects from the “bottom up” (file to folder to subdirectory to directory) yesterday. Information management professionals know best how to apply a set of internal (user-driven) and external (records retention schedule) criteria to determine which objects stay where they are, which objects move to other locations or custodians, and which objects are deleted.
CIOs, be wary: this is not a four-month project.
Certain tools will advance the tasks and schedule quickly, though. For example, have an abandoned records policy in place -- if file custodians are gone, IT is empowered to make decisions on objects. Create a records retention schedule in partnership with the Legal function.
A file share cleanup project is linear from statistical analysis through the deduplication phases. Post deduplication, the world becomes a bit more complicated -- satisfyingly so.
Statistical Analysis
Statistical analysis is my favorite part of the project. Look to populate the following metadata per object:
Must Have | Nice to Have |
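As a rough illustration of populating metadata per object, the sketch below walks a directory tree and yields one record per file. The function name `survey_share` and the fields it collects (path, extension, size, modified and accessed times) are assumptions chosen for the example, not the must-have list itself, and a commercial analysis tool will gather far more.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def survey_share(root):
    """Walk a directory tree and yield one metadata record per file.

    The fields below are illustrative assumptions; adjust them to match
    your own must-have and nice-to-have lists.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                # Skip files that vanish or cannot be read during the walk.
                continue
            yield {
                "path": str(path),
                "extension": path.suffix.lower(),
                "size_bytes": st.st_size,
                "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
                "accessed": datetime.fromtimestamp(st.st_atime, tz=timezone.utc),
            }
```

Calling `list(survey_share(r"\\server\share"))` (a placeholder path) would return the records for one survey run. Note that access times are not reliably updated on every file system, so treat that field with caution.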
Be prepared to write an index that maps the keywords in a records retention schedule to file extensions and the types of files. Third-party software will typically divide extensions into the following categories:
- Miscellaneous Files (this is usually about 80 percent of the initial output because the tool isn’t industry-specific -- but you as administrator can re-sort the extensions in the tool so that the next time you run the report, they will more closely match their correct category)
- Container Files
- Data Files
- Text Files
- Temporary and Backup Files
- Graphic Files
- System Files
- PC Virtualization Files
- Database Files
- Office Files and Documents
- Program Files
- Internet Files
- Software Development Files
- Video Files
- Configuration Files
- Mail Files
- Audio Files
- Help Files
Category | Share | Description |
Miscellaneous Files | 79.00% | Unknown file types |
Container Files | 10.00% | Compressed Archives and disk images |
Data Files | 4.60% | Files containing data of various kinds, not including files of databases |
Text Files | 4.40% | Plain text files, log files |
Temporary and Backup Files | 1.30% | Temporary files and backup copies containing previous version of current files |
Graphic Files | 0.20% | Files containing pictures, images or mouse cursors |
System Files | 0.20% | System Files |
PC Virtualization Files | 0.20% | Files of Virtual PC, VMWare, etc. |
Database Files | 0.10% | Files containing the data of client and server databases |
Office Files and Documents | 0.00% | Documents and files of office programs and PDFs |
Program Files | 0.00% | Program Files, Libraries and other compiled resources |
Internet Files | 0.00% | Files related to the WWW, like HTML files |
Software Development Files | 0.00% | Source and project files of software development projects |
Video Files | 0.00% | Files containing videos or animations |
Configuration Files | 0.00% | Files containing configuration settings |
Mail Files | 0.00% | Email messages and files of email clients |
Audio Files | 0.00% | Files containing music, sounds or playlists |
Help Files | 0.00% | Files of the Windows help system |
A list of extensions sorted into the above categories can be kept in an Excel file -- a very helpful tool throughout the entire project.
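One way to keep such a list is a simple extension-to-category map that you extend as you re-sort extensions out of Miscellaneous Files. The sketch below is a minimal example under stated assumptions: the entries in `EXTENSION_CATEGORIES` are hypothetical starting values, the shares are computed by file count (a tool may report by size instead), and the output is CSV text that Excel can open.

```python
import csv
import io
from collections import Counter

# Hypothetical starting map; extend it as you re-sort "Miscellaneous" extensions.
EXTENSION_CATEGORIES = {
    ".zip": "Container Files",
    ".iso": "Container Files",
    ".txt": "Text Files",
    ".log": "Text Files",
    ".tmp": "Temporary and Backup Files",
    ".bak": "Temporary and Backup Files",
    ".docx": "Office Files and Documents",
    ".pdf": "Office Files and Documents",
}


def categorize(extension):
    """Return the category for an extension, defaulting to Miscellaneous Files."""
    return EXTENSION_CATEGORIES.get(extension.lower(), "Miscellaneous Files")


def category_shares(extensions):
    """Return {category: percent of files}, largest category first."""
    counts = Counter(categorize(ext) for ext in extensions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: round(100 * n / total, 2) for cat, n in counts.most_common()}


def mapping_as_csv(mapping):
    """Render the extension map as CSV text, sorted by category then extension."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["extension", "category"])
    for ext, cat in sorted(mapping.items(), key=lambda kv: (kv[1], kv[0])):
        writer.writerow([ext, cat])
    return buf.getvalue()
```

Each time an unknown extension is added to the map, rerunning `category_shares` should shrink the Miscellaneous Files share, which mirrors the re-sorting step described in the category list.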
Use the baseline results from the file share surveys to graph when the growth of information exploded and to judge whether shorter, more aggressive retention periods may reduce storage costs.
Remember to make the graph easily decipherable -- simple is elegant.
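The numbers behind such a graph can be sketched in a few lines. The example below assumes each survey record is a dictionary with a timezone-aware `modified` datetime and a `size_bytes` integer (both hypothetical field names), uses last-modified year as a stand-in for when information was created, and approximates a retention period as 365 days per year.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def bytes_by_year(records):
    """Total bytes per last-modified year, in ascending year order."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["modified"].year] += rec["size_bytes"]
    return dict(sorted(totals.items()))


def reclaimable_bytes(records, retention_years, today):
    """Bytes in files last modified before the retention cutoff.

    The cutoff approximates a year as 365 days; a real retention schedule
    usually counts from a trigger event, not simply from the modified date.
    """
    cutoff = today - timedelta(days=365 * retention_years)
    return sum(r["size_bytes"] for r in records if r["modified"] < cutoff)


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"modified": datetime(2015, 1, 1, tzinfo=timezone.utc), "size_bytes": 100},
        {"modified": datetime(2023, 1, 1, tzinfo=timezone.utc), "size_bytes": 10},
    ]
    print(bytes_by_year(sample))
    print(reclaimable_bytes(sample, 7, datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

Comparing `reclaimable_bytes` for, say, a seven-year and a three-year period gives a first estimate of how much storage a shorter retention period might free, before any legal review.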
Survey the metadata of file shares at least once a month for three months to glean a general understanding of user behaviors before you launch conversations with content creators about deduplication. The “bottom up” approach may take a more scenic route in terms of project schedule, but the payoff with end users in the long run is tremendous.
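Comparing one month's survey with the next is one simple way to see that behavior. The sketch below assumes each survey has been reduced to a dictionary mapping a file path to a `(size_bytes, modified)` tuple; that shape and the function name `compare_snapshots` are illustrative choices, not features of any particular tool.

```python
def compare_snapshots(earlier, later):
    """Classify paths by what happened between two survey runs.

    Both arguments map path -> (size_bytes, modified); this shape is an
    assumption made for the example.
    """
    earlier_paths = set(earlier)
    later_paths = set(later)
    common = earlier_paths & later_paths
    return {
        "added": sorted(later_paths - earlier_paths),
        "removed": sorted(earlier_paths - later_paths),
        "changed": sorted(p for p in common if earlier[p] != later[p]),
        "untouched": sorted(p for p in common if earlier[p] == later[p]),
    }
```

Folders whose files stay in the untouched group across all three months are natural first candidates to raise with content creators.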