What is a short sequence of characters that appear at the end of a filename preceded by a period?

Understanding file extensions

Your computer has many different types of files on it, and each one has its own file extension. A file extension is a three- or four-letter identifier found at the end of a file name and following a period. These extensions tell you about the characteristics of a file and its use. In this lesson, we'll go over some examples of these extensions, as well as how to determine a particular file's extension.

Examples of file extensions

  • A JPEG uses the .jpg or .jpeg extension (for example, image.jpg).
  • A Word document uses the .docx extension, or .doc for older versions (for example, CoverLetter.docx).
  • An MP3 audio file uses the .mp3 extension (for example, rhyme_rap.mp3).
  • An Excel spreadsheet uses the .xlsx extension, or .xls for older versions (for example, budget.xls).

Hidden file extensions

Some operating systems hide file extensions by default to reduce clutter. If they are hidden, you can usually turn them back on in the file manager's settings on both Windows and macOS.

You can also usually tell what the file type is by looking at the file's icon. For example, the Word document looks like a file with a W in the corner, while an Excel spreadsheet looks like a file with an X in the corner.

File extensions also tell your computer which applications to use when opening that file. Sometimes you may want to use a different application to open that file.

Filename suffix that indicates the file's type

A filename extension, file name extension or file extension is a suffix to the name of a computer file (e.g., .txt, .docx, .md). The extension indicates a characteristic of the file contents or its intended use. A filename extension is typically delimited from the rest of the filename with a full stop (period), but in some systems[1] it is separated with spaces. Other extension formats include dashes and/or underscores on early versions of Linux and some versions of IBM AIX.[citation needed]

Some file systems implement filename extensions as a feature of the file system itself and may limit the length and format of the extension, while others treat filename extensions as part of the filename without special distinction.

Usage

Filename extensions may be considered a type of metadata.[2] They are commonly used to imply information about the way data might be stored in the file. The exact definition, giving the criteria for deciding what part of the file name is its extension, belongs to the rules of the specific filesystem used; usually the extension is the substring which follows the last occurrence, if any, of the dot character (example: txt is the extension of the filename readme.txt, and html the extension of mysite.index.html). On file systems of some mainframe systems such as CMS in VM, VMS, and of PC systems such as CP/M and derivative systems such as MS-DOS, the extension is a separate namespace from the filename. Under Microsoft's DOS and Windows, extensions such as EXE, COM or BAT indicate that a file is a program executable. In OS/360 and successors, the part of the dataset name following the last period is treated as an extension by some software, e.g., TSO EDIT, but it has no special significance to the operating system itself; the same applies to Unix files in MVS.

Filesystems for UNIX-like operating systems do not separate the extension metadata from the rest of the file name. The dot character is just another character in the main filename. A file name may have no extensions. Sometimes it is said to have more than one extension, although terminology varies in this regard, and most authors define extension in a way that doesn't allow more than one in the same file name. More than one extension usually represents nested transformations, such as files.tar.gz (the .tar indicates that the file is a tar archive of one or more files, and the .gz indicates that the tar archive file is compressed with gzip). Programs transforming or creating files may add the appropriate extension to names inferred from input file names (unless explicitly given an output file name), but programs reading files usually ignore the information; it is mostly intended for the human user. It is more common, especially in binary files, for the file itself to contain internal metadata describing its contents. This model generally requires the full filename to be provided in commands, whereas the metadata approach often allows the extension to be omitted.
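The "substring after the last dot" rule and the idea of nested suffixes can be illustrated with a short Python sketch. It relies on the standard library's pathlib; the helper names extension and all_suffixes are made up for this illustration and are not part of any file system's definition.

```python
from pathlib import PurePosixPath


def extension(name: str) -> str:
    """Return the text after the last dot, or '' if the name has no extension."""
    suffix = PurePosixPath(name).suffix  # e.g. ".html", or "" when absent
    return suffix[1:]


def all_suffixes(name: str) -> list[str]:
    """Return every dot-separated suffix, outermost last (e.g. tar, then gz)."""
    return [s[1:] for s in PurePosixPath(name).suffixes]


for example in ("readme.txt", "mysite.index.html", "files.tar.gz", "Makefile"):
    print(example, "->", repr(extension(example)), all_suffixes(example))
```

Note that pathlib treats a name such as .bashrc as having no extension at all, which matches the common convention that a leading dot marks a hidden file rather than a suffix.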

The VFAT, NTFS, and ReFS file systems for Windows also do not separate the extension metadata from the rest of the file name, and allow multiple extensions.

With the advent of graphical user interfaces, the issue of file management and interface behavior arose. Microsoft Windows allowed multiple applications to be associated with a given extension, and different actions were available for selecting the required application, such as a context menu offering a choice between viewing, editing or printing the file. The assumption was still that any extension represented a single file type; there was an unambiguous mapping between extension and icon.

The classic Mac OS disposed of filename-based extension metadata entirely; it used, instead, a distinct file type code to identify the file format. Additionally, a creator code was specified to determine which application would be launched when the file's icon was double-clicked. macOS, however, uses filename suffixes, as well as type and creator codes, as a consequence of being derived from the UNIX-like NeXTSTEP operating system.

Improvements

The filename extension was originally used to determine the file's generic type.[citation needed] The need to condense a file's type into three characters frequently led to abbreviated extensions. Examples include using .GFX for graphics files, .TXT for plain text, and .MUS for music. However, because many different software programs have been made that all handle these data types (and others) in a variety of ways, filename extensions started to become closely associated with certain products, even specific product versions. For example, early WordStar files used .WS or .WSn, where n was the program's version number. Also, conflicting uses of some filename extensions developed. One example is .rpm, used for both RPM Package Manager packages and RealPlayer Media files.[3] Others are .qif, shared by DESQview fonts, Quicken financial ledgers, and QuickTime pictures;[4] .gba, shared by GrabIt scripts and Game Boy Advance ROM images;[5] .sb, used for SmallBasic and Scratch; and .dts, used for Dynamix Three Space and DTS.

Some other operating systems that used filename extensions generally had fewer restrictions on filenames. Many allowed full filename lengths of 14 or more characters, and maximum name lengths up to 255 were not uncommon. The file systems in operating systems such as Multics and UNIX stored the file name as a single string, not split into base name and extension components, allowing the "." to be just another character allowed in file names. Such systems generally allow for variable-length filenames, permitting more than one dot, and hence multiple suffixes. Some components of Multics and UNIX, and applications running on them, used suffixes, in some cases, to indicate file types, but they did not use them as much—for example, executables and ordinary text files had no suffixes in their names.

The High Performance File System (HPFS), used in Microsoft and IBM's OS/2, also supported long file names and did not divide the file name into a name and an extension. The convention of using suffixes continued, even though HPFS supported extended attributes for files, allowing a file's type to be stored in the file as an extended attribute.

NTFS, the native file system of Microsoft's Windows NT, supported long file names and did not divide the file name into a name and an extension, but again, the convention of using suffixes to simulate extensions continued, for compatibility with existing versions of Windows.

When the Internet age first arrived, those using Windows systems that were still restricted to 8.3 filename formats had to create web pages with names ending in .HTM, while those using Macintosh or UNIX computers could use the recommended .html filename extension. This also became a problem for programmers experimenting with the Java programming language, since it requires the four-letter suffix .java for source code files and the five-letter suffix .class for Java compiler object code output files.[6]

Eventually, Windows 95 introduced support for long file names, and removed the 8.3 name/extension split in file names from non-NT Windows, in an extended version of the commonly used FAT file system called VFAT. VFAT first appeared in Windows NT 3.5 and Windows 95. The internal implementation of long file names in VFAT is largely considered to be a kludge[by whom?], but it removed the important length restriction and allowed files to have a mix of upper case and lower case letters, on machines that would not run Windows NT well.

Command name issues

A filename extension occasionally appears in a command name, usually as a side effect of the command having been implemented as a script (e.g., for the Bourne shell or for Python) with the interpreter name suffixed to the command name. This practice is common on systems that rely on associations between filename extension and interpreter, but sharply deprecated[7] in Unix-like systems, such as Linux, Oracle Solaris, BSD-based systems, and Apple's macOS, where the interpreter is normally specified as a header in the script ("shebang").

On association-based systems, the filename extension is generally mapped to a single, system-wide selection of interpreter for that extension (such as ".py" meaning to use Python), and the command itself is runnable from the command line even if the extension is omitted (assuming appropriate setup is done). If the implementation language is changed, the command name extension is changed as well, and the OS provides a consistent API by allowing the same extensionless version of the command to be used in both cases. This method suffers somewhat from the essentially global nature of the association mapping, and from the fact that developers do not consistently omit extensions when calling programs and cannot be forced to do so. Windows is the only remaining widespread employer of this mechanism.

On systems with interpreter directives, including virtually all versions of Unix, command name extensions have no special significance, and are by standard practice not used, since the primary method to set interpreters for scripts is to start them with a single line specifying the interpreter to use (which could be viewed as a degenerate resource fork). In these environments, including the extension in a command name unnecessarily exposes an implementation detail which puts all references to the commands from other programs at future risk if the implementation changes. For example, it would be perfectly normal for a shell script to be reimplemented in Python or Ruby, and later in C or C++, all of which would change the name of the command were extensions used. Without extensions, a program always has the same extension-less name, with only the interpreter directive and/or magic number changing, and references to the program from other programs remain valid.
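As a rough illustration of how an interpreter directive replaces the extension, the following Python sketch reads the first line of a script and extracts the interpreter named after "#!". The function name parse_shebang is hypothetical, and the handling of the remainder of the line is a simplification: real kernels differ in how they split and pass arguments.

```python
def parse_shebang(first_line: str):
    """Return (interpreter, argument_text) for a '#!' line, or None otherwise.

    Simplified sketch: everything after the interpreter path is returned as
    one string, without modelling any particular kernel's argument rules.
    """
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].strip().split(None, 1)
    if not parts:
        return None
    interpreter = parts[0]
    argument = parts[1] if len(parts) > 1 else ""
    return interpreter, argument


print(parse_shebang("#!/bin/sh\n"))
print(parse_shebang("#!/usr/bin/env python3\n"))
print(parse_shebang("echo no directive here\n"))
```

Because the interpreter is found this way, a command named backup can be rewritten from shell to Python by changing only its first line, while its name stays the same.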

Security issues

The default behavior of File Explorer, the file browser provided with Microsoft Windows, is for filename extensions to not be displayed. Malicious users have tried to spread computer viruses and computer worms by using file names formed like LOVE-LETTER-FOR-YOU.TXT.vbs. The hope is that this will appear as LOVE-LETTER-FOR-YOU.TXT, a harmless text file, without alerting the user to the fact that it is a harmful computer program, in this case, written in VBScript. Default behavior for ReactOS is to display filename extensions in ReactOS Explorer.
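The double-extension trick can be detected with a simple check on the last two suffixes of a name. The sketch below is only a heuristic: the two extension sets are short illustrative lists chosen for this example, not an authoritative catalogue of dangerous or harmless types, and the function name looks_disguised is made up.

```python
from pathlib import PurePosixPath

# Illustrative, deliberately incomplete lists (assumptions for this sketch).
EXECUTABLE_EXTENSIONS = {".vbs", ".exe", ".com", ".bat", ".scr", ".js"}
HARMLESS_LOOKING_EXTENSIONS = {".txt", ".jpg", ".pdf", ".doc", ".docx"}


def looks_disguised(name: str) -> bool:
    """True if a harmless-looking suffix is followed by an executable one."""
    suffixes = [s.lower() for s in PurePosixPath(name).suffixes]
    return (
        len(suffixes) >= 2
        and suffixes[-1] in EXECUTABLE_EXTENSIONS
        and suffixes[-2] in HARMLESS_LOOKING_EXTENSIONS
    )


for name in ("LOVE-LETTER-FOR-YOU.TXT.vbs", "report.txt", "files.tar.gz"):
    print(name, "->", looks_disguised(name))
```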

Later Windows versions (starting with Windows XP Service Pack 2 and Windows Server 2003) included customizable lists of filename extensions that should be considered "dangerous" in certain "zones" of operation, such as when downloaded from the web or received as an e-mail attachment. Modern antivirus software systems also help to defend users against such attempted attacks where possible.

Some viruses take advantage of the similarity between the ".com" top-level domain and the ".COM" filename extension by emailing malicious, executable command-file attachments under names superficially similar to URLs (e.g., "myparty.yahoo.com"), with the effect that unaware users click on email-embedded links that they think lead to websites but actually download and execute the malicious attachments.

There have been instances of malware crafted to exploit vulnerabilities in some Windows applications which could cause a stack-based buffer overflow when opening a file with an overly long, unhandled filename extension.

The filename extension is just a marker and the content of the file does not have to match it.[8] This can be used to disguise malicious content. When trying to identify a file for security reasons, it is therefore considered dangerous to rely on the extension alone and a proper analysis of the content of the file is preferred. For example, on UNIX-derived systems, it is not uncommon to find files with no extensions at all, as commands such as file are meant to be used instead, and will read the file's header to determine its content.
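Identifying a file by its header rather than its name can be sketched in a few lines of Python. The signature table below contains a handful of well-known leading byte sequences and is far smaller than what the file command uses; the function names sniff and sniff_file are hypothetical.

```python
# A few well-known leading byte sequences ("magic numbers"); not exhaustive.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"MZ", "DOS/Windows executable"),
]


def sniff(header: bytes) -> str:
    """Guess a file type from its first bytes, ignoring the filename."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 16 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff(handle.read(16))


# A file named "holiday.jpg" whose content starts with "MZ" is not an image.
print(sniff(b"MZ\x90\x00\x03\x00"))
```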

Alternatives

In many Internet protocols, such as HTTP and MIME email, the type of a bitstream is stated as the media type, or MIME type, of the stream, rather than a filename extension. This is given in a line of text preceding the stream, such as Content-type: text/plain.

There is no standard mapping between filename extensions and media types, resulting in possible mismatches in interpretation between authors, web servers, and client software when transferring files over the Internet. For instance, a content author may specify the extension svgz for a compressed Scalable Vector Graphics file, but a web server that does not recognize this extension may not send the proper content type image/svg+xml and its required compression header, leaving web browsers unable to correctly interpret and display the image.
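One way to see such an extension-to-media-type table in practice is Python's standard mimetypes module, which guesses a media type and a content encoding from a filename alone. This is only one implementation's table; as noted above, other software may map the same extension differently, and the helper name describe is made up for this sketch.

```python
import mimetypes


def describe(name: str):
    """Return (media_type, content_encoding) guessed purely from the name."""
    return mimetypes.guess_type(name)


for name in ("notes.txt", "archive.tar.gz", "drawing.svgz", "data.unknownext"):
    media_type, encoding = describe(name)
    print(f"{name}: type={media_type}, encoding={encoding}")
```

A result of None for the type means the table has no entry for that extension, which is the situation in which a server may fall back to a generic or incorrect content type.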

BeOS, whose BFS file system supports extended attributes, would tag a file with its media type as an extended attribute. The KDE and GNOME desktop environments associate a media type with a file by examining both the filename suffix and the contents of the file, in the fashion of the file command, as a heuristic. They choose the application to launch when a file is opened based on that media type, reducing the dependency on filename extensions. macOS uses both filename extensions and media types, as well as file type codes, to select a Uniform Type Identifier by which to identify the file type internally.

See also

  • file (command)
  • List of file formats
  • List of filename extensions
  • Metadata
  • .properties

References

  1. ^ "What Is a File?" (PDF). z/VM - Version 7 Release 1 - CMS Primer (PDF). IBM. 2018-09-11. p. 7. SC24-6265-00. One thing you need to know about creating files with z/VM is that each file needs its own three-part identifier. The first part of the identifier is the file name. The second part is the file type. And the third part is the file mode. These three file identifiers are often abbreviated fn ft fm.
  2. ^ Stauffer, Todd; McElhearn, Kirk (2006). Mastering Mac OS X. John Wiley & Sons. pp. 95–96. ISBN 9780782151282. Retrieved 2 October 2017.
  3. ^ File Extension .RPM Details from filext.com
  4. ^ File Extension .QIF Details from filext.com
  5. ^ File Extension .GBA Details from filext.com
  6. ^ "javac – Java programming language compiler". Sun Microsystems, Inc. 2004. Retrieved 2009-05-31. Source code file names must have .java suffixes, class file names must have .class suffixes, and both source and class files must have root names that identify the class.
  7. ^ Commandname Extensions Considered Harmful
  8. ^ "What Is a File Extension?".

Filename convention used by old versions of DOS and Windows

An 8.3 filename[1] (also called a short filename or SFN) is a filename convention used by old versions of DOS and versions of Microsoft Windows prior to Windows 95 and Windows NT 3.5. It is also used in modern Microsoft operating systems as an alternate filename to the long filename for compatibility with legacy programs. The filename convention is limited by the FAT file system. Similar 8.3 file naming schemes have also existed on earlier CP/M, TRS-80, Atari, and some Data General and Digital Equipment Corporation minicomputer operating systems.

Overview

8.3 filenames are limited to at most eight characters (after any directory specifier), followed optionally by a filename extension consisting of a period . and at most three further characters. For systems that only support 8.3 filenames, excess characters are ignored. If a file name has no extension, a trailing . has no significance (that is, myfile and myfile. are equivalent). Furthermore, file and directory names are uppercase in this system, even though systems that use the 8.3 standard are usually case-insensitive (making CamelCap.tpu equivalent to the name CAMELCAP.TPU). However, on non-8.3 operating systems (such as almost any modern operating system) accessing 8.3 file systems (including DOS-formatted diskettes, but also including some modern memory cards and networked file systems), the underlying system may alter filenames internally to preserve case and avoid truncating letters in the names, for example in the case of VFAT.

VFAT and computer-generated 8.3 filenames

VFAT, a variant of FAT with an extended directory format, was introduced in Windows 95 and Windows NT 3.5. It allowed mixed-case Unicode long filenames (LFNs) in addition to classic 8.3 names by using multiple 32-byte directory entry records for long filenames (in such a way that only one will be recognised by old 8.3 system software as a valid directory entry).

To maintain backward-compatibility with legacy applications (on DOS and Windows 3.1), on FAT and VFAT filesystems an 8.3 filename is automatically generated for every LFN, through which the file can still be renamed, deleted or opened, although the generated name (e.g. OVI3KV~N) may show little similarity to the original. On NTFS filesystems the generation of 8.3 filenames can be turned off.[2] The 8.3 filename can be obtained using the Kernel32.dll function GetShortPathName.[3][4]

Although there is no compulsory algorithm for creating the 8.3 name from an LFN, Windows uses the following convention:[5]

  1. If the LFN is 8.3 uppercase, no LFN will be stored on disk at all.
    • Example: TEXTFILE.TXT
  2. If the LFN is 8.3 mixed case, the LFN will store the mixed-case name, while the 8.3 name will be an uppercased version of it.
    • Example: TextFile.Txt becomes TEXTFILE.TXT.
  3. If the filename contains characters not allowed in an 8.3 name (including space which was disallowed by convention though not by the APIs) or either part is too long, the name is stripped of invalid characters such as spaces and extra periods. If the name begins with periods . the leading periods are removed. Other characters such as + are changed to the underscore _, and letters are put in uppercase. The stripped name is then truncated to the first 6 letters of its basename, followed by a tilde, followed by a single digit, followed by a period ., followed by the first 3 characters of the extension.
    • Example: TextFile.Mine.txt becomes TEXTFI~1.TXT (or TEXTFI~2.TXT, should TEXTFI~1.TXT already exist). ver +1.2.text becomes VER_12~1.TEX. .bashrc.swp becomes BASHRC~1.SWP.
  4. On all NT versions including Windows 2000 and later, if at least 4 files or folders already exist with the same extension and first 6 characters in their short names, the stripped LFN is instead truncated to the first 2 letters of the basename (or 1 if the basename has only 1 letter), followed by 4 hexadecimal digits derived from an undocumented hash of the filename, followed by a tilde, followed by a single digit, followed by a period ., followed by the first 3 characters of the extension.[6]
    • Example: TextFile.Mine.txt becomes TE021F~1.TXT.
  5. On Windows 95, 98 and ME, if more than 9 files or folders exist with the same extension and first 6 characters in their short names (so that ~1 through ~9 suffixes aren't enough to resolve the collision), the name is further truncated to 5 letters, followed by a tilde, followed by two digits starting from 10, followed by a period . and the first 3 characters of the extension.
    • Example: TextFile.Mine.txt becomes TEXTF~10.TXT if TEXTFI~1.TXT through TEXTFI~9.TXT all exist already.
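Steps 1 to 3 of the convention above can be approximated by the following Python sketch. It is not Windows' actual implementation: the hash-based and two-digit fallbacks of steps 4 and 5 are omitted, the collision counter is simply passed in as index, and the set of characters replaced by an underscore is an assumption taken from the examples in this article. The names short_name and clean are hypothetical.

```python
# Characters replaced by "_" (assumed set, based on the rules in this article).
REPLACED = set("+,;=[]")


def clean(part: str) -> str:
    """Drop spaces and periods, replace special characters, and uppercase."""
    part = part.replace(" ", "").replace(".", "")
    return "".join("_" if c in REPLACED else c for c in part).upper()


def short_name(long_name: str, index: int = 1) -> str:
    """Approximate the 8.3 name for long_name (steps 1 to 3 only)."""
    name = long_name.lstrip(".")            # leading periods are removed
    base, dot, ext = name.rpartition(".")   # only the last period separates
    if not dot:                             # no period: no extension
        base, ext = name, ""
    base, ext = clean(base), clean(ext)
    candidate = base + ("." + ext if ext else "")
    # Steps 1 and 2: the name already fits 8.3, so only the case changes.
    if candidate == long_name.upper() and 1 <= len(base) <= 8 and len(ext) <= 3:
        return candidate
    # Step 3: six characters, a tilde, a digit, then up to three of the extension.
    short = f"{base[:6]}~{index}"
    return short + ("." + ext[:3] if ext else "")


for example in ("TextFile.Txt", "TextFile.Mine.txt", "ver +1.2.text", ".bashrc.swp"):
    print(example, "->", short_name(example))
```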

NTFS, a file system used by the Windows NT family, supports LFNs natively, but 8.3 names are still available for legacy applications. This can optionally be disabled to improve performance in situations where large numbers of similarly named files exist in the same folder.[2]

The ISO 9660 file system (mainly used on compact discs) has similar limitations at the most basic Level 1, with the additional restriction that directory names cannot contain extensions and that some characters (notably hyphens) are not allowed in filenames. Level 2 allows filenames of up to 31 characters, more compatible with classic AmigaOS and classic Mac OS filenames.

Compatibility

This legacy technology is used in a wide range of products and devices, as a standard for interchanging information, such as compact flash cards used in cameras. The VFAT long filenames (LFNs) introduced by Windows 95/98/ME retained compatibility, but the VFAT LFN implementation used on NT-based systems (Windows NT/2000/XP) uses a modified 8.3 short name.

If a filename contains only lowercase letters, or is a combination of a lowercase basename with an uppercase extension, or vice versa; and has no special characters, and fits within the 8.3 limits, a VFAT entry is not created on Windows NT and later versions such as XP. Instead, two bits in byte 0x0c of the directory entry are used to indicate that the filename should be considered as entirely or partially lowercase. Specifically, bit 4 means lowercase extension and bit 3 lowercase basename, which allows for combinations such as example.TXT or HELLO.txt but not Mixed.txt. Few other operating systems support this. This creates a backward-compatibility filename mangling problem with older Windows versions (95, 98, ME) that see all-uppercase filenames if this extension has been used, and therefore can change the capitalization of a file when it is transported, such as on a USB flash drive. This can cause problems for operating systems that do not exhibit the case-insensitive filename behavior as DOS and Windows do. Linux will recognize this extension when reading;[7] the mount option shortname determines whether this feature is used when writing.[8] For MS-DOS you may use Henrik Haftmann's DOSLFN.[9]
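The two case bits can be decoded as in the short Python sketch below, which takes the uppercase base name and extension stored in the directory entry together with the byte at offset 0x0c. The constant and function names are made up for this illustration.

```python
LOWERCASE_BASENAME = 0x08   # bit 3 of the byte at offset 0x0c
LOWERCASE_EXTENSION = 0x10  # bit 4 of the byte at offset 0x0c


def display_name(base: str, ext: str, case_flags: int) -> str:
    """Apply the NT case bits to a stored uppercase 8.3 name."""
    if case_flags & LOWERCASE_BASENAME:
        base = base.lower()
    if case_flags & LOWERCASE_EXTENSION:
        ext = ext.lower()
    return f"{base}.{ext}" if ext else base


print(display_name("EXAMPLE", "TXT", LOWERCASE_BASENAME))   # example.TXT
print(display_name("HELLO", "TXT", LOWERCASE_EXTENSION))    # HELLO.txt
```

Since each part is either wholly lowercase or wholly uppercase, a name such as Mixed.txt cannot be represented this way and needs a long filename entry instead.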

Directory table

A directory table is a special type of file that represents a directory. Each file or directory stored within it is represented by a 32-byte entry in the table. Each entry records the name, extension, attributes (archive, directory, hidden, read-only, system and volume), the date and time of creation, the address of the first cluster of the file/directory's data and finally the size of the file/directory.

Legal characters for DOS filenames include the following:

  • Upper case letters A–Z
  • Numbers 0–9
  • Space (though trailing spaces in either the base name or the extension are considered to be padding and not a part of the filename; filenames containing spaces must be enclosed in quotes to be used on a DOS command line, and if the DOS command is built programmatically, the filename must be enclosed in doubled double quotes (""..."") when it appears as a variable within the program building the command)
  • !, #, $, %, &, ', (, ), -, @, ^, _, `, {, }, ~
  • Values 128–255 (though if NLS services are active in DOS, some characters interpreted as lowercase are invalid and unavailable)

This excludes the following ASCII characters:

  • ", *, +, ,, /, :, ;, <, =, >, ?, \, [, ], |[10]
    Windows/MS-DOS has no shell escape character
  • . (U+002E . FULL STOP) within name and extension fields, except in . and .. entries (see below)
  • Lower case letters a–z, stored as A–Z on FAT12/FAT16
  • Control characters 0–31
  • Value 127 (DEL)[dubious ]
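The two character lists above can be turned into a simple validity check, sketched here in Python. It is a simplification under stated assumptions: it only handles ASCII names as they would be stored on disk (so lowercase letters are rejected), it ignores the 128–255 range and the special . and .. entries, and the name is_valid_83 is hypothetical.

```python
import string

# Legal ASCII characters from the list above: A-Z, 0-9, space, and specials.
ALLOWED = set(string.ascii_uppercase + string.digits + " !#$%&'()-@^_`{}~")


def is_valid_83(name: str) -> bool:
    """Check whether name is a legal stored 8.3 filename (ASCII subset only)."""
    base, _dot, ext = name.partition(".")
    if "." in ext:                       # more than one period
        return False
    if not 1 <= len(base) <= 8 or len(ext) > 3:
        return False
    return all(c in ALLOWED for c in base + ext)


for example in ("TEXTFILE.TXT", "textfile.txt", "TOOLONGNAME.TXT", "A+B.TXT"):
    print(example, "->", is_valid_83(example))
```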

The DOS filenames are in the OEM character set. Code 0xE5 as the first byte (see below) causes problems when non-ASCII characters are used.

Directory entries, both in the Root Directory Region and in subdirectories, are of the following format:

Byte offset  Length  Description

0x00  8  DOS filename (padded with spaces). The first byte can have the following special values:
    0x00  Entry is available and no subsequent entry is in use
    0x05  Initial character is actually 0xE5
    0x2E  Dot entry: either . or ..
    0xE5  Entry has been previously erased. File undelete utilities must replace this character with a regular character as part of the undeletion process.

0x08  3  DOS file extension (padded with spaces, may be empty)

0x0b  1  File attributes. The bits of this byte have the following meanings:
    Bit  Mask  Description
    0    0x01  Read Only
    1    0x02  Hidden
    2    0x04  System
    3    0x08  Volume Label
    4    0x10  Subdirectory
    5    0x20  Archive
    6    0x40  Device (internal use only, never found on disk)
    7    0x80  Unused
    An attribute value of 0x0F is used to designate a long filename entry.

0x0c  1  Reserved; two bits are used by NT and later versions to encode case information

0x0d  1  Create time, fine resolution: 10 ms units, values from 0 to 199

0x0e  2  Create time. The hour, minute and second are encoded according to the following bitmap:
    Bits   Description
    15–11  Hours (0–23)
    10–5   Minutes (0–59)
    4–0    Seconds/2 (0–29)
    Seconds are recorded only to a 2-second resolution. Finer resolution for file creation is found at offset 0x0d.

0x10  2  Create date. The year, month and day are encoded according to the following bitmap:
    Bits  Description
    15–9  Year (0 = 1980, 127 = 2107)
    8–5   Month (1 = January, 12 = December)
    4–0   Day (1–31)

0x12  2  Last access date; see offset 0x10 for description

0x14  2  EA-Index (used by OS/2 and NT) in FAT12 and FAT16; high 2 bytes of first cluster number in FAT32

0x16  2  Last modified time; see offset 0x0e for description

0x18  2  Last modified date; see offset 0x10 for description

0x1a  2  First cluster in FAT12 and FAT16; low 2 bytes of first cluster in FAT32

0x1c  4  File size
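The directory entry layout above maps directly onto a fixed 32-byte record, which the following Python sketch unpacks with the standard struct module. It assumes little-endian multi-byte fields, as used on FAT volumes, and decodes names as ASCII for simplicity even though they are really in the OEM character set. The names ENTRY, parse_entry, decode_time and decode_date are made up for this example.

```python
import struct

# name, ext, attributes, reserved, fine create time, create time, create date,
# access date, EA-index/high cluster, modified time, modified date,
# first cluster (low), file size. Little-endian is assumed.
ENTRY = struct.Struct("<8s3sBBBHHHHHHHI")

ATTRIBUTES = {0x01: "read-only", 0x02: "hidden", 0x04: "system",
              0x08: "volume-label", 0x10: "subdirectory", 0x20: "archive"}


def decode_time(value: int):
    """Bits 15-11 hours, 10-5 minutes, 4-0 seconds divided by two."""
    return value >> 11, (value >> 5) & 0x3F, (value & 0x1F) * 2


def decode_date(value: int):
    """Bits 15-9 years since 1980, 8-5 month, 4-0 day."""
    return 1980 + (value >> 9), (value >> 5) & 0x0F, value & 0x1F


def parse_entry(raw: bytes) -> dict:
    """Decode one 32-byte short-name directory entry into a dictionary."""
    (name, ext, attr, _reserved, _fine, ctime, cdate, _adate,
     _high, mtime, mdate, low, size) = ENTRY.unpack(raw)
    return {
        "name": name.decode("ascii", errors="replace").rstrip(),
        "ext": ext.decode("ascii", errors="replace").rstrip(),
        "attributes": [label for bit, label in ATTRIBUTES.items() if attr & bit],
        "is_long_name_entry": attr == 0x0F,
        "created": decode_date(cdate) + decode_time(ctime),
        "modified": decode_date(mdate) + decode_time(mtime),
        "first_cluster_low": low,
        "size": size,
    }
```

For example, a time field of (13 << 11) | (45 << 5) | 15 decodes to 13:45:30, and a date field of (21 << 9) | (9 << 5) | 11 decodes to 11 September 2001.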

Working with short filenames in a command prompt

Sometimes it may be desirable to convert a long filename to a short filename, for example when working with the command prompt. A few simple rules can be followed to attain the correct 8.3 filename.

  1. An SFN can have at most 8 characters before the dot. If it has more than that, the first 6 must be written, then a tilde ~ as the seventh character and a number (usually 1) as the eighth. The number distinguishes it from other files with both the same first six letters and the same extension.
  2. Dots are important and must be used even for folder names (if there is a dot in the folder name). If there are multiple dots in the long file/directory name, only the last one is used. The preceding dots should be ignored. If there are more characters than three after the final dot, only the first three are used.
  3. Generally:
    • Any spaces in the filenames should be ignored when converting to SFN.
    • Ignore all periods except the last one. Do not include any other periods, just like the spaces. Use the last period if any, and the next characters (up to 3). For instance, for .manifest, .man only would be used.
    • Commas, square brackets, semicolons, = signs and + signs are changed to underscores.
    • Case is not important, upper case and lower case characters are treated equally.

To find the exact short (8.3) names of the files in a directory, use one of the following commands:

dir /x shows the short name of each file, if there is one, alongside the long name.

dir /-n shows only the short names, in the original DIR listing format.

In Windows NT-based operating systems, command prompt (cmd.exe) applets accept long filenames with wildcard characters (question mark ? and asterisk *); long filenames with spaces in them need to be escaped (i.e. enclosed in single or double quotes).[11]

Starting with Windows Vista, console commands and PowerShell applets perform limited pattern matching by allowing wildcards in filename and each subdirectory in the file path and silently substituting the first matching directory entry (for example, C:\>CD \prog*\inter* will change the current directory to C:\Program Files\Internet Explorer\).

See also

  • File Allocation Table (FAT)
  • Design of the FAT file system
  • File system
  • Filename extension
  • Long filename

References

  1. ^ "Naming a File". Microsoft Developer Network. Archived from the original on 2008-10-15. Retrieved 2007-03-22.
  2. ^ a b "How to Disable the 8.3 Name Creation on NTFS Partitions". Microsoft. Retrieved 2021-02-26.
  3. ^ "GetShortPathName Function". MSDN. Archived from the original on 2015-10-01. Retrieved 2014-09-15.
  4. ^ "How to Get a Short Filename from a Long Filename". Microsoft. Retrieved 2021-02-26.
  5. ^ "How Windows Generates 8.3 File Names from Long File Names". Microsoft.
  6. ^ Galvin, Thomas (9 June 2015). "A Tale of Two File Names". tomgalvin.uk. Retrieved 17 October 2022.
  7. ^ "dir.c\fat\fs - kernel/git/torvalds/linux.git - Linux kernel source tree". git.kernel.org. Retrieved 2018-06-25.
  8. ^ "mount(8): mount filesystem – Linux man page".
  9. ^ "DOSLFN".
  10. ^ Andries Brouwer (2007-12-26). "Directory Entry". The FAT filesystem. Retrieved 2013-07-30.
  11. ^ "Using Long File Names".
