Tab Separated Values (TSV): a format for tabular data exchange

摘要:
TabSeparatedValues(TSV):窗体数据交换TabSeveratedValue(TSV

Tab Separated Values (TSV): a format for tabular data exchange

Tab Separated Values (TSV): a format for tabular data exchange

TSV is a very simple textual data format which allows tabular data
to be exhanged between applications that use different internal data
formats. This document briefly explains this standardized format
and gives simple examples of using it.

Content:

The TSV format

The Tab Separated Values has been
officially registered
as an Internet media type
(MIME type) under the name
text/tab-separated-values. Note that
major media type text means that the data is in textual
format which can be viewed and edited using a normal text editor; in practice
however this is seldom done- instead, the format is programmatically written
and read.

The abbreviation TSV (or tsv) is commonly used, and usually the file name
suffix for TSV files is ".tsv".

The registry entry is very short, since the format is very simple, but the
description is paraphrased here in plain English (as opposite to the partly formalized
notation in the registration):

  • A file in TSV format consists of lines.
  • Each line contain fields separated
    from each other by TAB characters
    (horizontal tab, HT, Ascii control code9).
  • "Field" means here just any string of characters, excluding TABs.
    The point is simply that
    TABs divide a line into pieces, components.
  • Each line must contain the same number of fields.
  • The first line contains the names for the fields (on all lines), i.e.
    column headers.

The last rule could be taken just as allowing the
labeling of columns that way. Mostly if you wish to include just tabular data
as such, it probably works OK. But it's usually recommendable to use some labeling
there.

Since TAB is used as a separator between fields, a field cannot contain a TAB.
However, TABs usually don't appear in data items that you wish to tabulate,
so this is seldom a restriction. There are various other formats which are very
similar to TSV but use a different separator, such as
Comma Separated Values
(CSV).
Note that commas, spaces, and other characters often used
as separators in such formats
appear rather often in data to be tabulated, at least in
header fields. Special conventions can be made to allow the inclusion of
the separator into data fields, but the extreme simplicity of the data
format is then lost.

The example given in the registration is the following,
where <TAB> denotes a TAB character:

Name<TAB>Age<TAB>Address
Paul<TAB>23<TAB>1115 W Franklin
Bessy the Cow<TAB>5<TAB>Big Farm Way
Zeke<TAB>45<TAB>W Main St

Not to be shown as is!

Although TSV is a text format,
a TSV file is not expected to appear in a nice tabular format
when displayed or printed as such, e.g. by an editor. In special cases it might do that,
if tab stops are set suitably in the environment and if the fields in a column
have roughly the same width. But even the simple example probably fails
when embedded into an HTML document (inside a
PRE element, which is the best shot):

Name	Age	Address
Paul	23	1115 W Franklin
Bessy the Cow	5	Big Farm Way
Zeke	45	W Main St

Using TABs in HTML is unreliable, but more fundamentally, tabbing with TAB is
in itself not structural. It just takes to some predefined tab stop; typically the
tab stops are set at 1st, 9th, 17th, etc. character position. This means that
the data would look OK by accident only. And when using TSV, you are not
supposed to try to "fix" things by using multiple TABs; that
would mean that there are empty fields between
the TABs, defeating the very idea of TSV.

What happens if you ask your Web browser to show a TSV file, e.g. by following a
link to a file containing our example? It really depends on
many things; it's a matter of
handling media types, which can be a bit confusing.
It most essentially depends on the settings of your browser, but
also on how the server sends it. But in any case, you as a user have
at least as one option the possibility of saving the file
onto local disk, to be processed separately using some suitable program (e.g.
MSExcel).
So TSV is one way of making tabular data available on a network in a universally
accessible way, though it may require some skills on the user side to know what
could be done with the data and do something useful with it.

Importing TSV data to MS Excel

We could give a TSV file as input to a
spreadsheet application, like MSExcel.
Such programs generally accept TSV format data.
The data can then be viewed, printed, and manipulated in tabular format.
And you can save the data in MSExcel's "native" format (which cannot
be processed by other programs unless they have been specifically written
to process it) or in TSV format, or in a few other formats.
Our simple example data would get displayed roughly as follows:

The data appears in columns labeled A, B, C, and rows labeled 1, 2, 3, 4 in framed cells. The cells that contain numbers only are right-aligned, other cells left-aligned. Cell (A,3) appears as truncated to "Bessy the".

Cells with long content may look truncated. This is
Excel's feature and often useful when manipulating spreadsheets.
The presentation, both on screen and on paper, can be tuned as needed,
using various tools in Excel.

When you have Excel in use, you can open the the File menu,
then select Open, and pick up the file to be opened. For finding the file,
you may need to set value of the "Files of type:" setting (which is
effectively a file type filter) to "Allfiles(*.*)". (The details
may vary depending on the version and configuration of the program.)

But you can make things easier by setting things up so that your system
(we're assuming some flavor of Windows here) automatically recognizes
.tsv files as something to be opened in Excel, when you click on their
icons. Assuming there is no such association for .tsv in your system yet,
you'll be prompted for one when you first click on the icon of a .tsv file.
When prompted for "Description of '.tsv' files:", you could reply e.g.
"text/tab-separated-values". Then pick up Excel from the list of programs
and clickOK.

When TSV data is imported into Excel, some data
transformation and interpretation may take place. For example,
some strings might be interpreted as dates and converted to a different
date format, and Excel generally treats a string of digits as a number
and right-justifies it. In such cases, "opening by clicking"
could be inadequate, since it might not give you the option of
specifying how the data is to be interpreted. Explicitly opening
the file in Excel normally lets you use "Text import wizard" where you
can specify the interpretation of each column.

Generating TSV data

For example, if you have a program of your own (say, a Fortran or C program,
or a Perl script) that writes out some tabular data, and you'd like to be
able to open (or let others open) the data in Excel, the TSV approach is rather
simple. You just need to know how to write data in TSV format, which is alot
easier than trying to generate MSExcel format. Usually you
can just use TAB characters in a string used to format the output, but see next
paragraph as regards to editors; in C and Perl for example, you can alternatively use
the notation \t inside a character constant or string constant.

And, of course, you could type TSV data "by hand", using a text editor.
Note that typing the TAB character may require some extra trick, since
many editors process it as command-like, instead of entering it into data
as a character. For example, in some modes Emacs turns TABs to sequences of
spaces, which won't do in TSV; to prevent this, type control-Q before hitting
the TABkey.

As an example, here is
Fortran 77 code that writes a matrix is TSV format:

      parameter(m=42)
      parameter(n=5)
      real a(m,n)
... code that writes data to matrix a ...
      do 200 i=1,m
 200     print 210, (a(i,j),j=1,n)
 210     format(F10.6,4('	',F10.6))

The code is not particularly elegant, since the number of
columns is hard-wired into the format statement, which
causes the first number to printed as such and 4 more with a tab character
before each of them. Between the apostrophes (single quotation marks), there
is exactly one TAB character and nothing more; the appearance on your
browser may vary.

Exporting data in TSV format from Excel

When you save a spreadsheet in Excel (using Save in the File menu,
or using control-S), it normally uses by default its own internal data format.
You can override this by using SaveAs... (in the File menu)
instead and selecting the desired format. Among the available formats,
you can pick up "Text (Tab delimited) (*.txt)", which means TSV.
(Here, too, you may encounter some variation between versions of Excel.)
Note that the default file name suffix will be .txt when; you may
wish to override this, reserving .txt for plain text files.

Saving as TSV generally loses formatting information.
Quite often this is desirable, since if you e.g. need to E-mail some data
or put a table onto the Web, information about
the specific font faces, cell widths, and other
presentational details are not needed and can cause problems.
Note that
Excel's own "Save As HTML" includes a horrendous amount of
such data
. There is software called
Office 2000 HTML Filter2.0
available from Microsoft for free, but it fixes only part of the
problem (for Excel files - for Word files, it's pretty good).

You could use this possibility for the job of making data
available on the Internet, or in an intranet. If you have the data in Excel
format, it will be less universally accessible than e.g. TSV format. Although there are
Excel viewers available free of charge for Windows, not everyone
wants to install such software just to casually view some tabular data,
and not everyone uses Windows at all. And even if people have Excel installed,
they might be unwilling to open documents from unknown origin, in fear of
macro viruses for example. So TSV might be a good idea at least as an alternative.
You could always make the data available both in Excel format and as TSV, especially
since saving as TSV may remove some formatting information.

Converting from TSV format to HTML table format

For putting data onto the Web in particular, you could additionally
convert the data from TSV format into an
HTML table and
make it a separate HTML document, or insert it into an HTML document, e.g. as follows:

NameAgeAddress
Paul231115 W Franklin
Bessy the Cow5Big Farm Way
Zeke45W Main St

Since TSV is a very simple format, and since the basic structure of HTML
tables is very simple, such a conversion is rather trivial as soon as you
have any programming tool. At the simplest, you could simply add the
<table> tag before the data
and the
</table> tag after it, and
insert
<tr><td> at the start of each data line,
and replace each tab with a <td>.
That way, each line would be turned to a table row and each field to a table cell.
However, this would misrepresent the first line, which is a line of header fields,
at least by TSV definition, so it would be better to use th rather than
td there. There are some additional considerations which make a
somewhat more advanced strategy better. The following Perl code
(also available as a separate file)
tries to produce nice HTML markup and uses very simple heuristics:
it assumes that any field which contains only digits
and spaces
should be right-aligned in a cell.

print <<END;
<table border="1" cellspacing="0" cellpadding="4">
<thead>
<tr><th>
END
chomp ($_ = <STDIN>);
s?\t?</th><th>?g;
print "$_</th></tr>\n";
print "</tr>\n</thead>\n<tbody>\n";
while (<>) {
    print "<tr>\n";
    my @fields = split('\t');
    for $cell(@fields) {
	if($cell =~ /^(\d|\s)+$/) {
	    print "<td align=\"right\">$cell</td>"; }
	else {
	    print "<td>$cell</td>"; } }
    print "</tr>\n"; }
print "</tr>\n</tbody>\n</table>\n";

The code creates just the table markup, so the result can be simply
inserted into an existing HTML document. The example table above was created that way.

You don't need to know
Perl
in order to use that program.
(Of course you'd need some understanding of Perl in order to modify
the program for your needs. But you might also consider editing the result
using an editor, if you just wish to e.g. add an align="left"
attribute into the first <tr> tag, to make the header cells
left-aligned.)
But you need a Perl interpreter, and you
know how to run it. (Typically, you would, after saving the program under the
name tsv2h.pl in a file, and having TSV data in a file tsv.tsv,
go to command prompt level and
just give a command like perltsv2h.pl<tsv.tsv>tsvex.html
to get the converted data into file tsv.html.)
There are
free Perl interpreters available for virtually every platform.
For Win32, many people recommend ActivePerl.

免责声明:文章转载自《Tab Separated Values (TSV): a format for tabular data exchange》仅用于学习参考。如对内容有疑问,请及时联系本站处理。

上篇7th.关于系统时钟的设置Mysql全文索引的使用下篇

宿迁高防,2C2G15M,22元/月;香港BGP,2C5G5M,25元/月 雨云优惠码:MjYwNzM=

相关文章

python--Yaml操作

1、安装:pip install PyYaml -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com 2、python操作Yaml库: 字典--转换--字符串 import yaml python_obj ={ "name":u"张三", "age":0,...

iphone升级到ipad Upgrade Current Target for iPad为灰色的解决方法

Upgrade Current Target for iPad为灰色的解决方法 当前iPad风潮之下,我们会将已经开发完成或者下载到的合适的iPhone源码移植到iPad上,这样可以省去重新开发的时间。但有时项目移植的时候会出现Upgrade Current Target for iPad为灰色不能升级的时候,下面我们来完整的看下移植方法及灰色的处理方法...

使用User-Agent防止HttpClient发送http请求时403 Forbidden和安全拦截

问题的抛出 今天有合作商户反映,批付交易完成后,在我方服务器以“服务器点对点通信”的方式通知请求对方服务器时,对方拦截了请求。并贴了一张截图。 从截图可以看出来,对方拦截了我们的user-agent(Apache-HttpClient)。 什么是User-Agent?通常我们用浏览器访问一个网页,当我们向服务器发送请求时,浏览器会将一些头信息附加上,然后发...

Vue中进行断点调试的两种方式(使用外部浏览器和VsCode Debug for Chrome 插件)

场景 在使用IDEA等进行后台开发时可以直接在想要进行断点调试的地方双击添加一个断点,然后逐步或者逐过程进行调试。 但是在Vue中如果想要进行调试时,如果是在js中调试的话,可以直接添加一个debugger,然后在浏览器中打开检查进行断点调试。 但是如果在vue界面的话则没法逐步查看变量的值。 比如: 在Vue页面中,点击搜索按钮时 <el-butt...

【随手记录】时间格式转换报错:while it seems to fit format 'yyyy-MM-dd'T'HH:mm:ss.SSSZ'

微服务消费者调用提供者获取实体类数据时候,报错: java.lang.IllegalArgumentException: Cannot deserialize value of type `java.util.Date` from String "2020-03-17 20:33:37": not a valid representation (error...

Linux上的free命令详解

解释一下Linux上free命令的输出。 下面是free的运行结果,一共有4行。为了方便说明,我加上了列号。这样可以把free的输出看成一个二维数组FO(Free Output)。例如: FO[2][1] = 24677460 FO[3][2] =10321516 1 2 3 4 5 61 total used free shared buffe...