About Bowtie

First, building an index from a reference genome. Below is the description from the official website:

bowtie-build builds a Bowtie index from a set of DNA sequences. bowtie-build outputs a set of 6 files with suffixes .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt, and .rev.2.ebwt. (If the total length of all the input sequences is greater than about 4 billion, then the index files will end in ebwtl instead of ebwt.) These files together constitute the index: they are all that is needed to align reads to that reference. The original sequence files are no longer used by Bowtie once the index is built.

After bowtie-build processes the DNA sequences, it generates six files: `.1.ebwt`, `.2.ebwt`, `.3.ebwt`, `.4.ebwt`, `.rev.1.ebwt` and `.rev.2.ebwt`.

When the DNA sequences are very large (a total length above roughly 4 billion), the six corresponding files end in `.ebwtl` instead.
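
As a small illustration of this naming scheme, the Python sketch below lists the six file names expected for a given basename. The function name and the example basenames are hypothetical. The threshold constant only approximates the manual's "about 4 billion" and is not bowtie-build's exact cut-off.

```python
# Approximation of the manual's "about 4 billion"; not bowtie-build's exact cut-off.
APPROX_LARGE_INDEX_THRESHOLD = 4_000_000_000

# The six index parts, in the order the manual lists them.
INDEX_PARTS = ("1", "2", "3", "4", "rev.1", "rev.2")


def expected_index_files(basename: str, total_length: int) -> list[str]:
    """Return the six index file names bowtie-build should write for `basename`."""
    ext = "ebwtl" if total_length > APPROX_LARGE_INDEX_THRESHOLD else "ebwt"
    return [f"{basename}.{part}.{ext}" for part in INDEX_PARTS]


if __name__ == "__main__":
    print(expected_index_files("small_ref", 50_000))
    print(expected_index_files("big_ref", 5_000_000_000))
```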

Below are the algorithms bowtie-build uses and how it goes about building the index:

Use of Karkkainen’s blockwise algorithm allows bowtie-build to trade off between running time and memory usage. bowtie-build has three options governing how it makes this trade: -p/--packed, --bmax/--bmaxdivn, and --dcv. By default, bowtie-build will automatically search for the settings that yield the best running time without exhausting memory. This behavior can be disabled using the -a/--noauto option.
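
The following simplified Python sketch makes the time/memory trade-off concrete. Random "splitter" suffixes cut the set of all suffixes into lexicographic blocks. Each block is gathered with one scan of the text and sorted on its own, so only about `bmax` suffixes are sorted at a time. This is not bowtie-build's implementation, and the helper name is hypothetical. Comparisons here use plain string slicing rather than a difference-cover sample. When a block is too large, the sketch simply retries with twice as many splitters.

```python
import random


def blockwise_suffix_array(text: str, bmax: int, seed: int = 0) -> list[int]:
    """Suffix array built block by block (simplified illustration).

    Random splitter suffixes divide all suffixes into lexicographic ranges
    (lo, hi]; each range is collected with one scan of the text and sorted
    on its own, so at most `bmax` suffixes are sorted at a time.
    """
    if bmax < 1:
        raise ValueError("bmax must be at least 1")
    n = len(text)
    rng = random.Random(seed)
    num_splitters = max(1, n // bmax)
    while True:
        positions = rng.sample(range(n), min(num_splitters, n))
        splitters = sorted(text[i:] for i in positions)
        bounds = [None, *splitters, None]  # None stands for -inf / +inf
        suffix_array: list[int] = []
        for lo, hi in zip(bounds, bounds[1:]):
            block = [
                i for i in range(n)
                if (lo is None or text[i:] > lo) and (hi is None or text[i:] <= hi)
            ]
            if len(block) > bmax:
                break  # block too large: retry with more splitters
            block.sort(key=lambda i: text[i:])
            suffix_array.extend(block)
        else:
            return suffix_array
        num_splitters *= 2


if __name__ == "__main__":
    print(blockwise_suffix_array("GATTACA", bmax=2))
```

A smaller `bmax` means smaller blocks (less memory at once) but more passes over the text, which mirrors the trade-off described above.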

The indexer provides options pertaining to the “shape” of the index, e.g. --offrate governs the fraction of Burrows-Wheeler rows that are “marked” (i.e., the density of the suffix-array sample; see the original FM Index paper for details). All of these options are potentially profitable trade-offs depending on the application. They have been set to defaults that are reasonable for most cases according to our experiments.

The Bowtie index is based on the FM Index of Ferragina and Manzini, which in turn is based on the Burrows-Wheeler transform. The algorithm used to build the index is based on the blockwise algorithm of Karkkainen.
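
The sketch below shows these two ideas in miniature. First, it computes the Burrows-Wheeler transform of a text terminated by `$`. Second, it runs FM-index backward search, which counts pattern occurrences using only the BWT, a table `C` of character counts and an occurrence function. It is a toy under stated assumptions:

- The suffix array is built by naive sorting, not by Karkkainen's algorithm.
- Occurrence counts are recomputed by slicing instead of using sampled rank tables.
- No rows are marked for locating hits.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform of text + '$' via a naively sorted suffix array."""
    t = text + "$"
    suffix_array = sorted(range(len(t)), key=lambda i: t[i:])
    # Row r of the BW matrix ends with the character preceding suffix SA[r];
    # for SA[r] == 0 that is t[-1], the '$' terminator.
    return "".join(t[i - 1] for i in suffix_array)


def fm_count(bwt_str: str, pattern: str) -> int:
    """Count occurrences of `pattern` using FM-index backward search."""
    counts: dict[str, int] = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    # C[c] = number of characters in the text that sort before c ('$' sorts first).
    C: dict[str, int] = {}
    total = 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    def occ(ch: str, k: int) -> int:
        # Occurrences of ch in the first k characters of the BWT.
        return bwt_str[:k].count(ch)

    lo, hi = 0, len(bwt_str)  # half-open range of BW rows matching the suffix seen so far
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo, hi = C[ch] + occ(ch, lo), C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    b = bwt("banana")
    print(b)
    print(fm_count(b, "ana"), fm_count(b, "nab"))
```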

Using bowtie-build:

Usage:

`bowtie-build [options]* <reference_in> <ebwt_base>`

The two main arguments:

`<reference_in>`: A comma-separated list of FASTA files containing the reference sequences to be aligned to, or, if `-c` is specified, the sequences themselves. E.g., `<reference_in>` might be `chr1.fa,chr2.fa,chrX.fa,chrY.fa`, or, if `-c` is specified, it might be `GGTCATCCT,ACGGGTCGT,CCGTTCTATGCGGCTTA`.

`<ebwt_base>`: The basename of the index files to write. By default, bowtie-build writes files named `NAME.1.ebwt`, `NAME.2.ebwt`, `NAME.3.ebwt`, `NAME.4.ebwt`, `NAME.rev.1.ebwt` and `NAME.rev.2.ebwt`, where `NAME` is `<ebwt_base>`.
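
Because both positional arguments are plain strings, an invocation can be assembled programmatically. The helper below is a hypothetical convenience function, not part of Bowtie. It builds the argument list using only the `-c` flag and the two positional arguments described here, and it does not run anything. The basenames `my_index` and `tiny_index` are made up for the example.

```python
import shlex
from typing import Sequence


def bowtie_build_argv(references: Sequence[str], basename: str,
                      sequences_on_command_line: bool = False,
                      options: Sequence[str] = ()) -> list[str]:
    """Assemble `bowtie-build [options]* <reference_in> <ebwt_base>` as a list."""
    if not references:
        raise ValueError("at least one reference is required")
    argv = ["bowtie-build", *options]
    if sequences_on_command_line:
        argv.append("-c")  # <reference_in> holds sequences, not FASTA paths
    argv.append(",".join(references))  # comma-separated <reference_in>
    argv.append(basename)              # <ebwt_base>
    return argv


if __name__ == "__main__":
    fasta = bowtie_build_argv(["chr1.fa", "chr2.fa", "chrX.fa", "chrY.fa"], "my_index")
    inline = bowtie_build_argv(["GGTCATCCT", "ACGGGTCGT", "CCGTTCTATGCGGCTTA"],
                               "tiny_index", sequences_on_command_line=True)
    print(shlex.join(fasta))
    print(shlex.join(inline))
```

Passing the returned list to `subprocess.run(argv, check=True)` would run the build, assuming `bowtie-build` is on `PATH`.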

Optional arguments:

`-f`

The reference input files (specified as `<reference_in>`) are FASTA files (usually having extension `.fa`, `.mfa`, `.fna` or similar).

`-c`

The reference sequences are given on the command line, i.e. `<reference_in>` is a comma-separated list of sequences rather than a list of FASTA files.

`-C/--color`

Build a colorspace index, to be queried using `bowtie -C`.

`-a/--noauto`

Disable the default behavior whereby `bowtie-build` automatically selects values for the `--bmax`, `--dcv` and `--packed` parameters according to available memory. Instead, the user may specify values for those parameters. If memory is exhausted during indexing, an error message will be printed; it is up to the user to try new parameters.

`-p/--packed`

Use a packed (2-bits-per-nucleotide) representation for DNA strings. This saves memory but makes indexing 2-3 times slower. Default: off. This is configured automatically by default; use `-a/--noauto` to configure manually.
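
Here is a rough picture of what "2 bits per nucleotide" means: four bases fit in one byte instead of one base per byte. The encoding A=0, C=1, G=2, T=3 and the low-bits-first layout below are assumptions for illustration, not Bowtie's in-memory format. Note also that 2 bits leave no room for N.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit codes
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack A/C/G/T into 2 bits each, four bases per byte (low bits first)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed data."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length))


if __name__ == "__main__":
    seq = "ACGTTGCAAC"
    packed = pack(seq)
    print(len(seq), "bases ->", len(packed), "bytes")
    print(unpack(packed, len(seq)))
```

Every access now needs a shift and a mask. That extra work gives one intuition for why packed indexing runs slower, although the manual does not break down the cause.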

`--bmax <int>`

The maximum number of suffixes allowed in a block. Allowing more suffixes per block makes indexing faster, but increases peak memory usage. Setting this option overrides any previous setting for `--bmax` or `--bmaxdivn`. Default (in terms of the `--bmaxdivn` parameter): `--bmaxdivn 4`. This is configured automatically by default; use `-a/--noauto` to configure manually.

`--bmaxdivn <int>`

The maximum number of suffixes allowed in a block, expressed as a fraction of the length of the reference (a block holds at most the reference length divided by `<int>` suffixes). Setting this option overrides any previous setting for `--bmax` or `--bmaxdivn`. Default: `--bmaxdivn 4`. This is configured automatically by default; use `-a/--noauto` to configure manually.
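
The relationship between the two block-size options can be written down directly:

- `--bmaxdivn d` caps a block at roughly `reference_length / d` suffixes.
- `--bmax` sets the cap directly.
- Whichever of the two appears later on the command line wins.

The function below sketches that rule. Its name and the integer rounding are assumptions. It also ignores bowtie-build's automatic tuning, which stays active unless `-a/--noauto` is given.

```python
def block_limit(ref_length: int, args: list[str]) -> int:
    """Max suffixes per block implied by --bmax / --bmaxdivn (sketch).

    The later of the two options wins; with neither, the default
    --bmaxdivn 4 applies. Rounding down is an assumption.
    """
    limit = ref_length // 4  # default: --bmaxdivn 4
    i = 0
    while i < len(args):
        flag = args[i]
        if flag in ("--bmax", "--bmaxdivn"):
            if i + 1 >= len(args):
                raise ValueError(f"{flag} requires a value")
            value = int(args[i + 1])
            limit = value if flag == "--bmax" else ref_length // value
            i += 2
        else:
            i += 1
    return limit


if __name__ == "__main__":
    n = 3_000_000
    print(block_limit(n, []))
    print(block_limit(n, ["--bmaxdivn", "8"]))
    print(block_limit(n, ["--bmaxdivn", "8", "--bmax", "500000"]))
```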

`--dcv <int>`

Use `<int>` as the period for the difference-cover sample. A larger period yields less memory overhead, but may make suffix sorting slower, especially if repeats are present. Must be a power of 2 no greater than 4096. Default: 1024. This is configured automatically by default; use `-a/--noauto` to configure manually.
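
A difference cover D modulo v is a set of residues such that every residue mod v equals the difference of two elements of D. Bowtie samples the suffixes whose positions fall in D (mod v). For any two suffixes i and j, some offset k < v then lands both i+k and j+k on sampled positions. A comparison therefore needs to inspect at most about v characters before the precomputed ranks of the sampled suffixes decide it. This is why a larger period saves memory but can slow sorting on repetitive text.

The Python sketch below checks the `--dcv` constraint quoted above and illustrates the cover property on a tiny period. All function names are hypothetical, and the brute-force search is only sensible for very small periods.

```python
from itertools import combinations


def is_valid_dcv(period: int) -> bool:
    """The --dcv constraint from the manual: a power of 2 no greater than 4096."""
    return 0 < period <= 4096 and period & (period - 1) == 0


def is_difference_cover(cover: set[int], period: int) -> bool:
    """True if every residue mod `period` equals (a - b) mod `period` for some a, b in `cover`."""
    diffs = {(a - b) % period for a in cover for b in cover}
    return diffs == set(range(period))


def smallest_difference_cover(period: int) -> set[int]:
    """Brute-force a minimum-size difference cover (tiny periods only)."""
    for size in range(1, period + 1):
        for combo in combinations(range(period), size):
            if is_difference_cover(set(combo), period):
                return set(combo)
    raise ValueError("period must be positive")


def sample_offset(i: int, j: int, cover: set[int], period: int) -> int:
    """Smallest k < period such that positions i+k and j+k are both sampled."""
    for k in range(period):
        if (i + k) % period in cover and (j + k) % period in cover:
            return k
    raise ValueError("cover is not a difference cover for this period")


if __name__ == "__main__":
    cover = smallest_difference_cover(8)
    print(cover, is_valid_dcv(1024), is_valid_dcv(1000))
    print(max(sample_offset(i, j, cover, 8) for i in range(50) for j in range(50)))
```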

`--nodc`

Disable use of the difference-cover sample. Suffix sorting becomes quadratic-time in the worst case (where the worst case is an extremely repetitive reference). Default: off.

`-r/--noref`

Do not build the `NAME.3.ebwt` and `NAME.4.ebwt` portions of the index, which contain a bitpacked version of the reference sequences and are used for paired-end alignment.

`-3/--justref`

Build only the `NAME.3.ebwt` and `NAME.4.ebwt` portions of the index, which contain a bitpacked version of the reference sequences and are used for paired-end alignment.

`-o/--offrate <int>`

To map alignments back to positions on the reference sequences, it's necessary to annotate ("mark") some or all of the Burrows-Wheeler rows with their corresponding location on the genome. `-o/--offrate` governs how many rows get marked: the indexer will mark every 2^`<int>` rows. Marking more rows makes reference-position lookups faster, but requires more memory to hold the annotations at runtime. The default is 5 (every 32nd row is marked; for the human genome, annotations occupy about 340 megabytes).
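
The arithmetic behind the default is easy to reproduce. The sketch below assumes 4 bytes per stored offset and takes the number of Burrows-Wheeler rows as an input. With an assumed round figure of 2.7 billion indexed (non-N) human bases, offrate 5 gives about 84 million marked rows and roughly 338 MB. That is in line with the manual's "about 340 megabytes". Both the byte width and the genome size are assumptions, not values read from Bowtie.

```python
def offrate_cost(num_rows: int, offrate: int = 5, bytes_per_offset: int = 4) -> tuple[int, int]:
    """Return (marked_rows, annotation_bytes) when every 2**offrate-th row is marked.

    bytes_per_offset=4 (32-bit offsets) is an assumption for illustration.
    """
    step = 2 ** offrate
    marked_rows = -(-num_rows // step)  # ceiling division
    return marked_rows, marked_rows * bytes_per_offset


if __name__ == "__main__":
    # ~2.7e9 indexed human bases is an assumed round figure.
    for rate in (4, 5, 6):
        rows, size = offrate_cost(2_700_000_000, rate)
        print(f"offrate {rate}: {rows:,} marked rows, ~{size / 1e6:.0f} MB")
```

Lowering the offrate by one doubles the number of marked rows, and with it the annotation memory. In exchange, it roughly halves the average distance to the nearest marked row during a lookup.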

`-t/--ftabchars <int>`

The ftab is the lookup table used to calculate an initial Burrows-Wheeler range with respect to the first `<int>` characters of the query. A larger `<int>` yields a larger lookup table but faster query times. The ftab has size 4^(`<int>`+1) bytes. The default setting is 10 (ftab is 4MB).
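
The table-size formula is quick to check. One way to read 4^(N+1) bytes, for N prefix characters, is 4^N slots (one per possible N-character DNA prefix) of 4 bytes each. That reading, the A=0/C=1/G=2/T=3 encoding and the function names below are assumptions. The second function shows how a query prefix could be turned into a slot number by treating it as a base-4 integer.

```python
def ftab_bytes(ftabchars: int = 10) -> int:
    """ftab size as given in the manual: 4^(ftabchars + 1) bytes."""
    return 4 ** (ftabchars + 1)


def ftab_slot(prefix: str) -> int:
    """Read a DNA prefix as a base-4 number (A=0, C=1, G=2, T=3 is assumed)."""
    slot = 0
    for base in prefix:
        slot = slot * 4 + "ACGT".index(base)
    return slot


if __name__ == "__main__":
    for n in (8, 10, 12):
        print(f"-t {n}: {ftab_bytes(n):,} bytes ({ftab_bytes(n) / 2**20:g} MiB)")
    print(ftab_slot("ACGT"), ftab_slot("TTTT"))
```

Each extra prefix character multiplies the table size by 4, which is the "larger lookup table" cost the manual mentions.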

`--ntoa`

Convert Ns in the reference sequence to As before building the index. By default, Ns are simply excluded from the index and `bowtie` will not report alignments that overlap them.
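
Conceptually, the two behaviours differ as follows. The sketch assumes uppercase input. It models the default as cutting the reference at runs of N into separately indexed fragments, each keeping its original offset. Bowtie's internal bookkeeping is more involved, and all three helper names are hypothetical.

```python
import re


def ntoa(seq: str) -> str:
    """--ntoa, conceptually: every N becomes an A before indexing."""
    return seq.replace("N", "A")


def indexable_fragments(seq: str) -> list[tuple[int, str]]:
    """Default behaviour, sketched: drop N runs, keep (offset, fragment) pairs."""
    return [(m.start(), m.group()) for m in re.finditer(r"[^N]+", seq)]


def overlaps_n(seq: str, start: int, length: int) -> bool:
    """True if a hit at [start, start + length) touches an N (not reported by default)."""
    return "N" in seq[start:start + length]


if __name__ == "__main__":
    ref = "ACGTNNNNGGCA"
    print(ntoa(ref))
    print(indexable_fragments(ref))
    print(overlaps_n(ref, 2, 4), overlaps_n(ref, 8, 4))
```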

`--big`, `--little`

Endianness to use when serializing integers to the index file. Default: little-endian (recommended for Intel- and AMD-based architectures).
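
Endianness only changes the byte order of each stored integer. Python's `struct` module shows the difference for a 32-bit unsigned value. The 32-bit width is an assumption for illustration, since the integer widths inside Bowtie's index files are not described here. Reading bytes with the wrong byte order silently yields a different number, which is why the writer and the reader must agree.

```python
import struct


def serialize_u32(value: int, big_endian: bool = False) -> bytes:
    """Pack an unsigned 32-bit integer; little-endian mirrors bowtie-build's default."""
    return struct.pack(">I" if big_endian else "<I", value)


def deserialize_u32(data: bytes, big_endian: bool = False) -> int:
    """Unpack 4 bytes back into an integer using the given byte order."""
    return struct.unpack(">I" if big_endian else "<I", data)[0]


if __name__ == "__main__":
    value = 0x12345678
    little = serialize_u32(value)
    big = serialize_u32(value, big_endian=True)
    print(little.hex(), big.hex())
    # Misreading little-endian bytes as big-endian gives a different value.
    print(hex(deserialize_u32(little, big_endian=True)))
```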

`--seed <int>`

Use `<int>` as the seed for the pseudo-random number generator.

`-q/--quiet`

`bowtie-build` is verbose by default. With this option `bowtie-build` will print only error messages.

`-h/--help`

Print usage information and quit.

`--version`

Print version information and quit.

Publisher: 全栈程序员-站长. Please credit the source when reposting: https://javaforall.net/211250.html Original link: https://javaforall.net
